Everybody is so keen on getting the best results in Google, but what if you don’t want to be in Google? Here are 5 ways to De-Index pages or sites from Google; some are more serious than others, so just be careful how you use them.
1. Robots.txt
Robots.txt: the best way to screw up a perfectly good website, though it may be competing for that honour with .htaccess files. Google is now following a new and far more imaginative robots directive than the original set proposed in ’94; now in the noughties we can simply screw up our site with the line
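For the curious, the imaginative directive doing the rounds at the time of writing is Google’s unofficial, experimental `Noindex:` support in robots.txt, which (if honoured) drops matching URLs from the index rather than merely blocking the crawl. Treat this as a sketch of an unstandardised directive, not gospel:

```
User-agent: Googlebot
Noindex: /
```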
How cool is that! Sebastian, as always, has come up with some more creative examples of robots.txt directives which may or may not work.
All joking aside, the robots.txt file is still probably the most standards-compliant way to get most bots’ attention, and Googlebot and all the major search engines at least check this file, even if they then go ahead and ignore it. The one “search bot” that ignores robots.txt entirely is Feedfetcher, which is used for grabbing RSS feeds and caching them for Google Reader.
The correct way to block Googlebot in robots.txt is:
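In full, that’s a two-line affair: a `User-agent` line naming Googlebot and a `Disallow` covering the whole site.

```
User-agent: Googlebot
Disallow: /
```

Bear in mind `Disallow: /` blocks crawling; it won’t necessarily purge pages Google already has in its index.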
To check that your robots.txt file will do what you expect, validate it via the Webmaster Console:
Webmaster Console -> Tools -> Analyse robots.txt
I really want to echo Sebastian’s words and remind even the old pros: validate your robots.txt files and RTFM, particularly if your goal is to keep some pages in the index!
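If you’d rather test a robots.txt with code than with eyeballs, Python’s standard library ships a parser that behaves much like a compliant bot’s first pass. The rules and URLs below are made-up examples, not anything from a real site:

```python
# Sketch: check whether a given user-agent may fetch a URL under a robots.txt.
# Uses only the standard library; the rules string is an example.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is blocked from everything...
print(parser.can_fetch("Googlebot", "http://example.com/page.html"))      # False
# ...but other bots are unaffected, since no other User-agent block exists.
print(parser.can_fetch("SomeOtherBot", "http://example.com/page.html"))   # True
```

Handy for catching the classic mistake of a `Disallow` that blocks far more (or less) than you intended.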
2. Meta Tags
Meta tags, how 90s, but Google does provide some fun and amusing ways to remove a page via meta tags. Let’s start with the basics of stopping Googlebot from indexing (which is different from caching).
<meta name="robots" content="noindex, nofollow" />
So Google will, in theory, not index the page, follow links on the page or provide link juice from the page. But if the page had been previously indexed, a cached version may already exist; luckily Uncle Google has provided their own proprietary code, isn’t that nice of them!
<meta name="GOOGLEBOT" content="noarchive" />
But maybe I would like the content to be indexed only for a while and then dropped; well, again Uncle Google has provided.
<META NAME="GOOGLEBOT" CONTENT="unavailable_after: 25-Nov-2007 12:00:00 GMT">
Gee, Uncle Google, you’re so kind to us, I have all that flexibility; but why do you follow meta tags in the body of the HTML, don’t you think that’s a bit of a security problem?
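To see the bot’s side of these tags, here’s a sketch of how a crawler might collect robots directives from a page, using Python’s standard `html.parser`. The page markup is a made-up example:

```python
# Sketch: collect robots meta directives the way a polite crawler might.
# Standard library only; the HTML string is a fabricated example page.
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        # Googlebot honours both the generic name="robots" and its own name="googlebot".
        if a.get("name", "").lower() in ("robots", "googlebot"):
            self.directives += [d.strip().lower()
                                for d in a.get("content", "").split(",")]

page = ('<html><head>'
        '<meta name="robots" content="noindex, nofollow">'
        '</head><body></body></html>')
p = RobotsMetaParser()
p.feed(page)
print(p.directives)  # ['noindex', 'nofollow']
```

Note it only looks at `meta` tags wherever they appear, which is exactly the body-tag quirk complained about above.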
3. Google Webmaster Console
If Google had their way, this would be the centre of every webmaster’s universe: here you can tell Google which countries you want to target, help it select sitelinks and, yes, De-List your site. Indeed, according to Google, this is the only foolproof method of removing your site from the index forever!
Well, forever is a bit long; after 6 months it will re-index a site unless told otherwise by a robots.txt file, meta tag or other exotic matter. Still, this is perhaps the quickest method of removal from the index, taking between 3-5 days; it’s also the quickest to re-index. To get to the tool:
Webmaster Console -> Tools -> Remove URL
If you wish to re-index the page, simply visit the same tool and click Re-Index; just be aware that all that does is send Googlebot to that domain, and it will still follow any meta or robots directives.
4. Paid Links FUD
Make a profit and get De-Listed: yes, this is a great way to get yourself out of the index. Sell a few links, make it obvious, use terms like “Big Dosh Links Here”, and sure enough Google will never bother you. I don’t need to go into this debate, it’s been done to death; this has to be the worst way to get De-Listed, since at the rate Google is going with paid-link detection you would be a millionaire before they catch you, unless you’re very unlucky!
5. Malware on site
Ok, so this one won’t actually get you De-Indexed, but it will make sure you never get a visitor from Google again. Earlier this year Google started to protect its users by preventing access to sites that “may contain” malware, to stop people getting viruses. When you click a search result which has been flagged as containing malware, Google redirects you to a helpful page telling you it won’t help you, and if you wish to proceed you will have to manually copy and paste the URL.
Go Google; shame it doesn’t quite work, causing false positives and missing genuine viruses. Which leads to my only concern with this technique: you could be waiting months for Google to notice that nasty virus you uploaded. Oh well, think of the credit card details you could be collecting; every cloud has a silver lining. If you want to know more, you can visit Google’s pet project, Stop Badware.