Protect your private parts

When a search engine comes round to your website to look for new and changed pages, the first thing it does is look for a file called “robots.txt”. Type “/robots.txt” after your company’s domain name in your browser to see if you have one. This small file (although it’s not always small: look at Google’s) tells the search engine “robots” which parts of your website not to “crawl” and, by implication, which parts you want them to visit and, with any luck, include in their indexes.
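As a rough illustration of where a crawler looks, here is a minimal Python sketch that works out the robots.txt address for any page on a site. The function name `robots_txt_url` and the `www.example.com` address are made up for the example; the only thing it relies on is that the file sits at the top level of the domain.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, swap the path for /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page address, used only for illustration.
print(robots_txt_url("https://www.example.com/products/widgets.html?ref=home"))
```

Whatever page you start from, the answer is the same file at the root of the domain, which is why there is only ever one robots.txt per site.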

Why would you block a search engine from visiting parts of your site? There are many reasons, such as parts of the site being duplicates of other parts, or sections being internal and private. Perhaps most important is the sad fact that the web is far too big for the search engines, and unless your site is exceptionally highly ranked, they’re not going to index everything you have to offer. So it makes sense to section off anything you don’t really care about, and focus on directing the search engines to the pages you want people to visit.
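To make that concrete, here is a small sketch using Python’s standard `urllib.robotparser` module to check which addresses a set of rules would block. The rules, the `/print/` and `/intranet/` folders, the `ExampleBot` name and the `www.example.com` addresses are all invented for illustration; your own file would list whatever sections you want left alone.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: keep every robot out of the
# printer-friendly duplicates and the internal section.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
Disallow: /intranet/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = load_rules(SAMPLE_ROBOTS_TXT)

# Hypothetical addresses on the same site.
for url in (
    "https://www.example.com/products/",
    "https://www.example.com/print/products/",
    "https://www.example.com/intranet/staff.html",
):
    verdict = "crawl" if rules.can_fetch("ExampleBot", url) else "skip"
    print(verdict, url)
```

Only the first address comes back as crawlable here; the two under the disallowed folders are skipped.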

If you can’t think of anything you’d want to direct the search engines away from, then you may not need a robots.txt file at all. I like to include one anyway, because it goes against the grain to serve up “page not found” codes, even to search engine robots. You can also give the search engines more detailed requests at page level: see Using the Robots Meta Tag on the Google Webmaster Central Blog for more information. If you have a web designer on call, it might be worth asking whether your robots.txt file is up to date and doing what you want. But also make sure they’ve read Serious Robots.txt Misuse and High Impact Solutions, so that they don’t block off pages which are linked to externally or make similar mistakes.
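For the page-level requests, the robots meta tag sits in the head of an individual page and carries instructions such as “noindex” or “nofollow”. The sketch below, which uses Python’s standard `html.parser` module, pulls those instructions out of a page so you can see what it is asking for. The class and function names and the sample page are made up for the example, and real crawlers may well treat the tag in more detail than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the instructions from <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of instructions.
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html: str) -> set:
    """Return the set of robots meta instructions found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page, used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Printer-friendly copy</title>
<meta name="robots" content="noindex, follow">
</head><body>...</body></html>"""

directives = robots_directives(SAMPLE_PAGE)
print(sorted(directives))
```

In this sample the page asks not to be indexed, while still letting robots follow its links.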
