Using Webmaster Tools to find what’s not indexed

Do you have Google Webmaster Tools set up, and an XML sitemap? If you don’t, you should probably join our Insider Programme and we can get that sorted out for you. But assuming you’ve got all that in place, you should be able to see how many of the pages listed in your sitemap are actually in the Google index.

If it’s a very large site, there’s a strong possibility that quite a few will be missing. Every site has an “indexation cap” based on factors such as PageRank, “Trust”, server speed and duplicate content issues. If you’ve reached that limit, you need to take action. Don’t wait for Google to get around to finding important unindexed pages. If it hasn’t found them already, they’re probably awkward to find, and there’s no reason to assume they’ll appear any time soon. Google needs some encouragement.

A nice way to go about this is outlined in “Using multiple sitemaps to analyse indexation on large sites” on Blogstorm. It suggests breaking the sitemap into multiple files, one per type of page, and comparing how many URLs from each file end up indexed to see where the problem might be. As it concludes, “Once you can diagnose exactly the type of pages that Google doesn’t want to index you can fix the issue by improving PageRank flow to those pages and adding more unique content.”
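
To make the idea of splitting a sitemap concrete, here is a minimal Python sketch. It is not the Blogstorm method itself, just one way it might be done. It assumes that the first folder in a URL's path is a reasonable stand-in for "type of page", and the function names and example.com URLs are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def group_urls_by_section(urls):
    """Group URLs by first path folder (an assumed proxy for page type)."""
    groups = defaultdict(list)
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        # URLs with fewer than two path segments (homepage, /about)
        # are collected in a catch-all "root" group.
        section = segments[0] if len(segments) > 1 else "root"
        groups[section].append(url)
    return dict(groups)


def build_sitemap(urls):
    """Return one sitemap document, as a string, listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URL list; in practice this would come from your CMS or a crawl.
urls = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://www.example.com/products/widget",
    "https://www.example.com/products/gadget",
    "https://www.example.com/blog/hello-world",
]

groups = group_urls_by_section(urls)
sitemaps = {f"sitemap-{name}.xml": build_sitemap(group)
            for name, group in groups.items()}

for filename, group in zip(sitemaps, groups.values()):
    print(filename, len(group), "URLs")
```

Each resulting file would then be saved, uploaded and submitted separately in Webmaster Tools, so that the submitted and indexed counts can be compared file by file. A file with a noticeably low indexed count points to the type of page that needs attention.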

