Today we’ll touch on a subject close to my heart, mainly because I devoted the best part of a year of my life to understanding and overcoming the problems it causes. The subject is duplicate content.

A new post on Google’s official Webmaster Central blog makes a good attempt at introducing the topic and points you towards further reading. As the title suggests, Duplicate Content Due To Scrapers is primarily concerned with other sites that steal your content and republish it (“scrapers”). It is less concerned with the other kind of “duplicate content”, which is also worth understanding: content repeated on your own site.

You might be surprised to find out that other sites are copying parts of your site, and mystified as to why they’d do it. The sad truth is that there are millions, or more likely billions, of web pages out there which are just garbage: collections of random text pulled from around the web, some of which is likely to be from your company’s site. Fortunately the search engines are good at identifying this material and ignoring it, which is why you’ve probably not come across it. But it’s out there, lurking.

