PDFs are not duplicate content

Many of us publish identical content on our websites as both HTML pages and PDF documents. An old wives’ tale says this is a problem for SEO, but time and time again Google has insisted that it’s fine. Indeed, the search engine may even show both versions in its results. So if anyone has recommended that you block Google from crawling your PDF documents just to avoid some sort of duplicate content problem, ignore them.

Of course, PDF documents should still be set up properly for SEO purposes, but that’s a whole different issue. Each one should have a good filename, a title, and a description, and of course should contain real text, not a scanned image. A regular audit of the PDF documents on our websites is always a good idea.
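
For anyone who wants to script that audit, here is a minimal sketch of the four checks in Python. It assumes you have already pulled each PDF's title, description, and text out with whatever PDF tool you prefer (that step isn't shown), and the names PdfRecord and audit are made up for illustration. The filename rule (lowercase words joined by hyphens) is only one assumed reading of "good filename", so adjust it to your own conventions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class PdfRecord:
    """Hypothetical container for values already extracted from one PDF."""
    filename: str
    title: str
    description: str
    extracted_text: str  # text pulled from the PDF by your tool of choice


# Assumption: a "good" filename is lowercase words joined by hyphens.
GOOD_FILENAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.pdf$")


def audit(record: PdfRecord) -> List[str]:
    """Return the names of the checks this PDF fails (empty list = all good)."""
    issues = []
    if not GOOD_FILENAME.match(record.filename):
        issues.append("filename")
    if not record.title.strip():
        issues.append("title")
    if not record.description.strip():
        issues.append("description")
    # An empty text layer often means the PDF is just a scanned image.
    if not record.extracted_text.strip():
        issues.append("no extractable text")
    return issues
```

Run it over every PDF on the site and review anything that comes back with a non-empty list.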