How to find out if Google can get the text from your PDF documents

The PDF documents on your website are probably a mish-mash, possibly supplied by different people and almost certainly created at different times. There’s no guarantee they were produced with care, using a PDF creator that wrote the content in a way that could be easily extracted. Indeed, I’ve seen PDF datasheets which are actually photographic scans of original documents. Ouch. Those can’t be read by Google at all.

One neat trick for finding out what Google sees in a PDF document is to import it into Google Docs. If you don’t know about Google Docs, think of it as an online equivalent of Microsoft Office and similar suites. Some people (including yours truly) now use it for almost everything they do. If you’ve got a Google Account (for example, for Gmail or Google Analytics), then you’ve already got access to Google Docs. Otherwise, it’s time to create one.

All you then need to do is upload your PDF file into Google Docs. Before you click “Start Upload”, make sure you’ve ticked “Convert text from PDF or image files to Google Docs documents.” When Google has finished uploading the document, you’ll see each page presented first as an image of the page and then, underneath, as a version with all the text extracted (as well as Google can extract it).

What I then do is “Select All” the contents of the document and paste them into a plain-text editor such as Notepad. This gives you an uncluttered view of what Google sees in your PDF document. With any luck, it will be a perfect plain copy of the contents. More likely, it’ll be a lot messier. And that’s why the document isn’t taking the Google results by storm, and why it could do with being re-created, or at least having an abstract page added.
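If you’d rather put a rough number on how messy the pasted text is than judge it by eye, the sketch below is one way to do it. It assumes you’ve saved the Notepad copy as plain text and read it into a string; the function name `extraction_report`, the checks it runs and the 90% threshold are all illustrative assumptions, not anything Google publishes about how it handles PDFs.

```python
import re


def extraction_report(text: str) -> dict:
    """Hypothetical heuristics for how cleanly text came out of a PDF.

    The checks and the 0.9 threshold are illustrative assumptions only.
    """
    words = re.findall(r"\S+", text)
    if not words:
        # No text at all often means the PDF is just page images (e.g. a scan).
        return {"words": 0, "clean_ratio": 0.0,
                "broken_hyphens": 0, "verdict": "no text found"}

    # A "clean" token: letters, digits, apostrophes or hyphens,
    # optionally followed by one trailing punctuation mark.
    clean = [w for w in words if re.fullmatch(r"[A-Za-z0-9'-]+[.,;:!?)]?", w)]
    clean_ratio = len(clean) / len(words)

    # Words split across lines (e.g. "Oper-\nating") suggest the page layout
    # leaked into the extracted text.
    broken_hyphens = len(re.findall(r"[A-Za-z]-\n[a-z]", text))

    verdict = "looks clean" if clean_ratio >= 0.9 and broken_hyphens == 0 else "messy"
    return {"words": len(words), "clean_ratio": round(clean_ratio, 2),
            "broken_hyphens": broken_hyphens, "verdict": verdict}


# A made-up fragment with a line-split word and two stray symbols.
sample = "Oper-\nating voltage \u25a0\u25a0 5V"
print(extraction_report(sample))
# -> {'words': 5, 'clean_ratio': 0.8, 'broken_hyphens': 1, 'verdict': 'messy'}
```

To check a real document, read the saved file (for example with `open(path, encoding="utf-8").read()`) and pass the contents to the function. Treat the verdict as a prompt to look more closely, not as a measure of how Google will rank the document.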

Discussion

  1. Chris Rand (post author)

    Yes, you want Google to see the content of your PDF files, because they do get indexed.

  2. Nora Staffen

    Hi Chris
    I didn’t realize that the PDF files I’ve posted weren’t being read easily, or in the way I intended when I created them. Good to know about Google Docs. I’ll look into that for future postings! Thanks