The final thing I want to cover in this series about PDF files on your website is making them work in Google. This is really important, because Google is quite happy indexing PDF files, and will give them just as much prominence as web pages if they’re formatted correctly. Few are, but all you need to do is to make sure the content is machine-readable and that the documents have a proper title and a good abstract.
Just like a web page, a PDF file has a “title” field, which can be set in most PDF manipulation applications such as Adobe Acrobat. This is the first thing you should ensure is in place on any important PDF document; you just need to type it into the document’s properties.
You can also add other helpful behind-the-scenes information in this way, but the best way to ensure Google understands your document is to have a good abstract at the start. If this isn’t already part of the document, why not make it part of your cover page, which we discussed yesterday?
Does the filename matter? Well, it will surely help Google if the file is something explanatory, rather than “XCG6-86f.qxd.pdf” or something awful like that. But more importantly, it will help your customer. A filename can easily include the company name, product name and date, so why not be helpful? It’s your document, you can call it (or rename it) whatever you wish.
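If you want every new PDF named the same way, a few lines of script can do it. This is only a sketch of one possible convention (company, product, then year and month). The function name `descriptive_pdf_filename` and the example company and product are made up for illustration, so adjust them to your own house rules.

```python
import re
from datetime import date


def descriptive_pdf_filename(company: str, product: str, issued: date) -> str:
    """Build a lowercase, hyphen-separated filename from company, product and date."""
    def slug(text: str) -> str:
        # Collapse anything that is not a letter or digit into single hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    parts = [slug(company), slug(product), issued.strftime("%Y-%m")]
    # Skip empty parts so a blank field never leaves a stray hyphen.
    return "-".join(p for p in parts if p) + ".pdf"


# Hypothetical example values, not taken from any real document.
print(descriptive_pdf_filename("Acme Pumps Ltd", "XR-200 Datasheet", date(2012, 3, 1)))
# prints: acme-pumps-ltd-xr-200-datasheet-2012-03.pdf
```

Lowercase letters and hyphens are used here simply because they survive being pasted into URLs and emails without turning into “%20”.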
The other thing to check is that all your PDF documents are machine-readable. There are still a lot of PDF files out there which are no more than image scans, and don’t contain real text. Normally, if you can copy and paste the text from a PDF document, it’s fine. Click on the open document, “select all”, copy, and paste the result into a text editor like Notepad. That’s what Google will see. Is it what you want?
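If you have a lot of documents to check, the copy-and-paste test can be roughed out in code. The sketch below assumes you have already saved the pasted text somehow; it does not open the PDF itself. The function name `looks_machine_readable` and the 20-word threshold are my own arbitrary choices, and the letter matching only suits English-style text.

```python
import re


def looks_machine_readable(pasted_text: str, min_words: int = 20) -> bool:
    """Rough check: did copy-and-paste yield a reasonable number of real words?"""
    # Count runs of two or more letters; stray symbols and page numbers are ignored.
    words = re.findall(r"[A-Za-z]{2,}", pasted_text)
    return len(words) >= min_words
```

An image-only scan usually pastes as nothing at all, so it fails this check straight away. A document that passes still deserves a quick read-through, because text can be present but scrambled.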
Summary
So, here are the points we’ve covered this week:
– Are your PDF documents presented on your website just like any other web page, and not as “downloads”?
– Are your PDF documents pleasant to read on screen, or do they need extensive zooming and scrolling to read?
– If your PDF documents aren’t designed to be read on screen, but to be printed out as a hard-copy record, are they designed for office printers (i.e. without huge blocks of colour)?
– Have you got an inventory list of all the PDF documents on your website?
– Have all your PDF documents been made a manageable file size, of no more than a few hundred kB, and perhaps substantially less?
– Do your PDF documents all have document titles and other “metadata”?
– Do your PDF documents have references and links to your company and its website? (Consider adding a cover page for this information if appropriate.)
– Are there proper clickable links within your PDF documents?
– Are your PDF documents formatted nicely for Google, including titles, abstracts and fully machine-readable text?
– Have you given the PDF document a descriptive filename?
All of these are things which can be retro-fitted to existing PDF documents and made standard for any new ones being added to your website – it may even be time to write a company procedure for them (including standard filename criteria).
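Two of the points above, the inventory list and the file-size check, are easy to automate if you have a copy of your website’s files on disk. Here is a minimal sketch. The function name `pdf_inventory` and the 500 kB default are assumptions of mine (standing in for “a few hundred kB”), so set the limit to whatever your own procedure says.

```python
import os


def pdf_inventory(root: str, max_kb: int = 500):
    """Return (path, size_kb, too_big) for every PDF under root, sorted by path."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            # Match the extension case-insensitively so "BROCHURE.PDF" is included.
            if name.lower().endswith(".pdf"):
                path = os.path.join(folder, name)
                size_kb = os.path.getsize(path) / 1024
                rows.append((path, round(size_kb, 1), size_kb > max_kb))
    return sorted(rows)
```

Calling `pdf_inventory("path/to/site")` gives you a list you can paste into a spreadsheet, with the oversized files already flagged in the last column.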
Well, I thought I knew a deal about pdfs, but you live and learn! Interesting little mini-series Chris.