Yesterday I tried to explain how PDF documents – especially brochures – shouldn’t just be thrown on a website as if they were a substitute for proper web pages. Nor can you expect readers to be delighted to be confronted with them, as they can be awkward to read on screen and expensive to print out. So before we start to do something a bit better with them, let’s take a look at what PDF documents we’ve got on our websites. Because we do know, don’t we?
If you don’t keep meticulous records, then you need to deconstruct the website and get a list of every PDF document linked to on the site. You might have the facility to “grab” the whole website via file-copy and pick out the PDF files, or your website host may be able to do it for you. Otherwise you’ll need some sort of web crawling application which can analyse the site and list the PDF files. (Note: If you’re a BMON client, just ask and we’ll do it for you.)
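If you’re comfortable with a little scripting, the crawling approach can be roughed out in a few lines of Python. This is only a minimal sketch, not a polished tool: the function names (extract_links, is_pdf, list_pdfs) and the 200-page limit are our own inventions for illustration, it only follows ordinary links on a single hostname, and it takes no notice of robots.txt, so only point it at a site you’re responsible for.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, page_url):
    """Return absolute URLs (with #fragments removed) for all links in the HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return [urldefrag(urljoin(page_url, href)).url for href in collector.hrefs]


def is_pdf(url):
    """True if the path part of the URL ends in .pdf (any capitalisation)."""
    return urlparse(url).path.lower().endswith(".pdf")


def list_pdfs(start_url, max_pages=200):
    """Breadth-first crawl of one hostname; returns a sorted list of PDF URLs.

    Hypothetical helper: max_pages is an arbitrary safety limit, and links to
    other hostnames (including www/non-www variants) are ignored.
    """
    site = urlparse(start_url).netloc
    queue, seen, pdfs = deque([start_url]), {start_url}, set()
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        try:
            with urlopen(url, timeout=10) as response:
                if "html" not in response.headers.get_content_type():
                    continue  # not a web page, so nothing to parse
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # broken link or timeout: skip it and carry on
        fetched += 1
        for link in extract_links(html, url):
            if urlparse(link).netloc != site:
                continue  # stay on our own website
            if is_pdf(link):
                pdfs.add(link)  # record the PDF, but don't download it
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(pdfs)
```

Calling list_pdfs with your home page address (for example list_pdfs("https://www.example.com/"), where the address is a placeholder) should give you a list of PDF addresses to work through. Bear in mind that it will only find documents which are actually linked from a page it can reach.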
There’s a further way which anyone can use, and that’s to query Google, because Google does index your PDF files. The results may not be comprehensive, as there’s no way of guaranteeing that Google has discovered every PDF file on your website, but it might go a long way towards doing so, and it will almost certainly return the most prominent ones. All you need to do is to use the Google “site:” and “filetype:” commands, like this (obviously replace bmon.co.uk with your own site):
site:bmon.co.uk filetype:pdf
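If you look after several websites, you might prefer to generate the search links rather than type each query by hand. Here’s a small Python sketch which combines the “site:” and “filetype:” commands into a Google search address; the function name pdf_search_url is just something we’ve made up for the example, and we’re assuming the standard google.com search address.

```python
from urllib.parse import urlencode


def pdf_search_url(domain):
    """Build a Google search address listing the indexed PDFs on one domain."""
    query = f"site:{domain} filetype:pdf"
    # urlencode turns the colons into %3A and the space into +
    return "https://www.google.com/search?" + urlencode({"q": query})


print(pdf_search_url("bmon.co.uk"))
# prints https://www.google.com/search?q=site%3Abmon.co.uk+filetype%3Apdf
```

Paste the resulting address into a browser and you should see the same results as typing the query into Google yourself.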
One thing which will come out of this exercise is a good idea of what your PDF documents look like in the Google results. It’s often a bit ugly, but we’ll get on to improving the search engine presentation and performance later in the week.