Saving the orphans

Orphan web pages are those which are freely accessible to anyone with the URL, but which have no links pointing to them from anywhere else. Maybe they’ve just fallen off a menu system; maybe they were uploaded to the site manually and the content management system doesn’t know about them. Who knows?

What we do know is that if the pages aren’t linked to, humans and search engines are going to have trouble finding them. That could be a waste, and it could have a negative impact on search engines’ assessment of our site. So can we set up an audit to find these pages?

The most comprehensive way to search for orphan pages would be to get a list of all the web pages on our server, and compare this to a list of pages which can be found by ‘crawling’ from the home page. The first list could be built by analysing the server log file, which records every page that has been served up, whether to real visitors or to search engine crawlers. The result might not be comprehensive, but if a page genuinely hasn’t been accessed at all, we probably don’t need to worry about it.
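As a rough sketch of that first step, here’s how the served-pages list might be pulled out of a log in the widely used combined format. Everything here is an assumption to adapt: the log path, the status filter and the list of asset extensions to skip.

    import re

    # Sketch: extract distinct page paths from a web server access log.
    # Assumes the common/combined log format; 'access.log' is a placeholder.
    REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')
    ASSETS = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg")

    served = set()
    with open("access.log", encoding="utf-8", errors="replace") as log:
        for line in log:
            match = REQUEST.search(line)
            if not match:
                continue
            path, status = match.groups()
            clean = path.split("?")[0]  # ignore query strings
            # Keep successful requests for pages, not static assets.
            if status == "200" and not clean.lower().endswith(ASSETS):
                served.add(clean)

    print(len(served), "distinct pages served")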

If we can’t get hold of such a list from our IT people, we could assemble one from resources such as our website analytics, Google Search Console and our sitemaps. It’s fine to pull more than one of these lists and combine them.
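A sitemap, for instance, takes only a few lines of Python to read. The sitemap address below is a placeholder, and a real site may use a sitemap index pointing at several child sitemaps, in which case the same parsing is repeated for each.

    import xml.etree.ElementTree as ET
    from urllib.request import urlopen

    # Sketch: collect every URL listed in a sitemap.
    # The sitemap address is a placeholder -- swap in your own.
    SITEMAP = "https://www.example.com/sitemap.xml"
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    with urlopen(SITEMAP) as response:
        tree = ET.parse(response)

    listed = {loc.text.strip() for loc in tree.findall(".//sm:loc", NS)}
    print(len(listed), "URLs listed in the sitemap")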

Then we need to get a ‘crawled’ list of pages on the site. There are tools available online to do this, although only a few are free. Paid-for tools include the well-known Screaming Frog.
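To show what these tools are doing under the bonnet, here is a bare-bones crawler sketch using only Python’s standard library. It is no substitute for a real tool (no politeness delays, no robots.txt handling, no depth limit), and the start address is again a placeholder.

    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    START = "https://www.example.com/"  # placeholder start page

    class LinkParser(HTMLParser):
        """Collect the href of every <a> tag on a page."""
        def __init__(self):
            super().__init__()
            self.links = []
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self.links += [v for k, v in attrs if k == "href" and v]

    crawled, queue = set(), deque([START])
    while queue:
        url = queue.popleft()
        if url in crawled:
            continue
        crawled.add(url)
        try:
            with urlopen(url) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # unreachable page; skip it
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href).split("#")[0]  # resolve, drop anchors
            if urlparse(absolute).netloc == urlparse(START).netloc:
                queue.append(absolute)  # stay on our own site

    print("Found", len(crawled), "crawlable pages")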

Finally, we need to compare the two lists. A quick and dirty technique I’d use would be to paste them into adjacent columns of a spreadsheet, then use the COUNTIF function to count how many times each item in one column appears in the other, e.g. =COUNTIF(B:B, A2) filled down alongside column A. Any row that scores zero is a page the crawl never found, i.e. a potential orphan.
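If the two lists are sitting in text files rather than spreadsheet columns, the same comparison is a simple set difference. The filenames here are placeholders, with one URL per line in each file; just make sure both files use the same form of address (full URLs or bare paths) before comparing.

    # Sketch: pages on the full list which the crawl never reached are
    # our orphan candidates. Both filenames are placeholders.
    with open("all_pages.txt") as f:
        all_pages = {line.strip() for line in f if line.strip()}
    with open("crawled_pages.txt") as f:
        crawled = {line.strip() for line in f if line.strip()}

    for url in sorted(all_pages - crawled):
        print(url)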

Alternatively, Screaming Frog has a complete documented method of identifying orphan pages.