Here’s another one of those excellent online services which you should save for future reference. Siteliner will scan 250 pages from your website and check for duplicate content within the site, as well as broken internal links. It will also “identify the pages that are most prominent to search engines as they crawl through your site based on the link patterns between your pages”.
A premium service lets you scan more than 250 pages for 1 US cent per page, which could be well worth the small one-off investment.
You might think you have no duplicate pages, but you’d be surprised how badly constructed content management systems can generate them by appending ‘query parameters’ to page addresses. Often when we ‘crawl’ sites to create site maps, the crawler gets stuck in an infinite loop of these near-identical addresses and we have to stop the process. Google’s crawler will run into the same problem, and that’s really not good news. We all have a ‘crawl budget’ from Google, which is the number of pages it’s prepared to look at. If we waste it on duplicate content, we may find that important pages on the site don’t get seen.
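To make the query-parameter problem concrete, here is a minimal Python sketch of how you might spot addresses that differ only by parameters which don’t change the page. It is not how Siteliner or Google works internally. The `IGNORED_PARAMS` list and the function names are our own assumptions for illustration, and the right list of parameters depends entirely on your own CMS.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters assumed NOT to change page content; adjust for your CMS.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}


def normalize(url: str) -> str:
    """Return the URL with ignored query parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # parameter order should not create a "new" page
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )


def group_duplicates(urls):
    """Map each normalized address to the crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    # Only addresses reached by more than one URL are likely duplicates.
    return {key: found for key, found in groups.items() if len(found) > 1}


if __name__ == "__main__":
    crawled = [
        "https://example.com/shoes?sort=asc",
        "https://example.com/shoes?sort=desc&sessionid=abc",
        "https://example.com/shoes",
        "https://example.com/hats",
    ]
    for page, variants in group_duplicates(crawled).items():
        print(page, "is reachable via", len(variants), "addresses")
```

A crawler that records the normalized address of every page it has visited, and skips any it has already seen, also avoids the infinite loops described above.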