There are often many ways of expressing the address of what seems to be the same web page. For example, we may be able to use http or https, include or omit www, add a trailing slash or not, and so on. The version that the search engines select as the representative one is known as the 'canonical' version. For a number of technical reasons, especially relating to SEO, we should ensure that anyone who types in any working version other than the canonical one is redirected to it, rather than being shown a duplicate.
As a simple example, the canonical version of the BMON website is:
Type in any of the following and you should see the correct page, with the URL you typed correcting itself to the canonical version because you have been redirected there:
- https://www.bmon.co.uk/
- https://bmon.co.uk/
- https://bmon.co.uk
- http://www.bmon.co.uk/
- http://www.bmon.co.uk
- http://bmon.co.uk/
- http://bmon.co.uk
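The decision a server makes when it receives one of those addresses can be sketched in Python. This is a minimal illustration, not BMON's actual setup: it assumes the preferred form uses https and the www host, and the constants CANONICAL_SCHEME and CANONICAL_HOST, as well as the helpers canonicalize and needs_redirect, are hypothetical names chosen for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical preferences for this example; set them to your own site's choice.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.bmon.co.uk"
BARE_HOST = CANONICAL_HOST.removeprefix("www.")  # "bmon.co.uk" (Python 3.9+)


def canonicalize(url: str) -> str:
    """Map an http/https, www/non-www variant of a URL to a single form."""
    parts = urlsplit(url)
    # .hostname is lower-cased and excludes any explicit port.
    host = parts.hostname or ""
    if host == BARE_HOST:
        host = CANONICAL_HOST
    # An empty path and "/" both request the home page; always emit "/".
    path = parts.path or "/"
    # Fragments (#...) are never sent to the server, so drop them here too.
    return urlunsplit((CANONICAL_SCHEME, host, path, parts.query, ""))


def needs_redirect(url: str) -> bool:
    """True when the URL is not already in its canonical form."""
    return canonicalize(url) != url
```

Under these assumptions, every address in the list maps to the same string, so a server that redirects whenever needs_redirect is true sends all visitors to one address. Whether deeper paths should end in a trailing slash is a per-site choice that this sketch deliberately leaves alone; only the empty home-page path is normalised to "/".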
On many sites, however, what you end up seeing is a copy of the correct page, which search engines can record as a separate page. That means external links can be split across several identical pages, hugely diluting their impact.
Naturally, the search engines try to account for this. I recently saw a statistic suggesting that 60% of the web is 'duplicate content'. This doesn't mean people copying each other's sites (although there is a lot of that); it is much more likely to be websites duplicating their own content. The search engines try to decide which version is canonical based on various signals, but the one they choose might not be the one we want.
We can check which URL Google has chosen as canonical using the URL Inspection tool in Google Search Console, which also offers plenty of other useful information. But how do we tell Google which version we prefer, and consolidate all the signals?
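The main method is to properly redirect the alternative URLs, typically with permanent (301) redirects. Others include adding a canonical tag (a link element with rel="canonical" in the page head), listing only the canonical URLs consistently in our sitemap, and ensuring that internal links on our site point to the canonical versions.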
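To show what a proper redirect and a canonical tag can look like in practice, here is a minimal sketch as Python WSGI middleware plus a small tag helper. It rests on several assumptions: the canonical scheme and host are hypothetical constants; the application is assumed to receive the scheme the visitor actually used in wsgi.url_scheme (behind a TLS-terminating proxy you would need to read the proxy's forwarded-scheme header instead, or every request would loop back into a redirect); and canonical_redirect and canonical_link_tag are names made up for this example. Many sites do the same job in their web server or CDN configuration rather than in application code.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit
from wsgiref.util import request_uri

# Hypothetical preferences for this example; set them to your own site's choice.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.bmon.co.uk"


def canonical_redirect(app):
    """Wrap a WSGI app so any non-canonical scheme or host gets a 301 redirect."""

    def middleware(environ, start_response):
        # Rebuild the URL the client asked for (scheme, Host header, path, query).
        requested = urlsplit(request_uri(environ, include_query=True))
        # Assumes wsgi.url_scheme is the scheme the visitor actually used.
        if (requested.scheme != CANONICAL_SCHEME
                or requested.hostname != CANONICAL_HOST):
            target = urlunsplit((CANONICAL_SCHEME, CANONICAL_HOST,
                                 requested.path or "/", requested.query, ""))
            start_response("301 Moved Permanently",
                           [("Location", target), ("Content-Length", "0")])
            return [b""]
        # Already canonical: serve the page as normal.
        return app(environ, start_response)

    return middleware


def canonical_link_tag(canonical_url: str) -> str:
    """Render the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'
```

A 301 (Moved Permanently) status is the conventional choice here because it tells browsers and search engines that the old address has been permanently replaced by the new one, whereas a temporary redirect implies the original URL will come back.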