Web pages can be reached through a practically unlimited number of URLs, because any URL can have a ‘query string’ added to the end of it and will usually still take visitors to the same place. These two different URLs lead to our home page, for example:
In the search engine world, a ‘canonical’ is the URL that we want to be the definitive one for the page, and it’s important that the search engines understand which one it is. If they see lots of references to the page with query strings or other variations, and only a few with just the bare URL, they might reasonably treat the URL with the query string as the chosen one.
This can matter for two reasons. First, if two or more URLs point to the same content and the search engines do not treat them as the same thing, we have duplicate pages. That may not be a problem in itself, but it splits the value of the links to the page between the duplicates, which puts us at an SEO disadvantage. Second, if the search engines see more than one URL representing a page, they will probably crawl each of them. Only a certain number of pages get ‘crawled’ each time the search engines look at our site, so we are wasting our ‘crawl budget’.
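To get a feel for how many variants of a page are in circulation, we can group a list of URLs by their bare form. The following is a minimal Python sketch, not a definitive tool: the example.com URLs and the function names `bare_url` and `group_variants` are made up for illustration, and it assumes the query string never changes the page content, which is not true of every site (search results or paginated listings, for instance).

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def bare_url(url: str) -> str:
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_variants(urls):
    """Map each bare URL to the list of variants seen for it."""
    groups = defaultdict(list)
    for url in urls:
        groups[bare_url(url)].append(url)
    return dict(groups)


# Hypothetical URLs, for illustration only.
seen = [
    "https://www.example.com/",
    "https://www.example.com/?utm_source=newsletter",
    "https://www.example.com/products?ref=home",
]

variants = group_variants(seen)
for bare, found in variants.items():
    print(bare, "has", len(found), "variant(s)")
```

Any bare URL with more than one variant is a candidate for the duplicate-page problem described above, and is worth checking for a canonical tag.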
A canonical tag on a page says to the search engines: “no matter how you got here, this is the actual URL of the page; please amalgamate anything else into this URL”. We can see the canonical tag in the page source, and content management systems will usually generate one automatically. Search engines don’t guarantee to honour it (because it can be abused), but it should do the job. It’s worth checking how our sites are set up, either manually or using a crawler that can audit canonicals, such as Screaming Frog. BMON clients can ask us to do this for them.
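For a quick manual check, it helps to know what we are looking for: in the page source, the canonical tag is a link element with rel="canonical", normally in the head of the page. As a rough sketch of how that check could be automated, the Python below pulls the canonical URL out of a page's HTML using only the standard library. The sample page, the example.com address and the names `CanonicalFinder` and `find_canonical` are assumptions for illustration; a real audit would also need to fetch each page and compare the result with the URL that was requested.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only the first canonical is kept.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values, in any letter case.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page source, for illustration only.
page = """<html><head>
<title>Home</title>
<link rel="canonical" href="https://www.example.com/">
</head><body></body></html>"""

print(find_canonical(page))
```

If `find_canonical` returns None, the page has no canonical tag at all; if it returns a URL that differs from the one we want to be definitive, the setup needs a closer look.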