UPDATE July 2015: This is a never-ending topic. For the latest ideas, we recommend you read this article on the Moz blog instead.
Although your website analytics application (such as Google Analytics) is probably very good at detecting – and ignoring – the constant visits from “robots”, some do get through. And they can mess up your statistics by adding a lot of “visits” which aren’t real. Looking through our clients’ Google Analytics accounts in the past few weeks, I’ve seen a prominent example called semalt.com, which appears in the referring sites list for many companies. However, it isn’t sending you real people, so you want it out of your website data. Here’s how to do it.
(If you’re wondering what semalt.com is, apparently it’s some sort of keyword monitoring service, but unlike most robots, it does trigger the Google Analytics script on your page.)
We’re going to set up a “filter” in Google Analytics. The important thing to remember about filters is that they don’t work retrospectively: they only affect data collected after they are created, so get yours set up as soon as you can. If you need to remove semalt.com from your historical data, you’ll need to create a segment that excludes semalt.com from traffic sources.
Here goes then. Click “Admin” at the top of the page in Google Analytics, then select the “view” you’re using in the third column. Click “Filters” and “+ New Filter”, then set up the filter as shown below. Click “Save” and you’re done!
UPDATE: It’s generally agreed that although selecting “Referral” as shown above will work, selecting “Campaign Source” is better.
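(Yes, that is a backslash in “Filter Pattern”. The pattern is a regular expression, and the backslash makes the dot match a literal dot rather than any character.)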
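If you’re curious what difference the backslash makes, here is a small Python sketch. It is only an illustration: the pattern semalt\.com is my assumption about what goes in the “Filter Pattern” box (the screenshot isn’t reproduced in this text), the referrer names are made up, and I’m assuming the filter matches the pattern anywhere in the field.

```python
import re

# Assumed pattern: "semalt\.com" is a guess at the Filter Pattern value,
# since the screenshot is not reproduced in the text.
ESCAPED = re.compile(r"semalt\.com")    # backslash: the dot must be a literal "."
UNESCAPED = re.compile(r"semalt.com")   # no backslash: the dot matches any character


def is_excluded(referrer, pattern=ESCAPED):
    """Return True if the pattern matches anywhere in the referrer string."""
    return pattern.search(referrer) is not None


# Hypothetical referrer values, for illustration only.
referrers = ["semalt.com", "semalt.semalt.com", "semaltxcom.example", "example.org"]

for referrer in referrers:
    print(referrer, is_excluded(referrer, ESCAPED), is_excluded(referrer, UNESCAPED))
```

Without the backslash, the made-up referrer semaltxcom.example would also be caught, because an unescaped dot stands for “any character”. With the backslash, only referrers containing the exact text semalt.com are matched.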
The same technique can be used to exclude traffic from other undesirable sites. You may have more of these than you think: one good way to find them is to look at Audience > Technology > Network and then look for sources with a significant number of visits but “100%” in the bounce column.
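If you export that report, the same check can be done in a few lines of Python. This is a rough sketch under stated assumptions: the row layout, the field names (source, visits, bounce_rate) and the threshold of 50 visits for “significant” are all hypothetical choices of mine, and the sample rows are invented.

```python
def suspicious_sources(rows, min_visits=50):
    """Return the names of sources with at least min_visits visits
    and a bounce rate of 100%."""
    return [
        row["source"]
        for row in rows
        if row["visits"] >= min_visits and row["bounce_rate"] >= 100.0
    ]


# Made-up sample rows for illustration only; not real data.
rows = [
    {"source": "bot-network.example", "visits": 420, "bounce_rate": 100.0},
    {"source": "regular-isp.example", "visits": 900, "bounce_rate": 48.2},
    {"source": "tiny-isp.example", "visits": 3, "bounce_rate": 100.0},
]

print(suspicious_sources(rows))
```

The visit threshold is there so that a source with a handful of visits and a 100% bounce rate, which could easily be a real person, isn’t flagged along with the robots.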