Every day, some new member of Blogger Help Forum: Something Is Broken asks, innocently
"What is all this traffic from dodgy websites?"
and after we explain what the dodgy traffic is, and why it does not reflect real traffic, the next question is
"So why doesn't Google block it? Why should I have it polluting my Stats displays, and be unable to find actual traffic in my counts?"
and the unfortunate truth is simply that Google cannot block it, because it's not significantly different from normal traffic - and the insignificant difference is not easily detected.
Referer spam cannot be identified request by request, because it is identical in structure to legitimate Stats pageviews - and because its content changes constantly.
What does a normal blog page "pageview request" look like?
When you click on a link from, say, a post in Blogger Help Forum: Something Is Broken, to this article, your computer sends a single message to the Blogger server, containing three essential details.
- The IP address of your computer.
- The URL of a forum discussion, which contains a link to this article.
- The URL of this article.
In general terms, the message contains:
- An IP address.
- The URL of the page containing a link to the webpage requested.
- The URL of the webpage requested.
The server's record of that message, shown in the Stats display for the blog, is known as a "pageview".
Finally, the Blogger server starts sending web page content back to your computer, so your computer can display this article to you. As the content arrives, your computer displays it - and asks for more content.
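The three details described above can be sketched in a few lines of Python. This is only an illustration, not how Blogger Stats is actually implemented: the `Pageview` type, the `record_pageview` function, and all of the example.com addresses are made up for the example. In real HTTP the two URLs travel in the request itself (the request line plus the `Host` and `Referer` headers), while the IP address comes from the network connection.

```python
from typing import NamedTuple


class Pageview(NamedTuple):
    """Hypothetical record of one pageview, as a Stats log might keep it."""
    client_ip: str
    referer: str
    requested_url: str


def record_pageview(client_ip: str, raw_request: str) -> Pageview:
    """Build a pageview record from a raw HTTP/1.1 GET request."""
    lines = raw_request.split("\r\n")
    _method, path, _version = lines[0].split(" ")

    # Collect headers up to the blank line that ends the header section.
    headers = {}
    for line in lines[1:]:
        if not line:
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    requested_url = "http://" + headers.get("host", "") + path
    return Pageview(client_ip, headers.get("referer", ""), requested_url)


# A reader clicks a link in a forum discussion (all addresses are invented).
request = (
    "GET /2015/01/example-article.html HTTP/1.1\r\n"
    "Host: blog.example.com\r\n"
    "Referer: http://forum.example.com/discussion/123\r\n"
    "\r\n"
)
view = record_pageview("203.0.113.7", request)
print(view)
```

The point of the sketch is that the server only ever sees what the request says: the referring URL is whatever text the client put in the `Referer` header.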
What does a referer spam "pageview request" look like?
Simple enough? So, what is referer spam? Simply, a single message from a spammer's computer to the Blogger server, containing three essential details.
- An IP address - possibly, but not predictably, of their computer.
- The URL of the website being pimped (the spammed website).
- The URL of the blog being spammed (your blog).
In general terms, the message contains:
- An IP address.
- The URL of the page (supposedly) containing a link to the webpage requested.
- The URL of the webpage (supposedly) requested.
The server's record of that message, shown in the Stats display for the blog, is also known as a "pageview".
Finally, the Blogger server starts sending web page content back to the IP address provided. If the IP address does refer to the spammer's computer, what is received is simply ignored. The spammer's computer moves on, and sends another spam message to another server.
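To see why the two kinds of request look alike, here is a small Python sketch using the standard library's `urllib.request.Request`. Nothing is sent over the network; the code only builds two requests and compares their shape. The URLs and the `shape` helper are invented for the illustration.

```python
from urllib.request import Request

blog_url = "http://blog.example.com/"

# A reader who really clicked a link on a forum page.
genuine = Request(
    blog_url,
    headers={"Referer": "http://forum.example.com/discussion/123"},
)

# A spammer who never visited any page: the Referer is simply typed in.
forged = Request(
    blog_url,
    headers={"Referer": "http://spammed-site.example/"},
)


def shape(req):
    """Return the method and header names of a request, ignoring the values."""
    return (req.get_method(), sorted(name for name, _ in req.header_items()))


# Both requests carry the same fields; only the Referer text differs.
print(shape(genuine) == shape(forged))
print(forged.get_header("Referer"))
```

Any client can put any text it likes in the `Referer` header, so the only difference between the two requests is a URL that the server has no way to verify when the request arrives.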
The "pageview" is recorded before the spammer abandons the request.
The problem is simply that no web server can tell, when a request arrives from a client computer, that the response will just be ignored. Web traffic is lossy, and clients drop offline constantly. Even if the response could be detected as ignored, the ignored request might still reflect legitimate activity, initiated by a client that immediately went offline.
There is no way for Google to block the spam as it arrives - because the spam is just one message that results in a response, by the Blogger server, that is subsequently ignored by the client computer.
So why can't Google block the numbers generated by referer spam, as the referer spam hits the servers? Simply because the numbers may not really represent actual spam. They can, just as easily, reflect intense, legitimate activity - or possibly a devious attack against a legitimate website.
Google can only detect referer spam in context, against multiple blogs.
Specific pageview counts and details are observed in context. They are blocked only after the same activity is observed against multiple blogs, over long periods of time (similar, in concept, to stateful network traffic analysis) - and the numbers are removed, retroactively.
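As a rough illustration of detection "in context", the Python sketch below flags any referring URL recorded against several different blogs, then removes those pageviews after the fact. This is not Google's actual method, which is not described here; the function names, the `min_blogs` threshold, and the sample log are all assumptions made for the example.

```python
from collections import defaultdict


def referers_seen_on_many_blogs(pageviews, min_blogs=3):
    """Return referer URLs recorded against at least `min_blogs` distinct blogs."""
    blogs_by_referer = defaultdict(set)
    for blog, referer in pageviews:
        blogs_by_referer[referer].add(blog)
    return {ref for ref, blogs in blogs_by_referer.items() if len(blogs) >= min_blogs}


def remove_retroactively(pageviews, flagged):
    """Drop already recorded pageviews whose referer has since been flagged."""
    return [(blog, ref) for blog, ref in pageviews if ref not in flagged]


# Invented sample log of (blog, referer) pairs.
log = [
    ("blog-a", "http://forum.example.com/discussion/123"),
    ("blog-a", "http://spammed-site.example/"),
    ("blog-b", "http://spammed-site.example/"),
    ("blog-c", "http://spammed-site.example/"),
    ("blog-c", "http://search.example.com/?q=blogger"),
]

flagged = referers_seen_on_many_blogs(log)
cleaned = remove_retroactively(log, flagged)
print(flagged)
print(len(cleaned))
```

A count like this cannot be the whole story: a popular, legitimate referrer would also show up on many blogs, which is one reason the pattern has to be observed over long periods before anything is removed.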
All of this is a simple, unavoidable side effect of blog owners needing site activity figures that are not affected by script filtering by the blog readers, complicated by fraudulent activity by hackers and spammers.
Referer spam is not unique to Blogger - it is simply tuned to abuse Stats logs.
Please note that referer spam did not start with Blogger - it's an Internet-wide problem. Even though it appears mindless and random, some of it is craftily designed and executed.
For a comprehensive look at how referer spam works, outside Blogger, see Wikipedia: Referer spam.
The problem here is threefold.
- Too many blog owners obsess over raw pageview counts.
- Too many blog owners do not understand the origins of referer spam.
- Too many blog owners are not interested in understanding the real problem.