
Referer Spam Cannot Be Blocked, Immediately

That is the unfortunate truth.

Every day, some new member of Blogger Help Forum: Something Is Broken asks, innocently,
"What is all this traffic from dodgy websites?"
And after we explain what the dodgy traffic is, and why it does not reflect real traffic, the next question is
"So why doesn't Google block it? Why should I have it polluting my Stats displays, and be unable to find actual traffic in my counts?"
The unfortunate truth is simply that Google cannot block it, because it's not significantly different from normal traffic - and the insignificant difference is not easily detected.

Referer spam cannot be identified, because it is identical in structure to legitimate Stats pageviews - and because its content changes, constantly.

What does a normal blog page "pageview request" look like?

When you click on a link from, say, a post in Blogger Help Forum: Something Is Broken, to this article, your computer sends a single message to the Blogger server, containing three essential details.

  1. The IP address of your computer.
  2. The URL of a forum discussion, which contains a link to this article.
  3. The URL of this article.
From the message, the Blogger server creates a server activity record.
  1. An IP address.
  2. The URL of the page containing a link to the webpage requested.
  3. The URL of the webpage requested.

That server activity record, in the Stats display for the blog, is known as a "pageview".
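
As a rough sketch of the record described above: in an ordinary HTTP request, the requested URL comes from the request line plus the `Host` header, the linking page comes from the `Referer` header, and the IP address comes from the network connection. The `Pageview` class, the `parse_pageview` function, and all URLs and addresses below are hypothetical names chosen for illustration; the actual format of Blogger's server records is not known here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pageview:
    """Hypothetical activity record holding the three details listed above."""
    client_ip: str   # address the request arrived from
    referer: str     # URL of the page said to contain the link
    requested: str   # URL of the page being requested


def parse_pageview(client_ip: str, raw_request: str) -> Pageview:
    """Build a Pageview from the text of one HTTP request."""
    lines = raw_request.strip().splitlines()
    _method, path, _version = lines[0].split()

    # Collect the header lines into a dictionary, keyed by lowercase name.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    requested = "https://" + headers.get("host", "") + path
    return Pageview(client_ip, headers.get("referer", ""), requested)


# Illustrative request: a reader clicks a forum link to a blog article.
request_text = (
    "GET /2011/01/some-article.html HTTP/1.1\r\n"
    "Host: blog.example.com\r\n"
    "Referer: https://forum.example.com/thread/123\r\n"
    "\r\n"
)
pageview = parse_pageview("203.0.113.7", request_text)
print(pageview)
```

Note that the referer is simply whatever the client put in the header. The server has no independent way to check that the linking page exists, or that it links anywhere.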

Finally, the Blogger server starts sending web page content back to your computer, so your computer can display this article to you. As the content arrives, your computer displays it - and asks for more content.

What does a referer spam "pageview request" look like?

Simple enough? So, what is referer spam? Simply, it is a single message from a spammer's computer to the Blogger server, containing three essential details.

  1. An IP address - possibly, but not predictably, of their computer.
  2. The URL of the website being pimped (the spammed website).
  3. The URL of the blog being spammed (your blog).
From the message, the Blogger server creates a server activity record.
  1. An IP address.
  2. The URL of the page (supposedly) containing a link to the webpage requested.
  3. The URL of the webpage (supposedly) requested.

That server activity record, in the Stats display for the blog, is also known as a "pageview".

Finally, the Blogger server starts sending web page content back to the IP address provided. If the IP address does refer to the spammer's computer, what is received is simply ignored. The spammer's computer moves on, and sends another fake pageview message to another server - maybe referencing this blog.
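
To make the comparison concrete, here is a minimal sketch that builds the same three-field record from a genuine request and from a spam request. The `log_record` function, the field names, and every URL and address are assumptions made for illustration, not Blogger's actual code. The only difference between the two records is the value of the referer, which the client supplies and the server cannot verify.

```python
def log_record(client_ip, headers, path):
    """Build a hypothetical three-field activity record from what the client sent."""
    return {
        "ip": client_ip,
        "referer": headers.get("Referer", ""),  # client-supplied, unverified
        "requested": "https://" + headers.get("Host", "") + path,
    }


# A reader who really clicked a link in a forum discussion.
genuine = log_record(
    "198.51.100.20",
    {"Host": "blog.example.com", "Referer": "https://forum.example.com/thread/123"},
    "/2011/01/some-article.html",
)

# A spammer claiming to arrive from the site being promoted.
spam = log_record(
    "203.0.113.99",
    {"Host": "blog.example.com", "Referer": "https://spammed-site.example/"},
    "/2011/01/some-article.html",
)

# Both records have exactly the same structure; only the values differ.
same_shape = genuine.keys() == spam.keys()
print(same_shape)
```

Nothing in a single record marks it as fake, which is why both end up counted as a "pageview".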

The "pageview" is recorded before the spammer abandons the request.

The problem is simply that no web server can detect a message from a client computer, that results in a response that is just ignored. Web traffic is lossy, and clients drop offline constantly. Even if the response could be detected as ignored, the ignored request might still reflect legitimate activity, initiated by a client that immediately went offline.

There is simply no way for Google to block the spam - because the spam is simply one message that results in a response, by the Blogger server, that is subsequently ignored by the client computer.

That's it.

So why can't Google block the numbers generated by referer spam, as the referer spam hits the servers? Simply because the numbers may not really represent actual spam. They can, just as easily, reflect intense, legitimate activity - or possibly a devious attack against a legitimate website.

Google can only detect referer spam in context, against multiple blogs.

Specific pageview counts and details are observed in context, and are blocked only after the same activity is observed against multiple blogs, over long periods of time (similar, in concept, to stateful network traffic analysis). The numbers are then removed, retroactively.
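
The idea of judging referers in context, across many blogs, can be sketched as follows. This is purely an illustration of the concept, not Google's actual detection method, which is not described here. The function names, the `min_blogs` threshold, and the sample data are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def suspicious_referers(pageviews, min_blogs=3):
    """Return referer hosts seen against at least `min_blogs` distinct blogs."""
    blogs_by_referer = defaultdict(set)
    for blog, referer in pageviews:
        host = urlsplit(referer).hostname
        if host:
            blogs_by_referer[host].add(blog)
    return {
        host for host, blogs in blogs_by_referer.items() if len(blogs) >= min_blogs
    }


def remove_retroactively(pageviews, flagged_hosts):
    """Drop already-recorded pageviews whose referer host has been flagged."""
    return [
        (blog, referer)
        for blog, referer in pageviews
        if urlsplit(referer).hostname not in flagged_hosts
    ]


# Illustrative (blog, referer) pairs collected over some period of time.
views = [
    ("blog-a", "https://spam.example/"),
    ("blog-b", "https://spam.example/offer"),
    ("blog-c", "https://spam.example/"),
    ("blog-a", "https://forum.example.org/thread/1"),
]
flagged = suspicious_referers(views)
cleaned = remove_retroactively(views, flagged)
print(flagged, cleaned)
```

A rule this simple would also flag a popular, legitimate website that links to many blogs, which is exactly why counting alone is not enough, and why detection has to wait for more context.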

All of this is a simple, unavoidable side effect of blog owners needing site activity figures that are not affected by script filtering by the blog readers, complicated by fraudulent activity by hackers and spammers.

Referer spam is not unique to Blogger - it is simply tuned to abuse Stats logs.

Please note that referer spam did not start with Blogger - it's an Internet-wide problem. Even though it appears mindless and random, some of it is craftily designed and executed.

For a comprehensive look at how referer spam works, outside Blogger, see Wikipedia: Referer spam.

The problem here is threefold.

  1. Too many blog owners obsess over raw pageview counts.
  2. Too many blog owners do not understand the origins of referer spam.
  3. Too many blog owners are not interested in understanding the real problem.

That's it!


Keira said…
Glad to be 1 of several visitants on this awful site : D
bracedmom said…
Is there a way to block another site from hyper linking to your blog? I have had someone on a fetishist site list a link to my blog...creepy. I have temporarily changed the name of my blog and made a stub blog but can't figure out how to keep my followers in the loop, don't want to redirect the creeps too. Any suggestions are greatly appreciated! Old URL new URL
Nancy said…
I'm not so worried about the stats being flawed. I'm getting a pop up window from

the window looks something like this:


Chuck Croll said…

The "AT @ T . ro" hijack is a problem that we've been watching, for a couple months.

It takes a bit of technical skills to diagnose the problem, in each individual blog. If you can post in Blogger Help Forum: Something Is Broken, we can help you with this problem.
Holly Shaw said…
Thank you for the explanation. It puts my mind at ease.
bestmommy said…
Thanks! I've been getting a lot of Russian porn sites listed as referring sites to my blog. I had no idea that this could be faked or spam as you call it. I also used to get tons of spam comments with links to shady sites but since I turned on word verification they have stopped. Sure wish I could do the same for the referring sites but as you stated I guess they can't be blocked. At least now I know not to click all those .ru links.
orana velarde said…
Feeling a bit like a moron yes....but thanks for the clearup.
Chuck Croll said…

Referer spam, like "nice blog" spam, is a con job. It took me 4 years to figure out what "nice blog" spam actually is.

Everybody gets conned, eventually. You're not the first person to click on a link, and get an eyeful of something nasty. Let's just hope that what you peeked at only looked nasty.
If the information is contained in the message (i.e. "vampirestat") the messages could be ignored. I'm guessing they just don't want to write code for this or keep track of these sites for something as trivial as statistics.
Chuck Croll said…
Sorry, Mary,

It's just not that simple.

"VampireStat" is not the only referer spam target - and every target of referer spam should not be blacklisted.

Some referer spam targets are legitimate blogs and websites, which have actual readers, and which generate actual referer traffic.

If the Stats referer link is to be useful, it has to be useful to all genuine blogs and websites - including some which are maliciously targeted by referer spam.
In my opinion, the owner of a blog should be able to blacklist whatever site they choose, legitimate or not. The code should exist to enable this. You may claim it is not simple but I don't believe it. I used to write OS code for UNIX (systems programming), also did network and security for major universities. Most things are pretty simple if you know what you were doing.
Chuck Croll said…

In principle, I would agree with you.

However, I suspect that, like Comment Spam Moderation, the Stats Referer Spam detection process is collaborative and heuristic.

Referer Spam Detection will work much better, for everybody, with everybody rowing in synchronisation.
Mur 'AT' Votema said…
Hmmm ... are you sure Google can't stop the spammer? How is it possible, then, that these spam traffic sources are not shown in Google Analytics?!

Does Google use other algorithms?
Chuck Croll said…

No service "blocks" the spam. Only Stats is vulnerable to the spam - because only Stats extracts pageview counts directly from the Blogger servers.
Fernando Olmos said…
But the answer is simple. Why won't Google just allow blog admins to manage a *whitelist* proxy? A whitelist is a list of legitimate IPs from legitimate G20 countries and that would get rid of at least 90% of the spam bots out there from countries like Ukraine, Russia, China and India.
Doug Greener said…
Hi Chuck,
Doug Greener here again. Hope you are doing well. My question is: Can referer spam emanate from the U.S.? I have noticed the same regular spikes from the U.S. as I have gotten in the past from Russia. I thought Americans were the good guys.
Chuck Croll said…
Hi Doug,

Thanks for the question.

Referer spam can and does come from the USA - and every other country. Right now, the most visible spam comes from Russia or Eastern Europe - but I've seen it from many 1st world countries too.
