Thursday, June 21, 2012

Empty Or New Blogs Are Especially Vulnerable To Spurious Spam Classification

One example of agony, seen all too frequently in Blogger Help Forum: Something Is Broken, comes from new (would-be) blog owners.
I set up my new blog, using a great blog name, properly chosen - but before I could post, I got email from Blogger.
Hello, The review of your blog at http://yournewblog.blogspot.com/ confirmed that the content violates the Terms of Service for: SPAM. In accordance with those conditions, the blog has been deleted - and its URL is no longer accessible.
This was a new blog, with no posts! How can it be considered spam??
Unfortunately, this blog owner is the latest victim of "friendly fire" in the fight against spam.

If we consider the known lifetime of spam blogs, we can see that new non-spam blogs are especially vulnerable to spurious classification as spam hosts.

A spam blog, as methodically published by a spammer, goes through 5 stages in its lifetime.
  • Empty (just published).
  • Reserve (previously empty, republished with scraped content added).
  • Active (previously reserve, republished with payload added).
  • Detected (previously active, locked by the Blogger Spam Classification bot).
  • Deleted.
Unfortunately, an Empty spam blog is remarkably similar to an Empty non-spam blog (even one with a great, properly chosen name) - depending upon various other details. This leaves some new non-spam blogs looking like spam blogs - and subjects them to spurious classification as spam blogs.
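
For readers who think in code, here is a small sketch of the lifetime listed above, and of why two Empty blogs are hard to tell apart. It is only an illustration, not a description of how Blogger's classifier works. The names `STAGES`, `next_stage`, `Blog`, and `visible_features` are made up for this example, and the assumption that an observer sees nothing but the post count is a deliberate simplification.

```python
from dataclasses import dataclass

# The five stages listed above, in order.
STAGES = ["Empty", "Reserve", "Active", "Detected", "Deleted"]


def next_stage(stage: str) -> str:
    """Return the stage that follows `stage`; "Deleted" is final."""
    i = STAGES.index(stage)
    return STAGES[min(i + 1, len(STAGES) - 1)]


@dataclass
class Blog:
    owner_is_spammer: bool  # known only to the owner
    post_count: int = 0


def visible_features(blog: Blog) -> tuple:
    """Hypothetical: what an outside observer can see. Intent is not visible."""
    return (blog.post_count,)


spam_blog = Blog(owner_is_spammer=True)
new_blog = Blog(owner_is_spammer=False)

# Both blogs are Empty, so an observer sees the same thing.
same_when_empty = visible_features(spam_blog) == visible_features(new_blog)

# The genuine owner publishes one short post.
new_blog.post_count = 1
same_after_post = visible_features(spam_blog) == visible_features(new_blog)

print(same_when_empty, same_after_post)  # True False
```

In this toy model, the two blogs are indistinguishable only while both are Empty. As soon as the genuine blog has a post, it no longer matches the Empty spam blog.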

As the Blogger Spam Classifier examines each new blog, it sees an empty blog. The presence of similar empty blogs, recently examined, may make any new, non-spam blog look like another member of the latest spam blog cluster.

It's possible that the longer a blog remains empty (with no published posts), the more likely it is to be spuriously classified. Personally, as soon as I create a new blog, the first thing that I do is to publish a post with some minimal content. Maybe that's what everybody should be doing.
Welcome to my new blog.
Or, alternatively, the well-known Lorem ipsum.


4 comments:

Adam said...

Does Blogger evaluate blogs for spam if they are private?

If so, this is another argument for the practice of starting out with restricted readership and only going "live" once you've got some content.

Perhaps Blogger could save everyone a little grief by making new blogs private by default, with instructions on how to go live.

Dom Rafa said...

This applies to "test" blogs as well? I have a semi-empty blog I use for testing things. Last post dates from 2011 or something... But I do have a few things there I wouldn't like to lose...

Morgan Eckstein said...

This information makes me glad that I always do a quick "Who is Morgan" post as soon as I set up a new blog.

Lovely Lavender said...

Good to learn some new stuff from this blog!