The World Wide Blogger Culture, And Comment Filter Training

The new comment filtering system has been in place now for almost 6 months, and we are slowly starting to see improvements, with signs that spam (aka "bulk") content is becoming more individualised.

Recently, discussions in Blogger Help Forum: Something Is Broken suggest that the filters are now being put in place in Asia. Reports of unfair filtering, similar to the complaints made 6 months ago about English language filtering, are now appearing about Chinese, Japanese, and Thai language filtering.

When the English language filtering was put into place, the filters had to be trained from the bottom up. The issue is slightly different, for Asian language filtering.

New Blogger features generally start out provided in English, and best supported in English. Next come non-English languages that use Roman character sets (West European languages), and finally languages that use non-Roman character sets. Asian languages are going to be the hardest to support.

The spam filtering, in Asian languages, is going to present a challenge, because the filters have already been trained on English language blogs. Some spammers have been using Asian (Chinese, Japanese, and Thai) characters, to disguise their content, for some time. Many English / European blog owners, working together, have already trained the spam filters to see any comments containing non-Roman characters as spam.

English language blog owners (unwillingly, in some cases) helped train the spam filters from the beginning, and had to deal with many false negatives (spam content not detected, in all languages). Later, some blog owners had to deal with false positives (non-spam content, falsely labeled as spam, in English).
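To make the two error types concrete, here is a minimal sketch that counts them over a handful of made-up comments. The `Comment` record, its field names, and the sample data are all hypothetical, invented for illustration; nothing here reflects how Blogger actually stores or classifies comments.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    is_spam: bool   # what the blog owner would say (hypothetical ground truth)
    flagged: bool   # what the filter decided (hypothetical filter output)


def count_errors(comments):
    """Return (false_positives, false_negatives) for a list of comments."""
    # False positive: legitimate comment that the filter flagged as spam.
    false_positives = sum(1 for c in comments if not c.is_spam and c.flagged)
    # False negative: spam comment that the filter let through.
    false_negatives = sum(1 for c in comments if c.is_spam and not c.flagged)
    return false_positives, false_negatives


# Made-up sample data, for illustration only.
sample = [
    Comment("Buy cheap watches", is_spam=True, flagged=False),     # false negative
    Comment("Great post, thanks!", is_spam=False, flagged=False),  # correct
    Comment("これは役に立ちました", is_spam=False, flagged=True),     # false positive
    Comment("Click here to win", is_spam=True, flagged=True),      # correct
]

fp, fn = count_errors(sample)
print(f"false positives: {fp}, false negatives: {fn}")
```

A blog owner sees a false negative as spam sitting in the published comments, and a false positive as a genuine comment sitting in the spam folder.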

Asian blog owners, similarly, will have to deal with many false positives (non spam content detected as spam, in Asian languages), as the spam filters will falsely see many legitimate comments, written in Asian languages, as spam. Some false positives will happen because many English / European blog owners, long used to marking all Asian language comments as spam, may continue to do so.

Many people who actively mark spam posted in "European" languages other than English can identify spam by the phrasing and structure - even if they cannot actually read the specific language used. That is because most West European languages share the Roman alphabet and many related words.

Many people who can adequately identify spam posted in West European languages will have no such ability with Chinese, Japanese, Thai, or Indian languages - and will mistakenly mark all comments, in those languages, as spam. People who publish blogs in Chinese, Japanese, Thai, or Indian languages will need to be very active in marking false positives as "Not Spam", frequently and promptly.
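The effect of marking "Not Spam" can be sketched with a toy filter. This is only an illustration, under loud assumptions: Blogger's real filter is not public, and the `ToyFilter` class, the `script_of` feature, the 0.5 threshold, and the sample comments are all invented here. The toy learns nothing except whether a comment contains non-Roman characters, which is the bias described above.

```python
from collections import defaultdict


def script_of(text):
    """Crude, hypothetical feature: 'non_roman' if any character lies
    outside the Latin-1 range, otherwise 'roman'."""
    return "non_roman" if any(ord(ch) > 0xFF for ch in text) else "roman"


class ToyFilter:
    """Toy stand-in for a trained spam filter; not Blogger's algorithm."""

    def __init__(self):
        self.spam = defaultdict(int)      # "Spam" marks per script
        self.not_spam = defaultdict(int)  # "Not Spam" marks per script

    def mark(self, text, is_spam):
        """Record one owner decision ("Spam" or "Not Spam")."""
        bucket = self.spam if is_spam else self.not_spam
        bucket[script_of(text)] += 1

    def spam_probability(self, text):
        key = script_of(text)
        # Add-one smoothing, so a script with no marks scores 0.5.
        return (self.spam[key] + 1) / (self.spam[key] + self.not_spam[key] + 2)

    def is_spam(self, text):
        return self.spam_probability(text) > 0.5


toy = ToyFilter()

# English / European owners mark every non-Roman comment as spam ...
for _ in range(20):
    toy.mark("买便宜手表", is_spam=True)
# ... and mark ordinary English comments as legitimate.
for _ in range(20):
    toy.mark("Nice article", is_spam=False)

legit = "谢谢你的文章"          # a legitimate comment in Chinese
before = toy.is_spam(legit)   # flagged: a false positive

# Owners of Chinese language blogs mark such comments "Not Spam".
for _ in range(25):
    toy.mark(legit, is_spam=False)
after = toy.is_spam(legit)    # no longer flagged

print(before, after)
```

In this toy, the legitimate comment stops being flagged only once the "Not Spam" marks outnumber the earlier "Spam" marks for that script, which is why frequent and prompt marking matters.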
