
The World Wide Blogger Culture, And Comment Filter Training

The new comment filtering system has been in place now for almost 6 months, and we are slowly starting to see improvements, with signs that spam (aka "bulk") content is becoming more individualised.

Recently, discussions in Blogger Help Forum: Something Is Broken suggest that the filters are now being put in place in Asia. Reports of unfair filtering, similar to the complaints seen 6 months ago about English language filtering, are now being seen about Chinese, Japanese, and Thai language filtering.

When the English language filtering was put into place, the filters had to be trained from the bottom up. The issue is slightly different, for Asian language filtering.

New Blogger features generally start out provided in English, and best supported in English. Next come non English languages that use Roman character sets (West European languages), and finally languages that use non Roman character sets. Asian languages are going to be the hardest to support.

The spam filtering, in Asian languages, is going to present a challenge, because the filters have already been trained, in English language blogs. Some spammers have been using Asian (Chinese, Japanese, and Thai) characters, to disguise their content, for some time. Many English / European blog owners, working together, have already trained the spam filters to see any comments containing non Roman characters as spam.

English language blog owners (unwillingly, in some cases) helped train the spam filters from the beginning, and had to deal with many false negatives (spam content not detected, in all languages). Later, some blog owners had to deal with false positives (non spam content, falsely labeled as spam, in English).

Asian blog owners, similarly, will have to deal with many false positives (non spam content detected as spam, in Asian languages), as the spam filters will falsely see many legitimate comments, written in Asian languages, as spam. Some false positives will happen because many English / European blog owners, long used to marking all Asian language comments as spam, may continue to do so.
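To make the training effect concrete, here is a toy sketch, in Python. It is not how Blogger's filters actually work (the real design is not public) - it only shows how a filter that learns from owner markings can come to treat a whole character set as spam, and how "Not Spam" markings can pull it back. The names `script_of` and `ToyFilter`, the majority vote rule, and the sample comments are all assumptions, made up for this illustration.

```python
import unicodedata
from collections import defaultdict


def script_of(text):
    """Rough guess (an assumption for this sketch): 'non_roman' if any
    letter in the text is outside the Latin alphabet, else 'roman'."""
    for ch in text:
        if ch.isalpha() and "LATIN" not in unicodedata.name(ch, ""):
            return "non_roman"
    return "roman"


class ToyFilter:
    """Hypothetical filter that only remembers how owners marked each script."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"spam": 0, "not_spam": 0})

    def train(self, text, is_spam):
        # One owner marking: "Spam" or "Not Spam".
        label = "spam" if is_spam else "not_spam"
        self.counts[script_of(text)][label] += 1

    def is_spam(self, text):
        # Majority vote of the past markings for this script.
        c = self.counts[script_of(text)]
        total = c["spam"] + c["not_spam"]
        return total > 0 and c["spam"] / total > 0.5


toy = ToyFilter()

# English / European owners mark every non Roman comment as spam.
for _ in range(3):
    toy.train("安い 買い物 今すぐ", True)
toy.train("Great post, thanks!", False)

# A legitimate Japanese comment is now a false positive.
print(toy.is_spam("素晴らしい記事をありがとう"))

# Asian blog owners mark legitimate comments as "Not Spam".
for _ in range(4):
    toy.train("素晴らしい記事をありがとう", False)
print(toy.is_spam("素晴らしい記事をありがとう"))
```

In this sketch, the first check reports the legitimate comment as spam, and the second does not - but only after the "Not Spam" markings outnumber the earlier "Spam" markings. That is why frequent and prompt marking matters.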

Many people who actively mark spam, when posted in "European" languages other than English, can identify spam by the phrasing and structure - even if they cannot actually read the specific language used. That is because most "European" languages (loosely called "Romance" languages, here) share common roots, and the Roman alphabet.

Many people who can adequately identify spam, posted in Romance languages, will have no such ability with Chinese, Indian, Japanese, or Thai - and will mistakenly mark all comments, in those languages, as spam. People who publish blogs in Chinese, Indian, Japanese, or Thai will need to be very active, in marking false positives as "Not Spam", frequently and promptly.
