
The World Wide Blogger Culture, And Comment Filter Training

The new comment filtering system has been in place now for almost 6 months, and we are slowly starting to see improvements, with signs that spam (aka "bulk") content is becoming more individualised.

Recently, discussions in Blogger Help Forum: Something Is Broken suggest that the filters are now being put in place in Asia. Reports of unfair filtering of Chinese, Japanese, and Thai language comments are now being seen, similar to the complaints seen 6 months ago about English language filtering.

When the English language filtering was put into place, the filters had to be trained from the bottom up. The issue is slightly different for Asian language filtering.

New Blogger features generally start out provided in English, and best supported in English. Next come non-English languages that use Roman character sets (Western European languages), and finally languages that use non-Roman character sets. Asian languages are going to be the hardest to support.

Spam filtering in Asian languages is going to present a challenge, because the filters have already been trained on English language blogs. Some spammers have, for some time, been using Asian (Chinese, Japanese, and Thai) characters to disguise their content. Many English / European blog owners, working together, have already trained the spam filters to see any comment containing non-Roman characters as spam.

English language blog owners (unwillingly, in some cases) helped train the spam filters from the beginning, and had to deal with many false negatives (spam content not detected, in all languages). Later, some blog owners had to deal with false positives (non-spam content falsely labeled as spam, in English).
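The two kinds of mistake are easy to confuse, so here is a small Python sketch that simply counts them. It is only an illustration: the function name `tally` and the sample data are made up, and nothing here reflects how Blogger itself measures its filters.

```python
from collections import Counter

def tally(results):
    """Count outcomes from (is_actually_spam, filter_said_spam) pairs."""
    counts = Counter()
    for actual, predicted in results:
        if actual and predicted:
            counts["true_positive"] += 1   # spam, correctly caught
        elif actual and not predicted:
            counts["false_negative"] += 1  # spam, published anyway
        elif not actual and predicted:
            counts["false_positive"] += 1  # genuine comment, held as spam
        else:
            counts["true_negative"] += 1   # genuine comment, published
    return counts

# Invented sample: four comments, one of each outcome.
sample = [(True, True), (True, False), (False, True), (False, False)]
outcome = tally(sample)
print(dict(outcome))
```

A false negative is a nuisance for the blog owner, who has to clean up. A false positive hurts the commenter, whose genuine comment vanishes until the owner rescues it.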

Asian blog owners, similarly, will have to deal with many false positives, as the spam filters will wrongly treat many legitimate comments, written in Asian languages, as spam. Some false positives will happen because many English / European blog owners, long used to marking all Asian language comments as spam, may continue to do so.

Many people who actively mark spam posted in European languages other than English can identify spam by its phrasing and structure - even if they cannot actually read the specific language used. That is because most Western European languages use the Roman alphabet and, having common origins (the Romance and Germanic families are both Indo-European), share much vocabulary and structure.

Many people who can adequately identify spam posted in European languages will have no such ability with Chinese, Japanese, Thai, or Indian languages - and will mistakenly mark all comments in those languages as spam. People who publish blogs in those languages will need to be very active in marking false positives as "Not Spam", frequently and promptly.
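To see why prompt "Not Spam" marking matters, consider a deliberately tiny Python sketch of a filter that learns from owner feedback. Everything in it is an assumption made for illustration: the `ToyFilter` class, the single "contains non-Roman letters" feature, the 0.5 cut-off, and all the counts are invented. Blogger's real filter is not public, and is surely far more elaborate.

```python
import unicodedata

def has_non_roman(text):
    """True if any letter in the text is outside the Latin script."""
    for ch in text:
        if ch.isalpha() and not unicodedata.name(ch, "").startswith("LATIN"):
            return True
    return False

class ToyFilter:
    """Hypothetical filter with one feature: non-Roman letters present or not."""

    def __init__(self):
        # Start every count at 1 so that no score is ever exactly 0 or 1.
        self.counts = {
            True: {"spam": 1, "ok": 1},
            False: {"spam": 1, "ok": 1},
        }

    def mark(self, text, label):
        """Record a blog owner's verdict: label is "spam" or "ok"."""
        self.counts[has_non_roman(text)][label] += 1

    def spam_score(self, text):
        """Fraction of similar comments that owners have marked as spam."""
        c = self.counts[has_non_roman(text)]
        return c["spam"] / (c["spam"] + c["ok"])

toy = ToyFilter()

# English language blog owners mark 20 non-Roman comments as spam.
for _ in range(20):
    toy.mark("安い 時計 http://example.invalid", "spam")

genuine = "こんにちは、素敵な記事ですね"  # "Hello, lovely article"
before = toy.spam_score(genuine)          # 21 / 22, roughly 0.95

# Asian language blog owners mark 30 genuine comments as "Not Spam".
for _ in range(30):
    toy.mark(genuine, "ok")

after = toy.spam_score(genuine)           # 21 / 52, roughly 0.40

print(before > 0.5, after > 0.5)
```

In this sketch the genuine comment starts out well above the cut-off, purely because of the script it is written in, and only drops below it after enough "Not Spam" marks accumulate. Feedback from owners who can actually read the language is what moves the score.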

