Tuesday, January 13, 2015

Don't Backup Your Blogs, By Duplicating Them

This month, we have several reports in Blogger Help Forum: Get Help with an Issue, from owners who make backup blogs.
Why was my blog deleted, by Blogger, as spam?
Some blog owners innocently create multiple blogs, to back up their main blog.

Unfortunately, having multiple blogs with the same content may make the blogs look like a spam blog farm - and Blogger detects and deletes spam blog farms.

I've been advising blog owners, for years, that Blogger blogs need informative, interesting, and unique content, to survive as Blogger blogs.

Scraped blogs - whether you scrape your own, or another person's - are a bad idea.

The general focus on "unique" is intended to discourage scraping. Blogger Help: Spam, phishing, or malware on Blogger discusses spam blogs.
Spam blogs cause various problems, beyond simply wasting a few seconds of your time when you happen to come across one. They can clog up search engines, making it difficult to find real content on the subjects that interest you. They may scrape content from other sites on the web, using other people's writing to make it look as though they have useful information of their own. And if an automated system is creating spam posts at an extremely high rate, it can impact the speed and quality of the service for other, legitimate users.
In the past, Blogger has been focusing on blog owners who steal the work of other blog owners, ie, "scraping" - or who copy by permission, ie, "syndication".

Blogger Content policy defines scraping more explicitly.

Examining Blogger Content Policy, we see the problem expressed more vividly.
Spam: Spam takes several forms in Blogger, all of which can result in deletion of your account or blog. Some examples include creating blogs designed to drive traffic to your site or to move it up in search listings, posting comments on other people's blogs just to promote your site or product, and scraping existing content from other sources for the primary purpose of generating revenue or other personal gains.
Having backup blogs can look like "blogs designed to drive traffic to your site or to move it up in search listings". This is a technique used by spammers. Why bother to write content? Just use some of the clones, as clones - both to back up each other, and to boost search engine results.

Duplication of content, online, hastens classification as spam.

Backing up a Blogger blog actually duplicates Blogger's efforts. Blogger / Google stores content - ie, your blog - in a cloud of servers, worldwide. If any one server goes out of service, the others are there, to provide immediate backup. You will, likely, never even notice if any one Blogger server goes down.

Alternately, a blog owner might back up a blog because of the spammer-publicised "unfair blog deletion" policy. Here, the blog owner is playing into the spammers' hands. If backup blogs were permitted, spammers could publish their spam blog farms with impunity.

Here, the duplicates that the blog owner creates to guard against unfair deletion actually hasten the deletion. This will happen, even if you make the "backup" blog private.

Classification of duplicate content, as abuse, isn't spurious.

If you think about it, a blog classified as spam, because of multiple clones, isn't actually spuriously deleted. If your blog gets a spurious classification, as a possible spam host, you have to appeal the classification. Creating a new blog, with identical content, just makes you look like an unrepentant spammer - and increases your vulnerability.

If you want to retain comments and posts, export to an offline file. Just don't spread duplicate content across the Internet.
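If you do keep an offline export, it can be worth checking that the file really contains your posts and comments. Here is a minimal sketch in Python, assuming the export is an Atom XML file in which each entry carries a "kind" category (the scheme and term URLs below are assumptions about the export format, and "blog-export.xml" is a hypothetical file name). It only counts what is in the file, and does not upload or publish anything.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Atom namespace, as used by standard Atom feeds.
ATOM = "{http://www.w3.org/2005/Atom}"
# Assumption: the export marks each entry's type with a category in this scheme.
KIND_SCHEME = "http://schemas.google.com/g/2005#kind"


def summarize_export(root):
    """Count the entries in a parsed Atom export, grouped by 'kind'."""
    counts = Counter()
    for entry in root.iter(ATOM + "entry"):
        kind = "unknown"
        for cat in entry.findall(ATOM + "category"):
            if cat.get("scheme") == KIND_SCHEME:
                # Keep only the part after '#', e.g. "post" or "comment".
                kind = cat.get("term", "").rsplit("#", 1)[-1]
        counts[kind] += 1
    return dict(counts)


# A tiny hand-written sample, standing in for a real export file.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <title>First post</title>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#post"/>
    <title>Second post</title>
  </entry>
  <entry>
    <category scheme="http://schemas.google.com/g/2005#kind"
              term="http://schemas.google.com/blogger/2008/kind#comment"/>
    <title>A comment</title>
  </entry>
</feed>"""

if __name__ == "__main__":
    print(summarize_export(ET.fromstring(SAMPLE)))
```

For a real file, you would parse it with ET.parse("blog-export.xml").getroot() and pass the result to summarize_export. If the counts look right, the offline file is your backup, and there is no need for a second, online copy.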

Dude, hit me with a comment!

Angelina Lenahan said...
This comment has been removed by the author.
Chuck Croll said...

Angel,

That is a good question. To discuss "copying" vs "scraping", IMHO, you need to look at at least 3 details.
1. Intent. Do you intend to do something useful with the copied content - or are you just looking to "bulk up" your blog?
2. Permission. Do you have permission to copy (or is the content public)?
3. Ratio. Does the original content in your blog vastly outweigh the copied content? I use a 90% original to 10% copied ratio, as a starting point. I would like to think that my blog is more like 95% to 5%.

To use the example of "scientists", one famous scientist (Einstein?) said "Every successful scientist stands on the shoulders of every previous scientist." Science, as well as Theology, blatantly copies content - they just use established rules.

I bet if we do enough Googling, we can find a mutually respectable website which contains "etiquette" or "laws" which discuss this in a more formal manner. We may even find references that are used by Google Legal, when they make the final verdict on an accused blog owner.

A question to consider though. Who "owns" the Christian Bible, or the Muslim Koran? Not who has "copyrighted" the various editions - or profits from reprinting something that they, themselves did not write.

Angelina Lenahan said...
This comment has been removed by the author.
Gracey said...

While I understand the spam policy, the issue I have with this "non-backup blog" policy is simple.

When making a complete template or custom design change, the easiest way to do so without affecting the current blog is to use a secondary, private, non-crawlable blog.

I uploaded the xml file to that private "work" blog, so I could fiddle with the layout, colours and sizing I wanted to use or change to. This gave me a chance to see how the changes would affect certain contents already on my blog (and how much work would be involved in changing it), and to make sure I had links set up correctly, and nothing was amiss, before actually making these changes to my active blog. It made the transition faster and simpler, with less interruption for visitors.

Initially, I kept that "work" blog, but after reading this, I deleted all the posts except one that is a draft.

Not being able to do this sort of thing offline, means a work blog (not meant as a backup, but a place to fiddle with changes or try different coding or scripting) would have been an ideal way.

I guess my question would be ... how long can you use a "work blog" with some duplicated contents (I never published all the posts, but deleted all of those anyways too) before it might be hit with a spam designation?

It took me about 2 weeks to get everything right before initiating & completing the changes to the active blog.

Chuck Croll said...

Gracey,

All very reasonable, except for one detail.

"Private, Non-Crawlable" may not apply to spam classification. I occasionally have blog owners in the forums, reporting a private blog - or one with "robots.txt" or meta tags supposedly blocking crawlers, yet deleted by Blogger for SPAM.

http://blogging.nitecruzr.net/2014/11/static-blogs-and-spam-classification.html

Gracey said...

Yes, I've read some of those which is why I was wondering if there were some sort of "safe" time frame for doing such a thing.

Right now, my work blog is empty save for a single post that's a draft - and it doesn't have any content published on the active blog, so the situation doesn't apply right now.

I'm also a little confused - if your robots file is set not to allow the robots to crawl the blog, and Google's bots usually respect the robots file, how would they know if the content is the same or not?

Or, is it based on publishing a whole bunch of posts at one time, whether they're the same or not?

That's what would happen if I backed up any of my blogs using the xml, deleted my blog, then opened a new blog with a new url and uploaded the xml file. Would I still have problems in doing that?

If so, it almost makes no sense to use the backup xml file.

Chuck Croll said...

Gracey,

I don't want to mislead anybody - there is no "non-backup blog" policy.

You can have a backup blog, if you want. The issue is that both the original and backup blogs are vulnerable to spam classification, as clones.

If your blogs are classified, you will have to go through the review process, to get them restored.

You won't enjoy the long term effects from being in the review - even if you get your blogs restored - so for everybody's sake, I start out by saying "Don't duplicate your blogs!".

http://blogging.nitecruzr.net/2009/11/attack-of-clones.html

http://blogging.nitecruzr.net/2015/01/spam-review-requires-triage.html

http://blogging.nitecruzr.net/2013/06/hacking-malware-spam-classification-and.html

Chuck Croll said...

Gracey,

My suspicion is that Google maintains two (at least) groups of bots. "GoogleBot" (I'll call it "GoogleContentBot") indexes all "indexable" websites, for Google Search. "GoogleSpamBot" indexes ALL Blogger blogs ("indexable" AND "non-indexable"), for Blogger Spam Classification.

If we were to do a Venn diagram of the GoogleContentBot and GoogleSpamBot targets, we would see very slight overlap.

We SHOULD be able to track both GoogleContentBot and GoogleSpamBot as they index our blogs. If there is a "safe" period for having duplicated content, it would be just after "GoogleSpamBot" makes its most recent pass through each blog (but before the next pass).

But this is simply my musings, of a rainy Sunday morn.