Sunday, December 14, 2008

A Blog Recently Made Private Won't Be So, Immediately

Occasionally, we see the plaintive query:
    "I just changed my blog so only my friends should be able to read the posts. My visitor log shows unknown visitors, though. What is going on - has someone hacked my blog?"
And there we see a question from someone who doesn't know about search engine cache.

If you make your blog private, the search engines won't index your newer posts, but what's already indexed will stay in cache. If someone sees an entry for your blog on a search results page, they'll get (among other things) a "View Cached Content" link. If they click on that, they'll read the cached posts.

The search engines won't care that you made the blog private. This will be similar to a deleted blog - you can delete, or make the blog private, and what's in cache will stay in cache. And you'll keep getting readers, to the cached posts.

When you make your blog private, the "robots.txt" file is updated. Here's a copy of the file for this blog, "blogging.nitecruzr.net", which is public (you are here - D'ohh).

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Noindex: /feedReaderJson

Sitemap: http://blogging.nitecruzr.net/feeds/posts/default?orderby=updated

And in comparison, here's the file for "private1.nitecruzr.net", which is private. If you don't have access to this blog, you won't see anything.

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Noindex: /feedReaderJson
Disallow: /

There are two differences here.
  1. The sitemap for "blogging.nitecruzr.net".
    Sitemap: http://blogging.nitecruzr.net/feeds/posts/default?orderby=updated
    which isn't present in "private1.nitecruzr.net", since private blogs have no feeds, and no sitemaps.

  2. Search engine spiders are forbidden access to private blogs.
    Disallow: /
The "robots.txt" file is voluntary, though, and not all spiders honour it. A disobedient spider could index your blog, and again the posts will end up in cache.
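
As a rough illustration of what a well-behaved spider does with these two files, here is a minimal sketch using Python's standard "urllib.robotparser" module. The two files are pasted in as strings. The post URLs and the "ExampleBot" user agent are hypothetical names, made up for the example, and "site_maps()" needs Python 3.8 or later. The parser silently skips the "Noindex:" line, which it does not recognise.

```python
from urllib.robotparser import RobotFileParser

# robots.txt of the public blog, copied from the post.
PUBLIC_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Noindex: /feedReaderJson

Sitemap: http://blogging.nitecruzr.net/feeds/posts/default?orderby=updated
"""

# robots.txt of the private blog, copied from the post.
PRIVATE_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Noindex: /feedReaderJson
Disallow: /
"""


def load(text):
    """Parse robots.txt text the way a compliant spider would."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


public = load(PUBLIC_ROBOTS)
private = load(PRIVATE_ROBOTS)

# Hypothetical URLs and user agent, used only for illustration.
PUBLIC_POST = "http://blogging.nitecruzr.net/2008/12/example-post.html"
PUBLIC_SEARCH = "http://blogging.nitecruzr.net/search?q=cache"
PRIVATE_POST = "http://private1.nitecruzr.net/2008/12/example-post.html"
AGENT = "ExampleBot"

public_post_ok = public.can_fetch(AGENT, PUBLIC_POST)
public_search_ok = public.can_fetch(AGENT, PUBLIC_SEARCH)
private_post_ok = private.can_fetch(AGENT, PRIVATE_POST)

print("public post allowed:   ", public_post_ok)
print("public search allowed: ", public_search_ok)
print("private post allowed:  ", private_post_ok)
print("public sitemaps:       ", public.site_maps())
print("private sitemaps:      ", private.site_maps())
```

Under these assumptions, a compliant spider may fetch posts on the public blog but not its search pages, and may fetch nothing on the private blog. A spider that ignores "robots.txt" never runs a check like this at all, and nothing in the file touches what is already in cache.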

And there's a third issue here. New readers will use the search engines (and possibly "View cached content"), but your established readers will have the blog bookmarked and cached on their computers. Until the cache expires on their computers, they'll see your blog, too. And, they will continue to trigger the visitor meters.

So, if you make your blog private, after it's been well known for a while, don't expect your visitor meters to go immediately to zero. And be aware that a private blog may still be visible to uninvited guests.

8 comments:

PC Success said...

That is, until they clear their cache, whenever that may be. This would be a good question for Google to answer.

Chuck said...

Google can maybe answer this question about the Google Search Engine. But the Google Search Engine isn't the only one on the Internet, and all of the search engines feed each other. What's in cache may stay in cache, for a long time.

Hans Husman said...

I don't think that "private", or however they word it in the GUI, actually means that it is private, if you used that function. It only means that the blog isn't indexed, and that they do not ping.

If you want to make it private, so that people who have the address can't access it, it takes a little more with Blogger, but it works OK.

Hans Husman said...

OK, I now see that you thought wrong, like I thought. Probably the solution you tried didn't even give the result of the blog being excluded by the file or the meta tag.

But check that. You can check the file with Google Webmaster Tools, and the meta tag in the page source.

I think that is quite old text, and no one at Google bothered to check that it worked. I think the people before meant that a new blog probably wouldn't get indexed, because they didn't ping and no one would find it.

Taffy said...

So there is NO WAY to actually delete a cached page from Google? I have given them an "urgent removal request" but it's 3 weeks later and the cached page is still available. This is seriously job-threatening for me. I don't know what to do.

Chuck said...

Taffy,

What's in cache will likely stay in cache for a long time, even if Google gets your removal request and acts upon it.

If this is seriously job-threatening, you'll need to discuss it with your supervisor. I wouldn't count on any promptness from Google, as a primary solution.

Taffy said...

This comment has been removed by the author.

Chuck said...

Taffy,

You are raising some valid questions. Might I suggest that you take the discussion to my new Google Groups forum - Nitecruzr.Net Blogging, so we can explore these issues?