Controlling The Search Engine Spiders

Most, but not all, blog owners eagerly anticipate the arrival of the search engine spiders at their blogs.

The spiders come to index your blog, once the search engines recognise your blog's existence in the Blogosphere.

For those blog owners who don't want their blogs indexed, Blogger provides the settings, in Settings - Basic - Privacy, "Add your blog to our listings?" (for internal Blogger spiders, and Blogger links) and "Let search engines find your blog?" (for external spiders).

The privacy settings control the content of the "robots.txt" file in the blog.

The spiders, when well behaved, read "robots.txt" for instructions.

You can modify "robots.txt", using the dashboard "Search preferences" wizard.

You can change the privacy settings, and let Blogger maintain "robots.txt" on your behalf. Alternatively, if you are daring, you can edit the file yourself, using "Settings" - "Search preferences".

These settings are at the blog level - one setting affects the entire blog. If you would like only a part of the blog protected, make a second blog (blogs are free), and include one blog in the other. If you would like specific URLs protected, use the URL removal tool in Google Webmaster Tools.

Read "Search Console" reports carefully, and learn the meanings.

If you use Google Webmaster Tools / Search Console, maybe to add a sitemap or otherwise analyse or maintain your blog's relationship with the search engines, you may see some interesting notations, such as
URL restricted by robots.txt (http://myblog.blogspot.com/search/label/mylabel)
You'll generally have one of these notations for each label in the blog.

Those restrictions are normal. All label searches are restricted, so the search engines won't detect the label searches as containing duplicate content. Your blog shouldn't depend upon label searches for your readers to find each post.
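
If you'd like to see for yourself how a well behaved spider interprets these rules, here is a rough sketch using Python's standard "urllib.robotparser" module. The rules are copied from the "robots.txt" that a reader quotes in the comments on this post, so your own blog's file may differ, and the post URL is a made up example.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the reader's "robots.txt" quoted in the comments.
# Assumption: the file served by your own blog may differ.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything from the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The label search URL is the example used in this post.
label_url = "http://myblog.blogspot.com/search/label/mylabel"
# Hypothetical post URL, for illustration only.
post_url = "http://myblog.blogspot.com/2008/07/my-post.html"

for agent in ("Googlebot", "Mediapartners-Google"):
    for url in (label_url, post_url):
        verdict = "allowed" if parser.can_fetch(agent, url) else "restricted"
        print(f"{agent}: {verdict}: {url}")
```

With these rules, an ordinary spider is kept out of anything under "/search" (which includes every label search), while the post page itself stays open to indexing. The AdSense crawler ("Mediapartners-Google") has an empty "Disallow:" line, so it is allowed everywhere.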

Some search engines will index private blogs.

Interestingly, I note that the search engines can have access to blogs that require permission to read. Private blogs can't have blog feeds, but the search engines can still index them. The "robots.txt" file is advisory only; search engines may honour the file's directives, or they may ignore them.

Comments

Aussie Golfing said…
Thanks for the answer
How can I block all those visits from the "next blogs"? Those visitors cannot read my blog at all, and I get a false picture of my real visitors.
Pratik Jain said…
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://jprogramming0509.blogspot.com/feeds/posts/default?orderby=UPDATED

This is my robots.txt content.
2 of my URLs are blocked. How do I remove the block?
Chuck Croll said…
Pratik,

Your posts are indexed using the main page / post pages URLs.

If posts were also indexed using label searches, you would have the same content being indexed under two different URLs. This would look like duplicated content, to the search engines. Both the indexing using main page / post pages, and using label searches, would be penalised.

Do not remove the "robots.txt" code - that code prevents indexing using label searches - and that is to your benefit.

http://blogging.nitecruzr.net/2008/07/google-webmaster-tools-and-label.html
