Controlling The Search Engine Spiders

Most, but not all, blog owners eagerly anticipate the arrival of the search engine spiders on their blog.

The spiders come to index your blog once the search engines recognise your blog's existence in the Blogosphere.

For blog owners who don't want their blogs indexed, Blogger provides two settings in Settings - Basic - Privacy: "Add your blog to our listings?" (for internal Blogger spiders and Blogger links) and "Let search engines find your blog?" (for external spiders).

The privacy settings control the content of the blog's "robots.txt" file.

Spiders, when well behaved, read "robots.txt" for instructions.

The simplest approach is to make the privacy settings changes and let Blogger maintain "robots.txt" on your behalf. Alternately, if you are daring, you can modify "robots.txt" yourself, using the dashboard "Settings" - "Search preferences" wizard.

These settings apply at the blog level: one setting affects the entire blog. If you would like only part of the blog protected, make a second blog (blogs are free) and include one blog in the other. If you would like specific URLs protected, use the URL removal tool in Google Webmaster Tools.
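Because one "robots.txt" file governs every URL on the blog, a blanket rule cannot spare individual posts. The minimal Python sketch below uses the standard library's `urllib.robotparser` to show this. It assumes, hypothetically, that turning off "Let search engines find your blog?" produces a blanket `Disallow: /` for all user agents; the text Blogger actually writes may differ, so check your own blog's "robots.txt". The blog address, the post and page paths, and the `allowed_paths` helper are all illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a blog hidden from external spiders.
# Assumption: the exact text Blogger generates may differ.
HIDDEN_BLOG_ROBOTS = """\
User-agent: *
Disallow: /
"""

def allowed_paths(robots_txt, agent, urls):
    """Map each URL to whether `agent` may fetch it under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}

# Illustrative URLs on a hypothetical blog: main page, a post, a static page.
urls = [
    "http://myblog.blogspot.com/",
    "http://myblog.blogspot.com/2008/07/my-post.html",
    "http://myblog.blogspot.com/p/about.html",
]

for url, ok in allowed_paths(HIDDEN_BLOG_ROBOTS, "Googlebot", urls).items():
    print("allowed" if ok else "blocked", url)
```

Every URL comes back blocked, which is why protecting only part of a blog means splitting it into a second blog rather than adjusting this one setting.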

Read "Search Console" reports carefully, and learn what they mean.

If you use Google Webmaster Tools / Search Console, perhaps to add a sitemap or otherwise to analyse or maintain your blog's relationship with the search engines, you may see some interesting details, such as:
URL restricted by robots.txt (http://myblog.blogspot.com/search/label/mylabel)
You'll generally see one of these notations for each label in the blog.

Those restrictions are normal. All label searches are restricted, so that the search engines won't see the label searches as duplicate content. Your blog shouldn't depend upon label searches for your readers to find each post.
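You can check which URLs a "robots.txt" restricts without waiting for a Search Console report. The Python sketch below feeds the standard library's `urllib.robotparser` the "robots.txt" quoted in the comments on this post (Sitemap line omitted), which is assumed here to be typical of what Blogger generates. It then compares a label search with an ordinary post URL. The blog address and post path are illustrative. One caveat: Python's parser applies the first matching rule in file order, while Google documents a most-specific-match rule. For this particular file, both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as quoted in the comments below (Sitemap line omitted).
BLOGGER_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(BLOGGER_ROBOTS.splitlines())

label_search = "http://myblog.blogspot.com/search/label/mylabel"
post_page = "http://myblog.blogspot.com/2008/07/my-post.html"  # hypothetical post URL

# Ordinary crawlers fall under "User-agent: *": anything under /search
# (label searches included) is blocked, while post pages are allowed.
print(parser.can_fetch("Googlebot", label_search))  # False
print(parser.can_fetch("Googlebot", post_page))     # True

# The AdSense crawler has its own entry with an empty Disallow,
# which permits everything, label searches included.
print(parser.can_fetch("Mediapartners-Google", label_search))  # True
```

Each post stays reachable for indexing through its own post page URL. The label search copies, which would otherwise duplicate that content, are the ones reported as "URL restricted by robots.txt".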

Some search engines will index private blogs.

Interestingly, the search engines can have access to blogs that require permission to read. Private blogs can't have blog feeds, but the search engines can still index them. The "robots.txt" file is advisory only: search engines may honour the file's directives, or they may ignore them.

Comments

Aussie Golfing said…
Thanks for the answer
How can I block all those visits from following blogs? People can't read my blog at all, and I get a false picture of my real visitors.
Pratik Jain said…
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://jprogramming0509.blogspot.com/feeds/posts/default?orderby=UPDATED

This is my robots.txt content.
2 of my URLs are blocked. How do I remove the block?
Chuck Croll said…
Pratik,

Your posts are indexed using the main page / post pages URLs.

If posts were also indexed using label searches, the same content would be indexed under two different URLs. To the search engines, this would look like duplicate content, and both forms of indexing, via the main page / post pages and via label searches, would be penalised.

Do not remove the "robots.txt" code. That code prevents indexing via label searches, and that is to your benefit.

http://blogging.nitecruzr.net/2008/07/google-webmaster-tools-and-label.html