
Controlling The Search Engine Spiders

Most, but not all, blog owners eagerly anticipate the arrival of the search engine spiders on their blog.

The spiders come to index your blog once the search engines recognise your blog's existence in the Blogosphere.

For blog owners who don't want their blogs indexed, Blogger provides two settings in Settings - Basic - Privacy: "Add your blog to our listings?" (for internal Blogger spiders and Blogger links) and "Let search engines find your blog?" (for external spiders).

The privacy settings control the content of the "robots.txt" file in the blog.

Well-behaved spiders read "robots.txt" for instructions.
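A well-behaved spider's check can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of the idea, not how any search engine's crawler is actually built. The robots.txt text is assumed to resemble Blogger's default (it matches the file a commenter quotes further down), and the blog address `myblog.blogspot.com` and the post path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed Blogger-style robots.txt (sample only). The empty "Disallow:"
# line lets the AdSense crawler (Mediapartners-Google) fetch everything.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""


def build_parser(text):
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    base = "http://myblog.blogspot.com"  # hypothetical blog address
    checks = [
        ("Googlebot", "/2012/01/my-post.html"),             # post page: allowed
        ("Googlebot", "/search/label/mylabel"),             # label search: blocked
        ("Mediapartners-Google", "/search/label/mylabel"),  # AdSense crawler: allowed
    ]
    for agent, path in checks:
        verdict = "allowed" if rules.can_fetch(agent, base + path) else "blocked"
        print(agent, path, verdict)
```

The spider, not the blog, does the checking: it reads the rules for its own user agent and then decides which URLs to request.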

Normally, you make the privacy settings changes and let Blogger maintain "robots.txt" on your behalf. Alternatively, if you are daring, you can edit "robots.txt" yourself, using "Settings" - "Search preferences".
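If you do edit robots.txt yourself, it may be worth testing a candidate file before saving it. The sketch below, again using Python's `urllib.robotparser`, reports paths that would be blocked or allowed contrary to your intent; the `check_robots` helper, the blog address, and the paths are all hypothetical. One caveat: Python's parser applies the first matching rule in file order, while Google documents choosing the most specific (longest) matching rule, so keeping specific `Disallow` lines ahead of `Allow: /` makes both readings agree.

```python
from urllib.robotparser import RobotFileParser

BLOG = "http://myblog.blogspot.com"  # hypothetical blog address


def check_robots(text, must_allow, must_block, agent="Googlebot"):
    """Return a list of problems found in a candidate robots.txt.

    must_allow / must_block are paths the file should permit or forbid
    for the given crawler. An empty list means no problems were found
    for the paths checked.
    """
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    problems = []
    for path in must_allow:
        if not parser.can_fetch(agent, BLOG + path):
            problems.append(f"{path} is blocked but should be crawlable")
    for path in must_block:
        if parser.can_fetch(agent, BLOG + path):
            problems.append(f"{path} is crawlable but should be blocked")
    return problems


# A file in the default style, and a risky edit that shuts out the whole blog.
GOOD = "User-agent: *\nDisallow: /search\nAllow: /\n"
BAD = "User-agent: *\nDisallow: /\n"

if __name__ == "__main__":
    allow = ["/", "/2012/01/my-post.html"]
    block = ["/search/label/mylabel"]
    for name, text in [("GOOD", GOOD), ("BAD", BAD)]:
        print(name, check_robots(text, allow, block) or "OK")
```

With the paths above, `GOOD` reports no problems, while `BAD` reports both the home page and the post page as blocked.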

These settings are at the blog level - one setting affects the entire blog. If you would like only a part of the blog protected, make a second blog (blogs are free), and include one blog in the other. If you would like specific URLs protected, use the URL removal tool in Google Webmaster Tools.

Read the "Search Console" reports carefully, and learn what they mean.

If you use Google Webmaster Tools / Search Console, perhaps to add a sitemap or otherwise analyse or maintain your blog's search engine relationship, you may see some interesting details, such as:
URL restricted by robots.txt (http://myblog.blogspot.com/search/label/mylabel)
You'll generally have one of these notations for each label in the blog.

Those restrictions are normal. All label searches are restricted, so the search engines won't detect the label searches as containing duplicate content. Your blog shouldn't depend upon label searches for your readers to find each post.
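The duplicate-content reasoning can be made concrete with a small model. In this sketch, the post names and URLs are hypothetical, and Python's `urllib.robotparser` stands in for a crawler that honours robots.txt. Each post is reachable from its own post page and from every label search that lists it.

```python
from urllib.robotparser import RobotFileParser

# Assumed default-style rules: label searches live under /search.
ROBOTS_TXT = "User-agent: *\nDisallow: /search\nAllow: /\n"

# Hypothetical blog: every URL at which each post's content appears.
POST_URLS = {
    "my-first-post": [
        "http://myblog.blogspot.com/2012/01/my-first-post.html",
        "http://myblog.blogspot.com/search/label/news",
        "http://myblog.blogspot.com/search/label/howto",
    ],
    "my-second-post": [
        "http://myblog.blogspot.com/2012/02/my-second-post.html",
        "http://myblog.blogspot.com/search/label/news",
    ],
}


def indexable_copies(post_urls, robots_text, agent="Googlebot"):
    """Map each post to the URLs a robots.txt-honouring crawler may fetch."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {
        post: [url for url in urls if parser.can_fetch(agent, url)]
        for post, urls in post_urls.items()
    }


if __name__ == "__main__":
    # An empty robots.txt allows everything: several copies per post.
    print(indexable_copies(POST_URLS, ""))
    # With /search disallowed, each post is left with its post page only.
    print(indexable_copies(POST_URLS, ROBOTS_TXT))
```

Without the `/search` rule, the first post has three fetchable copies; with it, every post keeps exactly one URL, its post page.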

Some search engines will index private blogs.

Interestingly, the search engines can have access to blogs that require permission to read. Private blogs can't have blog feeds, but the search engines can still index them. The "robots.txt" file is advisory only; search engines may honour the file's directives, or they may ignore them.
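The advisory nature of robots.txt is easy to see in code: the file is only a list of requests, and the check happens inside the crawler. In this sketch, the crawler name `ExampleBot`, the URLs, and the blanket `Disallow: /` file are hypothetical. The point is that robots.txt by itself does nothing to stop a crawler that skips the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of the whole blog.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"

URLS = [
    "http://myblog.blogspot.com/",
    "http://myblog.blogspot.com/2012/01/my-post.html",
]


def urls_to_fetch(urls, robots_text, agent, honour_robots):
    """Return the URLs a crawler would request.

    The robots.txt rules are applied only if the crawler itself chooses
    to apply them (honour_robots=True); the file cannot enforce anything.
    """
    if not honour_robots:
        return list(urls)
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if parser.can_fetch(agent, url)]


if __name__ == "__main__":
    print("honours robots.txt:", urls_to_fetch(URLS, ROBOTS_TXT, "ExampleBot", True))
    print("ignores robots.txt:", urls_to_fetch(URLS, ROBOTS_TXT, "ExampleBot", False))
```

The polite crawler requests nothing; the one that ignores the file requests every URL.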

Comments

Aussie Golfing said…
Thanks for the answer
How can I block all those visits from "next blogs"? Those visitors can't read my blog at all, and I get a false picture of my real visitors.
Pratik Jain said…
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://jprogramming0509.blogspot.com/feeds/posts/default?orderby=UPDATED

This is my robots.txt content.
Two of my URLs are blocked. How do I remove the block?
Chuck Croll said…
Pratik,

Your posts are indexed using the main page and post page URLs.

If posts were also indexed using label searches, the same content would be indexed under two different URLs. This would look like duplicated content to the search engines, and both the main page / post page URLs and the label search URLs would be penalised.

Do not remove the "robots.txt" code - that code prevents indexing using label searches - and that is to your benefit.

http://blogging.nitecruzr.net/2008/07/google-webmaster-tools-and-label.html
