
What's In My "robots.txt" File?

When we set up a Blogger blog, we give up control of files and folders, and let Blogger maintain the structure of the blog.

When we publish a Blogger blog (either to BlogSpot, or to a Google Custom Domain), all that we do is post, and maintain the blog template. Even with the files and folders controlled by Blogger, and normally hidden from view, there are ways to examine the contents of some files.

Some blog owners examine file content that they would be better off leaving alone, become confused, and stress themselves needlessly. Every week, we read anxious queries
Help! My blog has been hacked!!
or
My robots.txt file is blocking my blog from being indexed!

You can make two Settings changes that are relevant here, but mostly this file is maintained by Blogger code.

Blogger maintains "robots.txt", in each blog, on our behalf.

Occasionally, Blogger makes changes to our blogs in general, and changes the content of "robots.txt" to support the changes made. Recently, changes to "feedReaderJson" necessitated a change to "robots.txt".

Here's the "robots.txt" file for this blog, "blogging.nitecruzr.net", as of 2016/01.

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: http://blogging.nitecruzr.net/sitemap.xml


There are 3 main entries, in a standard "robots.txt" file - and we see all 3 here. For an explanation of some terms, and for a demonstration illustrating the results, please refer to Google Webmaster Tools - "Analyze robots.txt". You may also be enlightened by reading The Web Robots Pages.
  • This allows access to all components of the blog (an empty "Disallow:" permits everything), for the spider "Mediapartners-Google", which is the spider that crawls pages to determine AdSense ad relevance. This entry overrides the following entry, for the specified spider.

    User-agent: Mediapartners-Google
    Disallow:

  • This disallows indexing of all URLs containing "/search" (i.e., label searches) - and it allows all other blog URLs.

    User-agent: *
    Disallow: /search
    Allow: /

  • This defines the URL of the sitemap.

    Sitemap: http://blogging.nitecruzr.net/sitemap.xml

    The sitemap is separate from the blog feed, and automatically provided.
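The behavior of these three entries can be confirmed mechanically. Here is a minimal sketch using Python's standard-library robots.txt parser, with the rule text copied from the file shown earlier; the test URLs are hypothetical examples, not actual posts on this blog.

```python
# A sketch of how the rules above behave, using Python's
# standard-library robots.txt parser. The rule text is copied
# from the file shown earlier; the URLs are hypothetical.
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A label search is blocked for ordinary crawlers...
print(parser.can_fetch("Googlebot", "/search/label/robots.txt"))  # False
# ...an individual post URL is allowed...
print(parser.can_fetch("Googlebot", "/2016/01/example-post.html"))  # True
# ...and the AdSense crawler is exempt from the "/search" block.
print(parser.can_fetch("Mediapartners-Google", "/search/label/robots.txt"))  # True
```

This matches the Search Console analysis described below: the "Mediapartners-Google" group overrides the "*" group for that one crawler, while everyone else is kept out of label searches only.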

You can use Search Console, for an analysis of your "robots.txt" file.

Here's my analysis of "robots.txt" for this blog, run using the wizard. You'll want to view this in full screen mode, zoomed in as far as possible.

Note the effects on 3 hypothetical blog URLs, tested against both "Googlebot" and "Mediapartners-Google", shown at the very bottom.


I've now examined a dozen or so different similar files, for blogs published to both native BlogSpot and custom domains, and excepting the URL of the sitemap, all files have been identical in content. My conclusion is that this is a normal file, and unless we start seeing a flood of complaints about indexing problems, I see no reason to suspect a problem.

So, the next time someone comes to you moaning
My robots.txt file is blocking my blog from being indexed!
you can assure them
No, your "robots.txt" file is normal.
Then, introduce them to Google Webmaster Tools, and its many diagnostic reports.

Comments

Mycharvel said…
Wow thanks dude. I got panic when I saw my robots.txt. Now I know it's just fine :)
Zed Gordon said…
Thank you for this explanation, I was worried too.

When running the wizard all is fine but still my blog does not get indexed or found in search engines. Maybe I am too impatient.

Used to work before though. Let's hope those spiders will come by again sometime, lol.
Mohammed said…
excellent nitecruzr. it helped alot. you are a genius
Autor said…
Well explained. Thank you. Very helpful.
Annie T. Baxter said…
Thank you Nitecruzr, you are becoming my "go to" source for all thing Blogger! It's especially important to be able to find accurate information for our blogs as compared to info. on personal websites, for it's not really apples to apples.

Until next time...Annie
Eurasia Review said…
Actually, I think this post is outdated, as there is a major problem with the robots.txt as it blocks sites for Google News. Our site content is approved for Google News submission, but as we cannot edit the robots.txt it keeps blocking the material from being added in Google News. Nor are we as a custom domain allowed to upload the new format for news sitemaps. Is there a workaround?
Chuck said…
Hey Eurasia,

Have you used the Webmaster Tools utility, to see what you need? Or, you could drop by Blogger Help Forum: Something Is Broken, and we could go over the details.
Andrew Petersen said…
I think the robots.txt problem that some people are experiencing is because of the infancy of their blog, I just recently started a blog of the small business I'm running, and robots.txt blocked my site from being index 14 times before today. Then today, after about 15 posts, it finally indexed, even though it had been crawled so many times before. I'm not sure how many others have had the same problem, but I definitely share their frustration at the lack of explanation.
http://digitalfruitblog.blogspot.com Here's my blog, and if anyone else could explain why there was such a delay for my blog being listed I would love to get a comment or an e-mail.
Chuck said…
Hi Andrew,

That sounds like a great discussion item for Blogger Help Forum: Something Is Broken, where a few people might have ideas to mention. Peer support could be very helpful here.
masterymistery said…
I understand the issues in relation to robots.txt.

Problem is the impact on navigation. Prevents the use of max-results = [large number].

So if you have hundreds of posts, and/or hundreds of posts with a particular tag/label, there doesn't seem to be a way to display everything in one batch.

Or is there?
Chuck said…
Hey MM,

Look at my Custom Domains label series. It's segmented, because of Auto Pagination, but you can get all of it, by using the "Older Posts" links.
Rahul said…
thanks for the this amazing explanations ...

I have one problem ...
my original .blogspot.com blog automatically indexed by google in 2-3 days ..

but when i applied a custom domain to my blog and than added my blog to google webmaster tools , in "Crawl Errors" my home page means whole domain is getting "403 error" ..and my new domain is not getting indexed at all from many weeks ...

how can i solve this .???
Chuck said…
Rahul,

I think you should post this in a new discussion in Blogger Help Forum: Something Is Broken, so we can diagnose your problem.
Well you are wrong, in my instance the robots.txt is blocking search engines from indexing my blog pages. I have webmaster tools for google, bing, yahoo, ect....
When I do a crawl query they all say..."Indexing Restricted by Robots.txt"

Not to be rude, but your wrong!
Ingrid Kitchen said…
I just migrated from a blogger name to my own domain name and your articles are golden!
Freya Renders said…
The robots.txt analysis for my blocked URLs shows:

User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.holidaynomad.com/atom.xml?redirect=false&start-index=1&max-results=500

Is the sitemap OK or should I change it again similar to yours?

The analysis also mentions that I have 178 blocked URLs, why do I have such a huge number of blocked URLs?
