Wednesday, September 21, 2005

When Google Spiders Your Site

This morning Google spidered

neworleansresource-logs]# grep Googlebot access-log
- - [21/Sep/2005:04:18:00 -0400] "GET / HTTP/1.1" 200 96388 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
- - [21/Sep/2005:07:18:10 -0400] "GET /robots.txt HTTP/1.1" 404 216 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
- - [21/Sep/2005:07:18:11 -0400] "GET / HTTP/1.1" 200 96565 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"
- - [21/Sep/2005:09:03:32 -0400] "GET /sitemap.xml HTTP/1.1" 200 1514 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"

This appears to be a fresh crawl spider and not a deepbot. There is a difference: the fresh crawl spiders almost daily, while Deepbot spiders when Google is performing a major update.

If your site hasn't shown up in Google, it may be because the "fresh crawl" (which runs each day) was finding your site instead of the main crawl (which runs about once a month). As the main crawl works through the full web, most of the sites found by the fresh crawl are put in the regular Google index. My advice on the fresh crawl is to view it as a nice "bonus" on top of Google's deep index.

What does this mean for you? Don't fret. Here are today's tips:

1. Create a great site. We discussed this (remember, content is king).
2. Submit your site to Google on the "add url" form.
3. Get a link from the Open Directory Project or other directories (Yahoo, etc.). Find a site listed in that directory and get a link from it to your new site.
4. Don't panic if your site takes a little while to show up in Google. Be patient, and start using the tips I recommend here about improving your site for users and search engines.

In the log above, the spider (Googlebot/2.1) sent the request GET / HTTP/1.1. Let's break this down.

GET means requesting, as in "please go get", and the next thing you see is a /. Well, that means get me the index page of this site. HTTP/1.1 is the protocol and version used for the request. (An FTP request would say FTP, etc.) Next, we see the response from the server - a status code: 200. That means - OK, cool! The number right after it (96388 on the first line) is the size of the response in bytes.

Next, the bot is looking for robots.txt, and guess what? We don't have one, so the server told the bot - code 404 - Not found. The rest you can figure out. It did spider the sitemap, and this agrees with my account with Google:

Last Downloaded - 4 hours ago.
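
If you'd rather not read the log by eye, the breakdown above can be sketched in a few lines of Python. This is only a rough sketch, and it assumes your server writes the common Apache "combined" log format; the names LOG_PATTERN, parse_line and googlebot_requests are made up for this example, and 192.0.2.1 is a placeholder host, since the host is not shown in my log excerpt.

```python
import re

# Assumed layout: Apache "combined" log format. Adjust if your LogFormat differs.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ '                                  # host, ident, user
    r'\[(?P<time>[^\]]+)\] '                                   # timestamp
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '    # request line
    r'(?P<status>\d{3}) (?P<size>\d+|-) '                      # status code, bytes
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'                 # referer, user agent
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    return fields


def googlebot_requests(lines):
    """Yield parsed entries whose user agent mentions Googlebot."""
    for line in lines:
        entry = parse_line(line)
        if entry and "Googlebot" in entry["agent"]:
            yield entry


# Sample lines: 192.0.2.1 is a placeholder host; the rest mirrors the log above.
SAMPLE = [
    '192.0.2.1 - - [21/Sep/2005:07:18:10 -0400] "GET /robots.txt HTTP/1.1" '
    '404 216 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"',
    '192.0.2.1 - - [21/Sep/2005:09:03:32 -0400] "GET /sitemap.xml HTTP/1.1" '
    '200 1514 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +"',
]

for entry in googlebot_requests(SAMPLE):
    print(entry["time"], entry["method"], entry["path"], entry["status"])
```

To run it against a real log, you could pass an open file instead of SAMPLE, for example googlebot_requests(open("access-log")). Keep in mind that matching on the user agent string only tells you what the visitor claims to be.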

If you haven't submitted your site to Google yet, now is the time to do that. Just make sure you have your Sitemap in place. If you need help with that, check my post about creating a Google Site Map for your website.
