Search Engine War Blog

How to authenticate Googlebot

Monday, 25 September 2006

Matt Cutts from Google has posted the 'official' way to check whether the spider hitting your website is actually Googlebot.

The technique starts with a reverse DNS lookup on the visiting IP address, to check that the hostname it returns is in the googlebot.com domain:

> host 66.249.66.1
1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com.

This is followed by a forward DNS lookup on that hostname, to verify that it resolves back to the original IP address:

> host crawl-66-249-66-1.googlebot.com
crawl-66-249-66-1.googlebot.com has address 66.249.66.1
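The two lookups above can be sketched in Python using only the standard `socket` module. This is a rough illustration rather than Google's own code: the function name `is_googlebot` and its two injectable lookup parameters are made up for this example, and it only handles IPv4 addresses.

```python
import socket


def is_googlebot(ip,
                 reverse_lookup=socket.gethostbyaddr,
                 forward_lookup=socket.gethostbyname_ex):
    """Return True if `ip` passes the reverse-then-forward DNS check.

    The lookup functions are parameters (an assumption of this sketch)
    so the logic can be exercised without touching the network.
    """
    # Step 1: reverse lookup, the equivalent of `host 66.249.66.1`.
    try:
        hostname = reverse_lookup(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return False

    # The name must END in .googlebot.com; a plain substring test would
    # accept something like googlebot.com.example.net.
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(".googlebot.com"):
        return False

    # Step 2: forward lookup, the equivalent of `host crawl-...googlebot.com`.
    try:
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The hostname must resolve back to the IP we started with.
    return ip in addresses
```

Calling `is_googlebot("66.249.66.1")` performs two live DNS queries, so in practice you would probably want to cache the result per IP address rather than repeat the lookups on every request.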

This will be very handy for analytics vendors who need to filter fake spiders out of their stats, and for website owners who want to control who can crawl their websites. Blog Zero has some great PHP and Perl code snippets if you'd like to see how it's done.
