Crawler behaviour in pictures

Posted by Dan Frost on Mon, 03/03/2008 - 09:12

I recently stumbled upon a very interesting research project about how Googlebot, Slurp and MSNbot crawled pages on a very large site (billions of pages).

Googlebot crawling tree

Googlebot activity

Yahoo! Slurp crawling tree

Yahoo! Slurp activity

MSNbot crawling tree

MSNbot activity

Yahoo! Slurp has a reputation for crawling frequently, but in my experience you don't see the benefit of that in long tail referrals. Google has always indexed more pages and produced more long tail referrals. It may be that Google's long tail referrals are simply more obvious because it sends so much more traffic overall, but I rarely see any long tail traffic from Yahoo, so I'm not entirely convinced.

You'll note that Google was crawling fairly steadily and then went bananas around a particular segment - this is because an off-site link was added to a "hub page" in that segment.
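If you want to look for the same kind of per-segment spike on your own site, a rough sketch along these lines might be a starting point. It is not how the research project produced its charts. It assumes your server writes Apache-style "combined" access logs, that the first path component is a reasonable stand-in for a "segment", and that the crawlers can be spotted by a substring of their user-agent. The names `BOTS`, `LINE_RE` and `crawl_counts` are made up for this illustration.

```python
import re
from collections import Counter

# Assumed user-agent substrings (lowercase) for the three crawlers discussed here.
BOTS = {"Googlebot": "googlebot", "Yahoo! Slurp": "slurp", "MSNbot": "msnbot"}

# Pulls the request path and the user-agent out of a combined-format log line.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def crawl_counts(lines):
    """Count crawler requests per (bot, top-level site segment)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path, agent = match.group(1), match.group(2).lower()
        # Treat the first path component as the segment, e.g. /widgets
        segment = "/" + path.lstrip("/").split("/", 1)[0]
        for bot, needle in BOTS.items():
            if needle in agent:
                counts[(bot, segment)] += 1
                break
    return counts
```

Feeding it the lines of a log file (for example `crawl_counts(open("access.log"))`, where the filename is just a placeholder) gives you a count per crawler and segment, which you can then bucket by day to see whether one segment suddenly gets far more attention than the rest.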

I'll expand upon the whole project over the coming days.