Apparently I wasn’t the only one wondering what the Shim Crawler was in my access logs:
157.82.246.104 - - [20/Oct/2005:11:00:16 -0400] "GET /robots.txt HTTP/1.1" 200 295 "-" "Shim Crawler" "-"
And I also wasn’t the only one who ran a Google search, found that the only page discussing it was at webmasterworld.com, and then discovered that webmasterworld was in the middle of a several-day outage.
Now that they’re back up, it looks like there wasn’t anything interesting about that bot on the site anyway. But someone else, running into the same webmasterworld hit and outage that I did, already got a lot of information about it directly from the author.
I guess I can rest easy, and move on to the next funny-looking one that I don’t recognize: gsa-crawler+(Enterprise;+GIX-02530;+enterprise-training@google.com)
Again, the webmasterworld hit that’s now readable doesn’t really have any info. Anyone know this one? Maybe I’ll e-mail that address if I get bored. It might be more fun just to come up with entertaining conspiracy theories about what Enterprise Training at Google would be all about.
For now I’ll just assume it’s a well-planned step towards Google’s global domination. Do we all get free gourmet lunches when that happens?
Seems like a strange bot indeed; I’ve never stumbled upon it in the logs I have access to. Do you still have its IP address as well? I think if it actually were a genuine Google crawler, it should have started its journey from the Google address space, shouldn’t it? In case it’s a fake, at least it’s not as silly as the GoogleBot fake from 209.67.212.18 (h17.plesklogin.net), which totally ignores robots.txt, but that’s a different story.
The IP was 65.57.245.11. Googling it turns up a ton of hits for that IP, but it’s hard to tell what it is. It doesn’t belong to Google’s address space; it’s assigned to Level 3. But I saw a few pages that, taken together, suggest it belongs to Google’s corporate proxy that their users go through to get to the net. At least, that seems like the most likely explanation right now.
I fired off an e-mail and we’ll see what happens. I didn’t get an immediate bounce, so it’s probably a legitimate address.
I found another IP of the Shim-Crawler bot: 157.82.254.46.
This bot visited me from:
inetnum: 157.82.0.0 - 157.82.255.255
netname: UTNET2
country: JP
descr: University of Tokyo
identified as:
Shim-Crawler(Mozilla-compatible; http://www.logos.ic.i.u-tokyo.ac.jp/crawler/; crawl@logos.ic.i.u-tokyo.ac.jp)
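If you want to check whether other addresses from your own logs fall inside that same whois range, something like the following rough Python sketch should do it. The range boundaries are copied from the inetnum line above; the helper name in_range is just something made up for illustration, and the sample addresses are the ones mentioned in this thread.

```python
import ipaddress

# Range boundaries copied from the whois inetnum line (UTNET2, University of Tokyo).
RANGE_START = ipaddress.ip_address("157.82.0.0")
RANGE_END = ipaddress.ip_address("157.82.255.255")


def in_range(ip: str) -> bool:
    """Return True if ip lies between RANGE_START and RANGE_END, inclusive.

    Hypothetical helper for illustration; it only compares addresses
    and does no DNS or whois lookups of its own.
    """
    return RANGE_START <= ipaddress.ip_address(ip) <= RANGE_END


if __name__ == "__main__":
    # Addresses mentioned in the post and comments.
    for ip in ("157.82.246.104", "157.82.254.46", "65.57.245.11"):
        print(ip, "in UTNET2 range" if in_range(ip) else "outside UTNET2 range")
```

This only tells you that an address sits in the block whois reports for the University of Tokyo. It says nothing about whether the request really came from the Shim-Crawler, since a user-agent string can be faked by anyone.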