I got an email from someone who had tracked down one of my little bots. He was a little upset, and after a few emails I began to see his point: he had just been crawled by TimSpyBot, which had performed a cloak check, and he was panicking, thinking the thought police were due at any moment. While I couldn't save him, I could hopefully help others who come across my bots, so I have renamed them all to comply with a simple standard and, where it's appropriate, made sure they include a full user agent.
You can read about each TimBot on its dedicated TimBot page, but quickly…
Like the Jetsons but cooler…
A general bot; actually a global name for several programs that now all use the same user agent.
An indexing and crawling bot, sometimes called a scraper; it's just like GoogleBot but smaller and half as bright.
A very simple bot that returns HTTP status codes. Nope, that's it!
Tests for cloaks and is also used to mimic other browsers; you shouldn't really come into contact with TimSpyBot.
A StumbleUpon profiler: a meta crawler used in a lot of my research, sometimes called StumbleGrump. It, along with TimBot-Crawl, is the bot you are most likely to see.
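To give a feel for how simple the simplest of these is, here is a minimal sketch of a status-only bot of the kind described above. Everything in it is an assumption for the example: the user agent string, the URL in it, and the function name are invented, not the real TimBot details.

```python
import urllib.request
import urllib.error

# Hypothetical user agent: the "+URL" part is the convention that lets a site
# owner look up what a bot is doing (the real TimBot strings may differ).
USER_AGENT = "TimBot-Status/1.0 (+http://example.com/timbots)"

def check_status(url):
    """Fetch a URL and return its HTTP status code; nothing else."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code

# e.g. check_status("http://example.com/") returns the page's status code
```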
TimBots and REP
TimBots are not big systems crawling hundreds of thousands of sites, and they do not gobble up your bandwidth, but I am making every effort to have most of the bots obey at least some of the Robots Exclusion Protocol. As it stands:
- TimBot-Index and Marvin will obey * and timbot-index
- TimBot-Index(timbot-scrape) will only obey timbot-index
- TimSpyBot & TimBot-Crawl ignore all REP
The only robots.txt directives that any TimBot understands are:
The plan is to modify timbot-scrape to follow the other indexing Bots.
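The difference between obeying * and obeying a named agent is just standard robots.txt matching, and a quick way to see it is Python's built-in parser. The robots.txt rules below are invented for the example; they are not what any real site serves.

```python
from urllib import robotparser

# A made-up robots.txt: a blanket rule for everyone, plus a stricter
# rule aimed specifically at timbot-index.
rules = """\
User-agent: *
Disallow: /private/

User-agent: timbot-index
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A bot that obeys its own name gets the stricter, named rule...
print(parser.can_fetch("timbot-index", "http://example.com/page"))   # False
# ...while a bot that only obeys * falls back to the blanket rule.
print(parser.can_fetch("somebot", "http://example.com/page"))        # True
print(parser.can_fetch("somebot", "http://example.com/private/x"))   # False
```

A bot that ignores REP entirely simply never does this check before fetching.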
If you see one of my bots, give it a wave; they are harmless. If one crosses your site, you should be able to visit the URL in the user agent and find out what it was up to.
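Pulling that URL out of a user agent string is usually just a matter of grabbing the "+http://…" token. A throwaway sketch, with an invented agent string:

```python
import re

def bot_info_url(user_agent):
    """Return the +URL advertised in a crawler's user agent, if any."""
    match = re.search(r"\+(https?://[^;)\s]+)", user_agent)
    return match.group(1) if match else None

print(bot_info_url("Mozilla/5.0 (compatible; TimBot-Crawl/1.0; +http://example.com/timbots)"))
# http://example.com/timbots
```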