N.B. This page is a trap for junk-mail robots, so the addresses and data in the bottom section are random and fake :) It's not a particularly good trap at the moment; it's more of a research project for myself and a friend. It's a fun way of experimenting with logging access data and analysing it, and also of finding out a little about how web robots seem to work. We could just read up on the subject, but this seems more fun. We're just seeing how we can manipulate them, I suppose: trying to get them to loop as much as possible, and so on. The script itself might prove very interesting to people who would like to know more about Perl+CGI and about applications more interesting than a contact form.
Source code, for those interested, which sort of explains what this page does as well as showing how. One point to note is that I keep adding to this script, so there will be gaps in the logging table where columns were added and earlier results don't have them, and the counts might not completely add up (I had one hit before I even introduced logging). This whole thing is just a bit of fun, my experiment to observe robots coming across my pages.
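For anyone curious how the fake bottom section and the looping links might be produced, here is a minimal sketch. It is written in Python rather than the Perl of the actual script, and it is not the real source. The function names, the /cgi-bin/trap.cgi URL, the link text and the use of the reserved .invalid domain are all assumptions made for illustration.

```python
import random
import string

LETTERS = string.ascii_lowercase


def fake_address(rng):
    """Build a random, meaningless e-mail address for harvesters to collect."""
    user = "".join(rng.choice(LETTERS) for _ in range(rng.randint(5, 10)))
    host = "".join(rng.choice(LETTERS) for _ in range(rng.randint(5, 10)))
    # Assumption: .invalid is a reserved top-level domain, so these
    # addresses can never belong to a real mailbox.
    return f"{user}@{host}.invalid"


def loop_link(script_url, rng):
    """Link back to the same script with a random query string, so each
    visit offers the robot URLs it has not seen before."""
    token = "".join(rng.choice(LETTERS + string.digits) for _ in range(12))
    return f"{script_url}?{token}"


def trap_section(script_url, n_addresses=5, n_links=3, seed=None):
    """Return the HTML for the fake bottom section of the page."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_addresses):
        addr = fake_address(rng)
        lines.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    for _ in range(n_links):
        lines.append(f'<a href="{loop_link(script_url, rng)}">more</a><br>')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical script location, for illustration only.
    print(trap_section("/cgi-bin/trap.cgi", seed=1))
```

Because every link carries a fresh random query string, a robot that follows them lands on the same script again with a URL that looks new, which is what makes it loop.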

Robot Log

This is a log of the details of user agents that didn't look like browsers. For those not in the know, HTTP_USER_AGENT identifies the program requesting the page, REMOTE_ADDR/REMOTE_HOST give the IP address and hostname that program connected from, HTTP_REFERER gives the page the program came from (if it followed the link directly, rather than retrieving the link and then requesting the page separately), and QUERY_STRING/PATH_INFO are the stuff tacked onto the URL by the random links (if the robot loops).
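As a rough illustration of how those variables could be captured, here is a minimal Python sketch of a CGI-style logger. It is not the Perl script behind this page. The browser test (a few substrings in BROWSER_HINTS), the robots.log file name, the "-" placeholder for a missing user agent and the CSV layout are all assumptions, since the page does not say how any of that is really done.

```python
import csv
import os
import time

# Assumption: a crude "looks like a browser" test based on a few substrings.
BROWSER_HINTS = ("MSIE", "Gecko", "Opera", "Konqueror")

# The remaining CGI variables described above, in the order they are logged.
OTHER_FIELDS = ["REMOTE_ADDR", "REMOTE_HOST", "HTTP_REFERER",
                "QUERY_STRING", "PATH_INFO"]


def looks_like_browser(user_agent):
    """Return True if the user agent contains any of the browser hints."""
    return any(hint in user_agent for hint in BROWSER_HINTS)


def log_visit(environ, log_path):
    """Append one row for a non-browser request; return True if logged."""
    agent = environ.get("HTTP_USER_AGENT", "-")
    if looks_like_browser(agent):
        return False
    row = [time.strftime("%Y-%m-%d %H:%M:%S"), agent]
    # Variables the server did not set are logged as empty columns.
    row += [environ.get(name, "") for name in OTHER_FIELDS]
    with open(log_path, "a", newline="") as handle:
        csv.writer(handle).writerow(row)
    return True


if __name__ == "__main__":
    # Under CGI the web server passes these values in the environment.
    log_visit(os.environ, "robots.log")
```

A robot that requests a page directly, without following a link, simply leaves the HTTP_REFERER column empty.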

Googlebot/2.1 (+http://www.googlebot.com/bot.html) 90695
Googlebot/2.1 (+http://www.google.com/bot.html) 63091
msnbot/0.3 (+http://search.msn.com/msnbot.htm) 26907
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) 23604
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) 12738
Wget/1.9 6609
Mozilla/2.0 (compatible; Ask Jeeves/Teoma) 5820
Mozilla/4.0 (compatible; Getleft 1.1.1) 2669
msnbot/0.11 (+http://search.msn.com/msnbot.htm) 1469
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker ([email protected]; http://www.WISEnutbot.com) 1248
Googlebot/Test (+http://www.googlebot.com/bot.html) 1210
Mozilla/3.0 (compatible) 668
ia_archiver 360
Zao/0.2 (http://www.kototoi.org/zao/) 296
HenryTheMiragoRobot (http://www.miragorobot.com/scripts/mrinfo.asp) 191
Mozilla/4.0 compatible ZyBorg/1.0 ([email protected]; http://www.WISEnutbot.com) 167
LinkWalker 61
Mozilla/4.0 compatible ZyBorg/1.0 Dead Link Checker ([email protected]; http://www.WISEnutbot.com) 55
Wget/1.9.1 54
Wget/1.8.2 33
NaverBot-1.0 (NHN Corp. / +82-2-3011-1954 / [email protected]) 30
Pompos/1.3 http://dir.com/pompos.html 25
Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98) 25
Mozilla/4.0 24
FAST-WebCrawler/3.8 (crawler at trd dot overture dot com; http://www.alltheweb.com/help/webmaster/crawler) 22
jBrowser/J2ME Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0) 16
[email protected] 12
larbin_2.6.3 [email protected] 11
IQSearch 11
Mozilla/5.0 (Slurp/cat; [email protected]; http://www.inktomi.com/slurp.html) 9
Tutorial Crawler 1.4 (http://www.tutorgig.com/crawler) 9
Mozilla/4.0 compatible ZyBorg/1.0 ([email protected]; http://www.WISEnutbot.com) 9
TurnitinBot/2.0 http://www.turnitin.com/robot/crawlerinfo.html 9
Iltrovatore-Setaccio/1.2 (It-bot; http://www.iltrovatore.it/bot.html; [email protected]) 9
AnswerBus (http://www.answerbus.com/) 8
WWlib v1.1 8
appie 1.1 (www.walhello.com) 7
Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html) 6
Mozilla/4.77 [en] (X11; U; Linux 2.2.19 i686) 6
Jetbot/1.0 6
findlinks/0.87 (+http://wortschatz.uni-leipzig.de/findlinks/) 6
Gigabot/1.0 6
Ultraseek 5
Seekbot/1.0 (http://www.seekbot.net/bot.html) HTTPFetcher/0.3 5
larbin_2.6.3 [email protected] 5
Gigabot/2.0 5
http://www.almaden.ibm.com/cs/crawler [wf163] 4
ZipppBot/0.11 (ZipppBot; http://www.zippp.net; [email protected]) 4
Mozilla/4.5 [en] (Win98; I) 4
egothor/3.0a (+http://www.xdefine.org/robot.html) 4
Nokia7250/1.0 (3.12) Profile/MIDP-1.0 Configuration/CLDC-1.0 (Google WAP Proxy/1.0) 4
mozDex/0.04-dev (mozDex; http://www.mozdex.com/en/bot.html; [email protected]) 4
Mozilla/4.0 (compatible; WebCapture 3.0; Auto; Windows) 3
Java/1.4.2_05 3
http://www.almaden.ibm.com/cs/crawler [wf85] 3
NutchCVS/0.03-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [email protected]) 3
Calif Univ Tools 3
vspider 3
http://www.almaden.ibm.com/cs/crawler [bc11] 3
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.1) 3
CrawlConvera0.1 [email protected] 3
QuepasaCreep ( [email protected] ) 3
w3search [babylon] 3
http://www.almaden.ibm.com/cs/crawler [wf223] 2
CE-Preload 2
Mozilla/4.0 (compatible; prgrabber 1.0) 2
Java/1.4.1_05 2
http://www.almaden.ibm.com/cs/crawler [wf134] 2
iSiloX/3.35 Windows/32 2
Atomix/ (+http://www.activesecurity.us) 2
Mozilla(IE Compatible) 2
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Larbin/2.6.3 [email protected] 2
Mozilla/4.79 [en] (Windows NT 5.0; U) 2
Mozilla/3.01 (compatible;) 2
Mozilla/3.0 (compatible; WebCapture 2.0; Auto; Windows) 2
Wget/1.8.1 2
NutchOrg/0.03-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [email protected]) 2
Mister Pix II 2.03 1
SE47/1.0.00 UP.Browser/4.1.26l UP.Link/ (Google WAP Proxy/1.0) 1
DA 5.3 1
bvgnclpfagycoxcoeifhbclokgd 1
Java/1.4.1 1
Missigua Locator 1.9 1
http://www.almaden.ibm.com/cs/crawler [iies01] 1
OmniFind [lnx-ir] 1
iSiloX/4.01 Windows/32 1
larbin_2.6.3 [email protected] 1
Mozilla/3.0 (Liberate DTV 1.1) 1
janggxjduawpislikbt5kfgqqe kcdcrhln 1
tdbvvqhtetcideeN2qvmxqqrlrxyqfcllrqjf2 1
hxafwtmvwQbne ix1tkx bq1xaexpliashsnfw 1
- 1
NetAnts/1.25 1
Mozilla/4.0 compatible ZyBorg/1.0 ([email protected]; http://www.WISEnutbot.com) 1
Wget/1.5.3 1
Plucker/Py-1.4 1
Generic 1
Oracle Ultra Search 1
http://www.almaden.ibm.com/cs/crawler [wf224] 1
NG/2.0 1
okjeggntirlqckigboaap 1
sherlock_spider [email protected] 1
t7 qmohmdOfwy Owqfbwrfxysma 1
vbli 8xlvjMw8ggql8dcyrbMqvgww 1
HLoader 1
http://www.almaden.ibm.com/cs/crawler [babylon] 1
Panasonic-X70/A00 (Google WAP Proxy/1.0) 1
HTMLParser/1.5 1
k n dqcphqbamyelbbej 1
omnifind [SanAntonio] 1
Steeler/2.0 (http://www.tkl.iis.u-tokyo.ac.jp/~crawler/) 1
Java/1.4.2_03 1
iGxkyto7sfG p7wgvoiit 1
Gaisbot/3.0+([email protected];+http://gais.cs.ccu.edu.tw/robot.php) 1
larbin_2.6.3 [email protected] 1
Mozilla/6.0 [en] (Win32; I) 1
NextGenSearchBot 1 (for information visit http://www.eliyon.com/NextGenSearchBot) 1
Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html) 1
NutchCVS/0.05 (Nutch; http://www.nutch.org/docs/en/bot.html; [email protected]) 1
larbin_2.6.3 [email protected] 1
Java/1.4.1_01 1
idmcheUueaektadhgotnglngrwitkb 1
Mozilla/4.7 [en]C-CCK-MCD NSCPCD47 (Win95; I) 1
Cowbot-0.1.1 (NHN Corp. / +82-2-3011-1954 / [email protected]) 1
oxrfpclhksxbupxabuspooaujji 1
Dillo/0.8.0 1
ELinks/0.9.1 (textmode; NetBSD 1.6.2 sparc; 132x43) 1
jjrmuujuyevrjbigbgbe0uxlbvet 1
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) 1
Mozilla/9 (X11; U; LINUX; en;) 1
TranSGeniKBot http://www.tsgk.net 1
NutchCVS/0.05-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [email protected]) 1
libwww-perl/5.69 1
qc xlgqcdrxkwmpvorl1 1
Mozilla/4.79 [en] (Win98; U) 1
Microsoft Internet Explorer 1
w3search [rostock] 1
Mozilla/3.0 (compatible; Indy Library) 1
Python-urllib/2.0a1, Orase (http://www.orase.com) 1
Mozilla/4.73 [en]C-CCK-MCD (WinNT; U) 1
Program Shareware 1.0.0 1
Mozilla/4.77 [en]C-CCK-MCD (WinNT; U) 1
http://www.almaden.ibm.com/cs/crawler [wp76] 1
NokiaN-GageQD/1.0 SymbianOS/6.1 Series60/1.2 Profile/MIDP-1.0 Configuration/CLDC-1.0 1
JSpider/0.4 1
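
A table like the one above, with one line per distinct user agent and its hit count, most frequent first, could be produced from raw log entries along these lines. This is a Python sketch and not the script that generated the list; the function names are made up for illustration.

```python
from collections import Counter


def tally_agents(agents):
    """Count identical user-agent strings, most frequent first."""
    return Counter(agents).most_common()


def format_tally(agents):
    """Render the tally as 'user agent, space, count' lines."""
    return "\n".join(f"{agent} {count}" for agent, count in tally_agents(agents))


if __name__ == "__main__":
    sample = ["Wget/1.9", "ia_archiver", "Wget/1.9", "LinkWalker", "Wget/1.9",
              "ia_archiver"]
    print(format_tally(sample))
```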

