Apache access logs

IBhacknU

Diamond Member
Oct 9, 1999
My question is as follows:

Searching my access logs, I find this:

216.35.103.80 - - [24/Oct/2000:07:34:59 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.80 - - [24/Oct/2000:07:35:21 -1000] "GET / HTTP/1.0" 200 1788
216.35.103.79 - - [24/Oct/2000:07:39:03 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.79 - - [24/Oct/2000:07:39:35 -1000] "GET / HTTP/1.0" 200 1788
216.35.103.81 - - [24/Oct/2000:07:50:26 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.81 - - [24/Oct/2000:07:50:47 -1000] "GET / HTTP/1.0" 200 1788

and then this....
213.216.143.39 - - [25/Oct/2000:03:18:30 -1000] "GET /robots.txt HTTP/1.0" 404 283
213.216.143.39 - - [25/Oct/2000:03:18:32 -1000] "GET / HTTP/1.0" 200 1788
213.216.143.37 - - [25/Oct/2000:05:18:04 -1000] "GET /robots.txt HTTP/1.0" 404 283
213.216.143.37 - - [25/Oct/2000:05:18:05 -1000] "GET / HTTP/1.0" 200 1788


Is this a web crawler?
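
For what it's worth, this is roughly how I'd pull every host that requested /robots.txt out of a bigger log (a rough Python sketch; it assumes Common Log Format, and "access_log" is just a placeholder filename):

-----8<-----
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

requests_by_ip = defaultdict(list)

with open("access_log") as f:          # placeholder path -- point it at your log
    for line in f:
        m = LOG_RE.match(line)
        if m:
            ip, when, request, status, size = m.groups()
            requests_by_ip[ip].append(request)

# Hosts that ask for /robots.txt are almost always robots of some kind.
for ip, reqs in requests_by_ip.items():
    if any(r.startswith("GET /robots.txt") for r in reqs):
        print(ip, "fetched /robots.txt,", len(reqs), "requests total")
-----8<-----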

cmv

Diamond Member
Oct 10, 1999
Most likely. You can check for yourself, too, by going to, say, anandtech.com like this:

www.anandtech.com/robots.txt

Judging by the user agent (which can be spoofed), it is a web crawler.

EDIT: anandtech.com/robots.txt is a bad example because they don't have a robots.txt. Try this one:

google.com/robots.txt
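
If you'd rather grab it from a script than a browser, here's a minimal Python sketch (google.com is just the example above; swap in whatever host you're curious about):

-----8<-----
from urllib.request import urlopen

# Fetch a site's robots.txt the same way a crawler would.
with urlopen("http://www.google.com/robots.txt") as resp:
    print(resp.read().decode("utf-8", errors="replace"))
-----8<-----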

IBhacknU

Diamond Member
Oct 9, 1999
So, given this reasoning (that crawlers look for this file), what sort of info might one want to put in this .txt file, and how would it be treated?

Example of google.com/robots.txt:

User-agent: *
Disallow: /search
Disallow: /keyword/


NiTeByTe

Junior Member
Oct 5, 2000
Although I use a default template with all the websites I've designed, here are some links to documents describing the robots.txt file.

http://www.searchtools.com/robots/robots-txt.html

http://info.webcrawler.com/mak/projects/robots/norobots.html

Here's one of my robots.txt files.

-----8<-----
User-agent: *
Disallow: /cgi-bin/
Disallow: /reports/
-----8<-----

It only tells spiders to stay out of the CGI directory and the reports directory (the reports created by AccessWatch). Disallow values are matched as path prefixes, so no trailing wildcard is needed.
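
For the "how would it be treated" part of the question above, here is a minimal sketch, using Python's standard robots.txt parser, of how a compliant spider reads a file like that (the example.com URLs are made up):

-----8<-----
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /reports/
""".splitlines()

rp = robotparser.RobotFileParser()
rp.parse(rules)

# A polite spider checks each URL against the rules before fetching it.
print(rp.can_fetch("*", "http://example.com/cgi-bin/search.pl"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))         # True
-----8<-----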

On one of those pages above there is mention of a Perl script which 'fakes out' the spiders. I have not used a script like that (yet) but have seen how well they work.

Just my $.02 worth.

BTW: To definitively find out if it was a spider, you can do a reverse DNS lookup on the addresses or, if that's unsuccessful, do a `whois -h whois.arin.net xxx.xxx.xxx.0` to find out who owns the IPs.
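
Here's a quick sketch of the reverse-DNS part in Python, using two of the addresses from the log excerpts above; if there's no PTR record, you'd fall back to whois:

-----8<-----
import socket

# Addresses taken from the log excerpts earlier in the thread.
for ip in ("216.35.103.80", "213.216.143.39"):
    try:
        host, _aliases, _addrs = socket.gethostbyaddr(ip)
        print(ip, "->", host)   # crawlers usually reverse to their owner's domain
    except socket.herror:
        print(ip, "-> no PTR record, try whois instead")
-----8<-----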