
Apache access logs

IBhacknU

Diamond Member
My question is as follows:

Searching my access logs, I find this:

216.35.103.80 - - [24/Oct/2000:07:34:59 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.80 - - [24/Oct/2000:07:35:21 -1000] "GET / HTTP/1.0" 200 1788
216.35.103.79 - - [24/Oct/2000:07:39:03 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.79 - - [24/Oct/2000:07:39:35 -1000] "GET / HTTP/1.0" 200 1788
216.35.103.81 - - [24/Oct/2000:07:50:26 -1000] "GET /robots.txt HTTP/1.0" 404 283
216.35.103.81 - - [24/Oct/2000:07:50:47 -1000] "GET / HTTP/1.0" 200 1788

and then this:
213.216.143.39 - - [25/Oct/2000:03:18:30 -1000] "GET /robots.txt HTTP/1.0" 404 283
213.216.143.39 - - [25/Oct/2000:03:18:32 -1000] "GET / HTTP/1.0" 200 1788
213.216.143.37 - - [25/Oct/2000:05:18:04 -1000] "GET /robots.txt HTTP/1.0" 404 283
213.216.143.37 - - [25/Oct/2000:05:18:05 -1000] "GET / HTTP/1.0" 200 1788
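For what it's worth, pulling the robots.txt fetchers out of a log like this is easy to script. Here's a quick sketch in Python, assuming the standard Apache common log format shown above:

```python
import re

# Start of an Apache common-log line: client IP, ident, user, [timestamp],
# then the request line in quotes ("METHOD path protocol").
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] "([A-Z]+) (\S+) [^"]*"')

def robots_fetchers(lines):
    """Return a sorted list of client IPs that requested /robots.txt."""
    ips = set()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(3) == "/robots.txt":
            ips.add(m.group(1))
    return sorted(ips)

log = [
    '216.35.103.80 - - [24/Oct/2000:07:34:59 -1000] "GET /robots.txt HTTP/1.0" 404 283',
    '216.35.103.80 - - [24/Oct/2000:07:35:21 -1000] "GET / HTTP/1.0" 200 1788',
]
print(robots_fetchers(log))  # ['216.35.103.80']
```

A browser almost never asks for /robots.txt on its own, so IPs showing up in that list are very likely spiders.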


Is this a web crawler?

So, given this reasoning (that crawlers look for this file), what sort of info might one want to put in this .txt file, and how would it be treated?

example of google.com/robots.txt:

User-agent: *
Disallow: /search
Disallow: /keyword/

Although I just use a default template with all the websites I've designed, here are some links to documents describing the robots.txt file:

http://www.searchtools.com/robots/robots-txt.html

http://info.webcrawler.com/mak/projects/robots/norobots.html

Here's one of my robots.txt files.

-----8<-----
User-agent: *
Disallow: /cgi-bin/*
Disallow: /reports/*
-----8<-----

It only tells spiders to stay out of the CGI directory and the reports directory (reports created by AccessWatch).
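One note: the original robots.txt standard treats each Disallow value as a simple path prefix, so the trailing * above isn't actually needed. If you want to see how a well-behaved spider reads rules like these, Python happens to ship a parser for the format (example.com is just a placeholder here):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
# Feed the rules in directly rather than fetching them over HTTP.
rp.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Disallow: /reports/",
])

# Anything under a disallowed prefix is off-limits; everything else is fair game.
print(rp.can_fetch("*", "http://example.com/cgi-bin/form.pl"))  # False
print(rp.can_fetch("*", "http://example.com/index.html"))       # True
```

Keep in mind this is all voluntary: the file keeps polite spiders out, but it doesn't stop anyone who chooses to ignore it.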

On one of those pages above there is mention of a Perl script which 'fakes out' the spiders. I have not used a script like that (yet) but have seen how well they work.

Just my $.02 worth.

BTW: To definitively find out if it was a spider, you can do a reverse DNS lookup on the numbers OR, if that's unsuccessful, do a `whois -h whois.arin.net xxx.xxx.xxx.0` to find out who owns the IPs.
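The reverse lookup can be scripted too. A little sketch in Python (127.0.0.1 below is just a stand-in; substitute the IPs from your log):

```python
import socket

def lookup(ip):
    """Reverse-DNS an address; fall back to the bare IP if there's no PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # No PTR record (or lookup failed), so hand back the raw address.
        return ip

print(lookup("127.0.0.1"))
```

If the hostname that comes back mentions a search engine's domain, that pretty much settles it.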