
grep command issue

SarcasticDwarf

Diamond Member
I am using grep commands to search my raw access logs for visitors from Google (human, not spiders). The grep command I run is:
grep "15\/Feb\/2004.*GET .* HTTP.* 200 [0-9][0-9]*.*www\.google\.com" filename.log | wc -l

The problem is, this has been showing over 1200 for the last couple of days, and AWStats is only showing 500-800 uniques/day. The raw lines from the log file look like:

xx.xxx.xx.xx - - [12/Aug/2004:22:01:51 -0400] "GET / HTTP/1.1" 200 16068 "http://www.google.com/search?hl=en&lr=&ie=UTF-8&safe=off&q=Internet+Book+database" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

Any ideas as to whether it is a problem with the grep command picking up the spiders, or an AWStats bug?
 
Originally posted by: cchen
~~~ PSA ~~~ This is NOT the Software - Apps, Programming and Games forum

PSA: This would never go in Software - Apps, Programming and Games as it isn't about specific software. The only other forum it might go in is the OS forum, but this isn't an OS-specific question.
 
Originally posted by: DeathByAnts
Originally posted by: cchen
~~~ PSA ~~~ This is NOT the Software - Apps, Programming and Games forum

PSA: This would never go in Software - Apps, Programming and Games as it isn't about specific software. The only other forum it might go in is the OS forum, but this isn't an OS-specific question.

Pick one
 
Originally posted by: amdfanboy
Originally posted by: DeathByAnts
Originally posted by: cchen
~~~ PSA ~~~ This is NOT the Software - Apps, Programming and Games forum

PSA: This would never go in Software - Apps, Programming and Games as it isn't about specific software. The only other forum it might go in is the OS forum, but this isn't an OS-specific question.

Pick one

I did, any mod is free to pick another.
 
Originally posted by: DeathByAnts
Originally posted by: cchen
~~~ PSA ~~~ This is NOT the Software - Apps, Programming and Games forum

PSA: This would never go in Software - Apps, Programming and Games as it isn't about specific software. The only other forum it might go in is the OS forum, but this isn't an OS-specific question.

Sure it would. How is grep not "specific software"? You could also say this is programming -- shell programming.

Anyways, to actually answer the OP, here are some things that pop up in my mind:

It might not necessarily be www.google.com, it could be just google.com.

Google has more domains than google.com, like google.ca, google.co.uk, etc.

Are you sure the logs aren't being rotated?
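
The first two points are easy to check on a few sample lines before running anything against the real log. This is only a sketch: the log lines and the temp file are made up, and the broader referrer pattern is just one guess at what should count as "Google". It assumes a grep that supports -E.

```shell
# Hypothetical sample log: three Google referrers on different hosts, one non-Google.
log=$(mktemp)
cat > "$log" <<'EOF'
1.1.1.1 - - [15/Feb/2004:10:00:00 -0400] "GET / HTTP/1.1" 200 16068 "http://www.google.com/search?q=a" "Mozilla/4.0"
2.2.2.2 - - [15/Feb/2004:10:01:00 -0400] "GET / HTTP/1.1" 200 16068 "http://google.com/search?q=b" "Mozilla/4.0"
3.3.3.3 - - [15/Feb/2004:10:02:00 -0400] "GET / HTTP/1.1" 200 16068 "http://www.google.co.uk/search?q=c" "Mozilla/4.0"
4.4.4.4 - - [15/Feb/2004:10:03:00 -0400] "GET / HTTP/1.1" 200 16068 "http://example.com/" "Mozilla/4.0"
EOF

# Narrow pattern, as in the original command: only the literal www.google.com host.
narrow=$(grep -c 'www\.google\.com' "$log")

# Broader guess: referrer field starting with an optional "www.", then
# "google.", then a TLD such as com, ca or co.uk.
broad=$(grep -E -c '"http://(www\.)?google\.[a-z.]+/' "$log")

echo "narrow=$narrow broad=$broad"
rm -f "$log"
```

On this sample the narrow pattern counts 1 line and the broader one counts 3, so the two can differ quite a bit depending on where the visitors come from.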

And some uh, "tips" that also come to mind 🙂:

Instead of [0-9][0-9]* you can do [0-9]+ (* means zero or more, + means one or more), but only with grep -E (or egrep); plain grep treats + as a literal plus sign.

Instead of grep blah | wc -l, you can do grep -c blah.

I'm pretty sure you don't need to escape forward slashes.
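
These tips can be tried on a single line. A small sketch, assuming GNU grep; the sample line is made up, and stderr is silenced on the escaped-slash variant because newer GNU grep versions may warn about the unnecessary backslashes.

```shell
# One made-up log line to test against.
line='x - - [15/Feb/2004:10:00:00 -0400] "GET / HTTP/1.1" 200 16068 "-" "-"'

# Basic regex: "+" is a literal plus sign here, so nothing matches.
# grep exits non-zero on zero matches, hence the "|| true".
bre=$(printf '%s\n' "$line" | grep -c ' 200 [0-9]+ ' || true)

# Extended regex (-E): "+" means one or more of the preceding item.
ere=$(printf '%s\n' "$line" | grep -E -c ' 200 [0-9]+ ' || true)

# Unescaped slashes with -c, versus escaped slashes piped into wc -l.
plain=$(printf '%s\n' "$line" | grep -c '15/Feb/2004' || true)
piped=$(printf '%s\n' "$line" | grep '15\/Feb\/2004' 2>/dev/null | wc -l | tr -d ' ')

echo "bre=$bre ere=$ere plain=$plain piped=$piped"
```

The last two counts come out the same, which is the point of the -c and forward-slash tips; the first two differ, which is why + needs -E.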

So with all of that in mind, I'd do:

grep -c "15/Feb/2004.*GET .* HTTP.* 200 [0-9]+.*http://([a-zA-Z filename.log
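
The command above is cut off in the post. The following is not a reconstruction of the missing part, just a rough sketch of where the tips lead. It assumes grep -E, a guessed referrer pattern, and a made-up sample file. It also counts distinct client IPs, since grep -c counts matching requests while AWStats "uniques" are visitors, which might account for some of the gap between the two numbers.

```shell
# Hypothetical sample log: one visitor with two hits, one google.ca visitor,
# one 404, and one hit from a different day.
log=$(mktemp)
cat > "$log" <<'EOF'
1.1.1.1 - - [15/Feb/2004:10:00:00 -0400] "GET / HTTP/1.1" 200 16068 "http://www.google.com/search?q=a" "Mozilla/4.0"
1.1.1.1 - - [15/Feb/2004:10:00:05 -0400] "GET /page HTTP/1.1" 200 2048 "http://www.google.com/search?q=a" "Mozilla/4.0"
2.2.2.2 - - [15/Feb/2004:11:00:00 -0400] "GET / HTTP/1.1" 200 16068 "http://google.ca/search?q=b" "Mozilla/4.0"
3.3.3.3 - - [15/Feb/2004:12:00:00 -0400] "GET / HTTP/1.1" 404 512 "http://www.google.com/search?q=c" "Mozilla/4.0"
4.4.4.4 - - [16/Feb/2004:09:00:00 -0400] "GET / HTTP/1.1" 200 16068 "http://www.google.com/search?q=d" "Mozilla/4.0"
EOF

# Guessed pattern: date, GET request, status 200, byte count, then a Google
# referrer with optional "www." and any TLD.
pattern='15/Feb/2004.*"GET .* HTTP[^"]*" 200 [0-9]+ "http://(www\.)?google\.[a-z.]+/'

# Matching requests (what grep -c or wc -l reports).
hits=$(grep -E -c "$pattern" "$log")

# Distinct client IPs among those requests (closer to a "uniques" figure).
visitors=$(grep -E "$pattern" "$log" | awk '{print $1}' | sort -u | wc -l | tr -d ' ')

echo "hits=$hits visitors=$visitors"
rm -f "$log"
```

On this sample there are 3 matching hits but only 2 distinct IPs, so comparing the second number against AWStats may be the fairer test.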
 