Need some Googlebot help please!

imported_Phil

Diamond Member
Feb 10, 2001
9,837
0
0
:| 18GB this month ALREADY.

66.249.72.66 - crawl-66-249-72-66.googlebot.com. Is this genuine?
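
One common way to check whether a crawler IP really belongs to Google is a reverse DNS lookup followed by a forward lookup on the returned hostname. Below is a minimal Python sketch of that idea. The function name and the injectable lookup parameters are made up for illustration, and the accepted domain suffixes are an assumption about how Google names its crawler hosts.

```python
import socket


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve ip, check the host's domain, then resolve it back.

    reverse_lookup and forward_lookup are hypothetical hooks so the check
    can be tried without touching real DNS; by default the socket module
    is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:
        return False  # no PTR record, or the lookup failed

    # Assumption: genuine crawler hosts sit under one of these domains.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False

    try:
        # The hostname must map back to the IP we started from.
        return ip in forward_lookup(host)
    except OSError:
        return False


# Usage against real DNS (needs network access):
#   is_genuine_googlebot("66.249.72.66")
```

The forward lookup matters because anyone who controls the reverse DNS for their own IP range can make it claim to be googlebot.com, but they cannot make Google's zone point back at their address.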

The only large file has been taken down (50MB or so), but on a forum with a 70MB database, we're highly concerned as to why it's leeching so much bandwidth. MSNBot has only taken 200MB. The site itself has used 12GB this month, and Google's added another 18GB to that.

I'm going to add a crawl-delay line into robots.txt, but is there any way we can find out just what the hell it's downloading?

Cheers! :)
 
Aug 16, 2001
22,505
4
81
Go to Google and read the info. You can stop crawlers by editing a text file. I don't remember exactly which one or what to put in there, but the info will tell you all that.
 

mugs

Lifer
Apr 29, 2003
48,920
46
91
robots.txt

Edit: You already know that. Check logs. :) Or use a logfile analysis tool that doesn't subtract bot hits from your totals to see which files are getting hit the most.
 

hjo3

Diamond Member
May 22, 2003
7,354
4
0
Originally posted by: Phil
I'm going to add a crawl-delay line into robots.txt, but is there any way we can find out just what the hell it's downloading?
Uhhh, check your logs maybe? You already know the bot's IP so it should be easy...
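
For anyone wanting to do this by hand, here is a rough Python sketch that adds up the bytes served per URL for a single client IP. It assumes the log is in Apache's common or combined format; the function name and the `access.log` path in the usage comment are hypothetical. It reads one line at a time, so the size of the log file should not matter much.

```python
import re
from collections import Counter

# Assumed layout (Apache common/combined log format):
#   ip ident user [timestamp] "METHOD path protocol" status bytes ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} (?P<size>\d+|-)'
)


def bytes_by_path(lines, client_ip):
    """Sum response bytes per requested path for one client IP."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("ip") != client_ip:
            continue  # unparseable line, or a different client
        size = match.group("size")
        # Apache logs "-" when no body was sent (e.g. a 304 response).
        totals[match.group("path")] += 0 if size == "-" else int(size)
    return totals


# Usage (hypothetical log path); a file object yields one line at a time:
#   with open("access.log", errors="replace") as log:
#       for path, size in bytes_by_path(log, "66.249.72.66").most_common(20):
#           print(size, path)
```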
 

imported_Phil

The only stats package available on the server is awstats, which isn't a huge amount of help.

Can you guys suggest one that is good and will let us see the required info?
 

hjo3

Originally posted by: Phil
The only stats package available on the server is awstats, which isn't a huge amount of help.

Can you guys suggest one that is good and will let us see the required info?
You can't access your, like, Apache usage log?
 

imported_Phil

Originally posted by: hjo3
Originally posted by: Phil
The only stats package available on the server is awstats, which isn't a huge amount of help.

Can you guys suggest one that is good and will let us see the required info?
You can't access your, like, Apache usage log?

The log is 342MB. Nope.
 

SagaLore

Elite Member
Dec 18, 2001
24,036
21
81
Phil, please PM me your website and give me a list of top pages accessed by googlebot if you can.
 

imported_Phil

Originally posted by: SagaLore
Phil, please PM me your website and give me a list of top pages accessed by googlebot if you can.

YGPM. The website belongs to someone else, though; I'm just helping out with this problem :)
 

imported_Phil

Okay. By looking through the "Last 300 Visitors" bit in cPanel, I've found that each thread view by Googlebot is taking 300kB, give or take 50kB. Turning on Gzip compression has brought that down to 15kB or so.
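
A quick way to get a feel for why Gzip helps so much on forum pages: the markup repeats heavily from post to post, and repetition is what the compressor feeds on. A small Python sketch follows, using a made-up page as a stand-in. The HTML and the helper name are invented for illustration, and the ratio it prints says nothing precise about the real forum.

```python
import gzip


def gzip_savings(body: bytes):
    """Return (raw size, gzipped size) in bytes for a response body."""
    return len(body), len(gzip.compress(body))


# Made-up stand-in for a thread page: the same post markup repeated many times.
post = (
    '<div class="post"><span class="user">someone</span>'
    "<p>Reply text goes here.</p></div>\n"
)
page = ("<html><body>\n" + post * 2000 + "</body></html>").encode("utf-8")

raw, packed = gzip_savings(page)
print(f"{raw} bytes raw, {packed} bytes gzipped")
```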
I've added rules to the robots.txt file to disallow Google Image Search Bot, and a delay of 10 for Googlebot itself. I've also disallowed GIF and JPG access.
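
To sanity-check rules like these before a crawler reads them, Python's standard `urllib.robotparser` can parse the text and answer "may this agent fetch this URL?". The robots.txt below is only a guess at what such rules might look like, not the actual file from the site. The GIF/JPG wildcard rules are left out because, as far as I know, the standard-library parser only does simple prefix matching. Also worth knowing: Google's own documentation says Googlebot ignores `Crawl-delay`, so that line may only slow down other crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written to resemble the ones described above.
# The more specific agent goes first because this parser uses the first
# entry whose name appears inside the crawler's user-agent string.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot
Crawl-delay: 10
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network I/O."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Image bot is shut out entirely; the main bot may crawl, with a delay hint.
print(rules.can_fetch("Googlebot-Image", "/images/avatar.gif"))
print(rules.crawl_delay("Googlebot"))
print(rules.can_fetch("Googlebot", "/showthread.php?t=1"))
```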

Fingers crossed :)
Thanks for the help so far guys.
 

SarcasticDwarf

Diamond Member
Jun 8, 2001
9,574
2
76
Originally posted by: Phil
Okay. By looking through the "Last 300 Visitors" bit in cPanel, I've found that each thread view by Googlebot is taking 300kB, give or take 50kB. Turning on Gzip compression has brought that down to 15kB or so.
I've added rules to the robots.txt file to disallow Google Image Search Bot, and a delay of 10 for Googlebot itself. I've also disallowed GIF and JPG access.

Fingers crossed :)
Thanks for the help so far guys.

Watch the server load, as Gzip compression may increase it significantly.
 

imported_Phil

Originally posted by: SarcasticDwarf
Originally posted by: Phil
Okay. By looking through the "Last 300 Visitors" bit in cPanel, I've found that each thread view by Googlebot is taking 300kB, give or take 50kB. Turning on Gzip compression has brought that down to 15kB or so.
I've added rules to the robots.txt file to disallow Google Image Search Bot, and a delay of 10 for Googlebot itself. I've also disallowed GIF and JPG access.

Fingers crossed :)
Thanks for the help so far guys.

Watch the server load, as Gzip compression may increase it significantly.

*nods*
Hopefully it won't be too bad, but pages seem to be loading a hell of a lot faster on a decent-spec PC now.