Rant: Damn site rippers

SarcasticDwarf

Diamond Member
So, once again, some asshat from Australia ripped a good chunk of my site today using HTTrack. They managed to grab about 20,000-25,000 pages from what I can tell. The stupid thing is, the site's database is available under a Creative Commons license. So, once again (even though I know it won't do any good), I fired off the usual e-mail to the ISP:

On September 19th, 2004, a user with the IP of 220.236.51.146 (resolved to d220-236-51-146.dsl.nsw.optusnet.com.au) used a program called HTTrack to cause a large database load on the server hosting iblist.com. This is in direct violation of the terms of use at http://www.iblist.com/other/terms.php

A sample of the raw server log is below.

220.236.51.146 - - [19/Sep/2004:19:53:24 -0500] "GET /book13843.htm HTTP/1.1" 200 8459 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:25 -0500] "GET /book14046.htm HTTP/1.1" 200 8448 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:25 -0500] "GET /book13683.htm HTTP/1.1" 200 8466 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:25 -0500] "GET /book13790.htm HTTP/1.1" 200 8483 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:26 -0500] "GET /book13809.htm HTTP/1.1" 200 8438 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:26 -0500] "GET /book13846.htm HTTP/1.1" 200 8594 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"
220.236.51.146 - - [19/Sep/2004:19:53:26 -0500] "GET /book13895.htm HTTP/1.1" 200 8419 "http://www.iblist.com/author2124.htm" "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"

Please respond within the business week with the corrective action taken.

xxxxxxxxxxx





end rant
 
Here's the response you'll get:

Dear SarcasticDwarf,

We really don't give a sh!t

yours truly,

Asshat's ISP

😀
 
Originally posted by: KLin
Here's the response you'll get:

Dear SarcasticDwarf,

We really don't give a sh!t

yours truly,

Asshat's ISP

😀

Actually, I'd be surprised if I got that back. This is an Australian ISP, remember.
 
Well, if you know it's being done with that program, and it even tells you what user-agent they're using, why not set up a link trap? Return pages that have bogus information on them and links to other bogus pages. They'll keep trying to retrieve crappy pages forever. Eventually, they might get the hint.
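For anyone wondering what a link trap might look like in practice, here is a minimal sketch in Python, not anything iblist.com actually runs. It assumes the site can be fronted by a WSGI app; the `/trap/` prefix, the function names, and the placeholder page text are all made up for illustration. The only detail taken from the log above is the "HTTrack" string in the user-agent.

```python
import hashlib

# Hypothetical URL prefix for bogus pages. You would presumably also want to
# disallow it in robots.txt so well-behaved crawlers stay out of the trap.
TRAP_PREFIX = "/trap/"


def is_ripper(user_agent):
    """True if the user-agent contains 'HTTrack' (case-insensitive)."""
    return "httrack" in (user_agent or "").lower()


def trap_page(path, links_per_page=5):
    """Build a bogus HTML page whose links lead only to more bogus pages.

    Link names are derived from a hash of the requested path, so no database
    query is needed and the same path always yields the same page.
    """
    links = []
    for i in range(links_per_page):
        token = hashlib.md5(f"{path}:{i}".encode()).hexdigest()[:12]
        links.append(f'<a href="{TRAP_PREFIX}{token}.htm">Book {token}</a>')
    body = "<br>\n".join(links)
    return f"<html><body><p>Nothing to see here.</p>\n{body}\n</body></html>"


def app(environ, start_response):
    """WSGI entry point: rippers get trap pages, everyone else a placeholder."""
    path = environ.get("PATH_INFO", "/")
    agent = environ.get("HTTP_USER_AGENT", "")
    if is_ripper(agent) or path.startswith(TRAP_PREFIX):
        html = trap_page(path)
    else:
        # Stand-in for the real site content.
        html = "<html><body>Real content would be served here.</body></html>"
    data = html.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` should do. Keep in mind that every trap page still costs a request on your own server, so the trap only helps if those pages are much cheaper to generate than the real database-backed ones.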
 
Originally posted by: SarcasticDwarf
Originally posted by: KLin
Here's the response you'll get:

Dear SarcasticDwarf,

We really don't give a sh!t

yours truly,

Asshat's ISP

😀

Actually, I'd be surprised if I got that back. This is an Australian ISP, remember.

So what does that mean for those of us not familiar with Australian ISPs?
 
Originally posted by: Gurck
That sucks... Could you implement code to recognize when someone's doing this and block them?

Yes, I can and I have. The problem is, there are dozens of programs that do this. All I can do is ban them as I come across them.
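Since banning by name means chasing every new program, one option is to flag clients by request volume as well as by user-agent. Below is a rough sketch that scans an access log in the same format as the sample above. The `KNOWN_RIPPERS` list and the `max_requests` threshold are assumptions for illustration (only "HTTrack" comes from the actual log), and it counts requests over the whole input rather than per time window.

```python
import re
from collections import Counter

# Matches the Apache "combined" log format used in the sample above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Hypothetical list: only "httrack" is taken from the log in this thread.
KNOWN_RIPPERS = ("httrack", "webcopier", "teleport")


def suspicious_ips(lines, max_requests=1000):
    """Return the set of IPs that either send a known ripper user-agent
    or make more than max_requests requests in the given log lines."""
    hits = Counter()
    flagged = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m is None:
            continue  # skip lines that are not in the expected format
        ip = m.group("ip")
        hits[ip] += 1
        agent = m.group("agent").lower()
        if any(name in agent for name in KNOWN_RIPPERS):
            flagged.add(ip)
    # Volume check catches rippers that spoof a browser user-agent.
    flagged.update(ip for ip, n in hits.items() if n > max_requests)
    return flagged
```

Usage would be along the lines of `suspicious_ips(open("access.log"))`, with the resulting IPs fed into whatever ban list the server already uses. The threshold needs tuning so that legitimate heavy readers and search engine crawlers are not caught.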
 
Originally posted by: SirPsycho
Well, if you know it's being done with that program, and it even tells you what user-agent they're using, why not set up a link trap? Return pages that have bogus information on them and links to other bogus pages. They'll keep trying to retrieve crappy pages forever. Eventually, they might get the hint.

I would set up something like that if I knew how.
 
Originally posted by: PhasmatisNox
Originally posted by: SarcasticDwarf
Originally posted by: KLin
Here's the response you'll get:

Dear SarcasticDwarf,

We really don't give a sh!t

yours truly,

Asshat's ISP

😀

Actually, I'd be surprised if I got that back. This is an Australian ISP, remember.

So what does that mean for those of us not familiar with Australian ISPs?

Support about as bad as Dell's and an extremely restrictive FCC (well, whatever their equivalent is).
 
Heh, good luck getting any kind of response. If you do, it'll probably be a "we'll look into it, don't call us, we'll call you" type deal. 🙁
 
Originally posted by: amdfanboy
Originally posted by: n0cmonkey
wget > *

I beg to differ. Spiderzilla + Mozilla == Heaven 🙂

Mozilla by itself is ok, but wget just does me right. Webgrabber or something for Mac OS X is pretty good too. I think I can spoof user-agents in that one. :evil:
 
Originally posted by: n0cmonkey
Originally posted by: amdfanboy
Originally posted by: n0cmonkey
wget > *

I beg to differ. Spiderzilla + Mozilla == Heaven 🙂

Mozilla by itself is ok, but wget just does me right. Webgrabber or something for Mac OS X is pretty good too. I think I can spoof user-agents in that one. :evil:

I really should change my user-agent when I grab a lot.
 