Is there a program for downloading a whole website?

Zebo

Elite Member
Jul 29, 2001
39,398
19
81
I would like to download the contents of a whole website for offline browsing. Is there a program that can do this?
 

SWScorch

Diamond Member
May 13, 2001
9,520
1
76
Excellent find, EagleKeeper... I was looking for something exactly like this just the other day.
 

Nocturnal

Lifer
Jan 8, 2002
18,927
0
76
Like just the main index.htm, or the whole entire site? That's pretty crazy if it can download the entire site, including all the files and so forth.
 

Zebo

Elite Member
Jul 29, 2001
39,398
19
81
HOW? Sync?

Looks like you can only go three levels deep with that.

Also, Teleport is unfortunately bandwidth-limited until you pay the fee, so I'm still looking for an alternative.
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
wget
webgrabber

I use wget on most of my UNIX-like systems and webgrabber on the Mac. I never understood offline browsing, unless you are actually downloading pr0n :p
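
For anyone who hasn't tried it, a basic recursive grab with wget looks something like this (example.com is just a placeholder; check man wget for the exact flags your version supports):

wget -r -l 5 -np -w 1 http://example.com/

-r turns on recursive retrieval, -l sets how many levels deep to follow links, -np stops it from wandering up into parent directories, and -w pauses a second between requests so you don't hammer the server.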
 

Zebo

Elite Member
Jul 29, 2001
39,398
19
81


<< I never understood offline browsing, unless you are actually downloading pr0n >>



I don't have a web connection yet in my office, and there are some carving techniques I want to show there. So I plan to download all the MPEGs at home, put them on video tape, and bring them to the shop. This new space I have sucks big time as far as connectivity goes. Only DSL is offered, and it costs more than when I shared a T1 in my previous office.

Pr0n. Not a bad idea. I won't see my wife for another 3 weeks.
 

NicColt

Diamond Member
Jul 23, 2000
4,362
0
71
Just keep in mind that some sites will automatically ban your IP if you crawl them.
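
If you are going to crawl, pacing your requests helps you stay under the radar. wget, for one, can throttle itself (the URL is just an example):

wget -r -w 2 --random-wait http://example.com/

-w 2 waits two seconds between fetches, and --random-wait varies that delay so the traffic looks less like a script.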
 

EagleKeeper

Discussion Club Moderator
Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Teleport requires you to specify the starting URL.

You have the option to download all linked items from the main domain, or just from the URL you started at.

For the kids' school research, and for the ability to feed in the output of a search engine, it has been well worth the $30-odd I paid for it back in '97.
 

DocDoo

Golden Member
Oct 15, 2000
1,188
0
0
This is the reason I left general HTML coding behind. It's too hard to protect against those with "other" ideas for stealing websites (instead of offline viewing). I now only code HTML in PHP and CGI, and leave some traps along the way just for kicks ;)
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0


<< This is the reason I left general HTML coding behind. It's too hard to protect against those with "other" ideas for stealing websites (instead of offline viewing). I now only code HTML in PHP and CGI, and leave some traps along the way just for kicks ;) >>



Still easy to get everything off your site for offline viewing ;)

I've run into a bunch of sites like that. Usually in 5 minutes or so I can find a loophole. It usually pisses me off too, so I turn off the politeness on my program ;)

EDIT: Oh yeah, if anyone actually puts something on their site saying that they don't like people downloading the entire site, and they're fairly polite about it, I will not do it. I've visited a site that was complaining about how everyone keeps doing that, so I refrained. Pretty simple.
 

DocDoo

Golden Member
Oct 15, 2000
1,188
0
0


<< Still easy to get everything off your site for offline viewing >>


All you will get is the HTML output that was generated by the CGI or PHP code.


I keep all my CGI out of the HTML root directory; it's stored in a folder outside the document root. This means only the web server can access the CGI directory, and no one on the web can get to it. Again, the output of CGI/PHP is simply not the same as the source. Besides, even if it were in the HTML path, with a simple .htaccess entry I can lock down a file, folder, IP, or an entire net-block. Since most web users don't know all that, it is easy to set up *cough* decoy scripts. As you might know, with CGI you can program it to do just about anything to the bad guys (bad guys, since they go out of their way to "attempt" to take private code, or code that was not meant to be shared).
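
For the curious, the kind of .htaccess entry I mean looks roughly like this under Apache (the filename is made up for illustration):

<Files "private.cgi">
Order allow,deny
Deny from all
</Files>

Anyone requesting that file over the web gets a 403, while the server itself can still execute it. Swap "Deny from all" for "Deny from 192.168" and you've locked out a net-block instead.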

But to be honest, most people stop at general HTML page-grabber utilities anyway, and what little they get with those is perfectly fine by me :)

FWIW: I am not talking about my "personal" page (I couldn't care less about that), but the ones I get paid for.

function kill() {
    echo "boom"
}


-:D

 

manly

Lifer
Jan 25, 2000
13,344
4,102
136


<<
EDIT: Oh yeah, if anyone actually puts something on their site saying that they don't like people downloading the entire site, and they're fairly polite about it, I will not do it. I've visited a site that was complaining about how everyone keeps doing that, so I refrained. Pretty simple.
>>



Isn't that what robots.txt is for?

wget honors that convention.

The only slight problem with wget is that it doesn't rewrite any URLs. So it's good for spidering/archiving, but it doesn't really work for offline browsing unless you're willing to fix up the HTML base on *some* documents. It's not a problem if a site's content uses only relative URLs.
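
For reference, the whole convention is just a plain-text file at the root of the site. Something like this (the directory name is only an example) asks well-behaved robots to stay out:

User-agent: *
Disallow: /mpegs/

User-agent: * applies the rule to every crawler, and each Disallow line names a path prefix to skip; Disallow: / would cover the entire site.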
 

Zebo

Elite Member
Jul 29, 2001
39,398
19
81


<< This is the reason I left general HTML coding behind. It's too hard to protect against those with "other" ideas for stealing websites (instead of offline viewing). I now only code HTML in PHP and CGI, and leave some traps along the way just for kicks ;) >>



I can't imagine why someone would "steal" a website. It seems silly to publicly show something copyrighted by someone else at an earlier date and call it your own. On the other hand, I see nothing wrong with downloading a bunch of MPEGs from a site if you plan to watch them and your time is limited for click-throughs. Even if it is pr0n, so long as you paid the subscription fee.
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
DocDoo: I've never done it for the code, just for offline content browsing. And I've gotten around plenty of little tricks. Anyhow, it's rude, I know, and I haven't done it in a long time :)



<<

<<
EDIT: Oh yeah, if anyone actually puts something on their site saying that they don't like people downloading the entire site, and they're fairly polite about it, I will not do it. I've visited a site that was complaining about how everyone keeps doing that, so I refrained. Pretty simple.
>>



Isn't that what robots.txt is for?
>>



Basically.



<< wget honors that convention. >>



There are better programs out there than wget, and some of them can be told to ignore robots.txt ;)
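
For what it's worth, wget can be told to do the same through its run-time settings, if memory serves (the URL is a placeholder):

wget -r -e robots=off http://example.com/

The -e flag passes a .wgetrc-style setting on the command line, and robots=off makes it ignore robots.txt entirely. Obviously, use that politely.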