Save linked HTML pages?

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
Let's say there's a thread in a forum you want to save that is readable from page to page offline. Can that be done? Until now I've just done Save Page and labelled each one for what page it is. There has to be some kind of stitching tool of sorts.
 

RLGL

Platinum Member
Jan 8, 2013
2,114
321
126
Can be done with the Edge browser. Save to the reading list. If you can't do it with the spring update, the fall update will do it. I am in the preview program and have the latest version, 16291.
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
Can be done with the Edge browser. Save to the reading list. If you can't do it with the spring update, the fall update will do it. I am in the preview program and have the latest version, 16291.
Reading list doesn't save a hard copy of the entire thread as a file on your HDD.
 

lxskllr

No Lifer
Nov 30, 2004
59,349
9,875
126
Check out wget and its various options. It's a CLI utility, and it'll probably do it once you get the switches right. I just gave it a quick try, but it started downloading the whole forum, so I cancelled :^D
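For a single page, a non-recursive call along these lines should avoid the runaway download. Rough sketch only, assuming a Windows build of wget somewhere on your PATH; check the switches against the manual:

Code:
rem grab one thread page plus the images/CSS it needs, and rewrite links for offline viewing
wget -p -k -E "https://forums.anandtech.com/threads/save-linked-html-pages.2519268/"

-p is --page-requisites, -k is --convert-links, and -E is --adjust-extension so the saved file ends in .html.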
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
Check out wget and its various options. It's a CLI utility, and it'll probably do it once you get the switches right. I just gave it a quick try, but it started downloading the whole forum, so I cancelled :^D
Ugh... that sounds complicated. I've always thought wget was something that required a really deep understanding of the command line.
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,103
126
Some forum threads have hundreds or thousands of posts; how do you save those?

Writing some code yourself is probably the only way to save thread pages automatically.
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
Does anyone happen to know if HTTrack can grab just a few specific pages instead of a WHOLE site?
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
Some forum threads have hundreds or thousands of posts; how do you save those?
Well, that's exactly what I'm asking. But I never said my particular case involved threads with thousands of posts.
 

lxskllr

No Lifer
Nov 30, 2004
59,349
9,875
126
Ugh... that sounds complicated. I've always thought wget was something that required a really deep understanding of the command line.
It's not bad. Once you get a string that works, create a .bat file you call from the command prompt with the thread URL as an argument. It'll be as fast as poking a button on a screen, and you'll have the personal satisfaction of doing it yourself.

You could also look into using curl. I don't have personal experience with it; it's a little more modern than wget, but I'm not sure that matters.
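For the .bat itself, something as small as this is the general idea. savethread.bat is just a name I made up, it assumes a Windows build of wget is on your PATH, and the switches are a sketch from the manual rather than something I've tested here:

Code:
@echo off
rem savethread.bat - save one thread page for offline reading
rem usage: savethread https://forums.anandtech.com/threads/save-linked-html-pages.2519268/
rem --page-requisites pulls the images and CSS, --convert-links rewrites the links
rem to point at the local copies, --adjust-extension makes the saved file end in .html
wget --page-requisites --convert-links --adjust-extension "%~1"

Run it from a command prompt as savethread followed by the thread URL, and the page plus its files land in whatever folder you're sitting in.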
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
The problem is that every forum is designed differently.
Like this thread is https://forums.anandtech.com/threads/save-linked-html-pages.2519268/
If there is a page 2, then it will be https://forums.anandtech.com/threads/save-linked-html-pages.2519268/page-2
Other forum software is not necessarily structured this way.
Right. Good point. Different forum software, etc., etc.
Well, right now, because it's usually for only two or three pages, what I do is 'Save As HTML' and that gets me the page and a folder of images and CSS.
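Out of curiosity, if that page-2, page-3 pattern holds, would just listing the page URLs one after another do it? Something like the following, purely guessed from the wget manual and not tested:

Code:
rem save the first three pages of this thread in one go; --convert-links should
rem rewrite the next/previous page links so they keep working offline
wget --page-requisites --convert-links --adjust-extension ^
  "https://forums.anandtech.com/threads/save-linked-html-pages.2519268/" ^
  "https://forums.anandtech.com/threads/save-linked-html-pages.2519268/page-2" ^
  "https://forums.anandtech.com/threads/save-linked-html-pages.2519268/page-3"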
 

tinpanalley

Golden Member
Jul 13, 2011
1,500
22
81
It's not bad. Once you get a string that works, create a .bat file you call from the command prompt with the thread URL as an argument. It'll be as fast as poking a button on a screen, and you'll have the personal satisfaction of doing it yourself.

You could also look into using curl. I don't have personal experience with it; it's a little more modern than wget, but I'm not sure that matters.
Ok, you've piqued my interest. I'll try. HTTrack is still best for getting websites though, right?
 

lxskllr

No Lifer
Nov 30, 2004
59,349
9,875
126
Ok, you've piqued my interest. I'll try. HTTrack is still best for getting websites though, right?
I don't know. I've never used it, and I haven't used Windows in ages. You can grab a whole site with wget, but you need to be careful: you can hammer the site, and fill up your hard drive if it's a site with a lot of content. Some sites will also block automated scraping. I haven't used the tools enough to know ways around something like that. It's a tool I need every so often, but not regularly.

edit:
Here's the man page for wget...

https://www.gnu.org/software/wget/manual/wget.html

The Windows version may have some differences, but if so, they probably aren't dramatic.
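If you do let it recurse, a throttled, fenced-in call is the general idea. Again, just a sketch pieced together from the manual, with numbers I picked out of the air, not something I've run against this forum:

Code:
rem follow links one level deep but stay under the thread's own URL (--no-parent),
rem pause between requests and cap the bandwidth so you don't hammer the server
wget --recursive --level=1 --no-parent --wait=2 --limit-rate=200k ^
     --page-requisites --convert-links --adjust-extension ^
     "https://forums.anandtech.com/threads/save-linked-html-pages.2519268/"

Bump --level up if the thread has more pages than the first page links to, and keep --no-parent so it doesn't wander off into the rest of the forum.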
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,103
126
There is an extension for Chrome called SingleFile (and SingleFile core; you need to install both) that lets you save a thread web page as a single HTML file (it will include all the images that come with the page).
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,103
126
Probably found a solution. This is for xenForo forums only.

Install the TamperMonkey extension for Chrome first.

Then go to https://openuserjs.org, search for xenForo, and you will find "xenForo - Endless Forum Pages jQuery 2016".

Install it, go to the TamperMonkey dashboard, find the script, click the edit button on the right, add https://forums.anandtech.com/* to the User Includes section, and save it.

Open a new tab and go to any AnandTech forum thread; you will see a "Load more posts" button on every page. In this case, every time you click it, 25 more posts are loaded. Then use the SingleFile extension to save the page after you have loaded all the pages you want.

You will probably have to modify the script to work with other types of forum software.
 