Need recommendations on program(s) to enable offline web browsing

overturfa

Member
Jun 2, 2002
I've been trying to find a solution to enable true offline web browsing for a small home network, since I live in a rural area and my wireless ISP's connection is quite intermittent.

Basically I want to build a server that will copy websites as they are visited and serve them up later in their full form even if my upstream connection has gone offline.

I initially thought I could do this with a caching proxy and caching DNS server in line with my ISP connection. So I built a DHCP/caching DNS server on Fedora Core and also configured it with Squid. Everything on the server appears to work. DNS resolution happens locally now and new entries get cached as they are browsed. Squid appears to be caching web pages (at least when I tail the proxy log I see everything coming in). But when I remove the upstream connection, DNS resolution stops and clients can't browse to any web pages, even though those pages should already be cached in Squid.
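
For reference, this is roughly what I think the relevant part of squid.conf would need to look like if I want Squid to keep serving stale copies while the line is down. It's just a sketch pieced together from the docs (the paths and sizes are guesses), not my working config:

    # squid.conf (excerpt) -- sketch only, values are placeholders
    http_port 3128
    cache_dir ufs /var/spool/squid 2000 16 256   # ~2 GB of on-disk cache
    maximum_object_size 8192 KB                  # cache larger objects than the default
    offline_mode on                              # never revalidate; serve stale hits straight from cache

From what I understand, with offline_mode off Squid tries to revalidate anything it considers stale against the origin server, which would explain why cached pages stop being served the moment the link drops.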

I guess I don't understand caching DNS and proxy servers the way I thought I did. Any ideas how I can accomplish what I'm trying to do?

(Open source solutions preferred but not required).
 

Markbnj

Elite Member
Moderator Emeritus
Sep 16, 2005
www.markbetz.net
I'm not sure why the DNS lookups for cached server names would be failing when the upstream connection is down, if they are really cached.

The problem with the proxy idea is that it will only cache content that has already been sent to the browser, and then only for some period of time. You can't request pages from the proxy's cache that you haven't looked at yet, because it hasn't seen them either.

I've never heard of "read ahead" software for intermittent connections, and I suspect it would be hard to make practical. Usually when a connection is intermittent it is also fairly slow, so there is the question of how much reading ahead the software can do while you're looking at a page. Perhaps if the software were smart enough to realize that you read the first two stories in certain sections of a newspaper site every day, it could go ahead and grab them as soon as you connect.
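
A dumb version of that wouldn't be hard to script, though. Something like the shell script below, run from cron, could pull a handful of favorite pages through the proxy every morning so they're already sitting in the cache. Purely a sketch: the proxy address and URL list are placeholders.

    #!/bin/sh
    # prefetch.sh -- warm the Squid cache with a few favorite pages.
    # Proxy address and URLs are placeholders; adjust for your network.
    export http_proxy=http://192.168.1.1:3128/
    for url in http://www.example.com/ http://news.example.org/ ; do
        # fetch each page plus its images/CSS one level deep, then discard
        # the local copies -- the point is only to populate the proxy cache
        wget --recursive --level=1 --page-requisites --delete-after "$url"
    done

Kick it off from cron whenever the link is most likely to be up; the "smart" part is just a matter of maintaining the URL list.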

Then there is the fact that the web is less and less a request/response medium, and is increasingly interactive and granular in terms of communications between the browser and the server.
 

overturfa

Member
Jun 2, 2002
Yeah, I guess I should have worded the proxy part a little better. What I meant is that once the upstream connection is removed, clients can no longer browse pages previously cached in the proxy server.

You make a good point about the granularity of the modern Internet. Not sure how to tackle that one, but read-ahead software does exist. I've been looking at a few different solutions, including one called HTTrack. The problem is that most solutions of this type only run client side. HTTrack has a sister component called Proxy HTTrack, but from what I can tell it's still in the development stages (although I may try to make something out of that).

Edit: I suppose a clunky way to do what I'm trying to accomplish would be to use this software to download each website into the 'www' directory of Apache and browse to that when I want to look at something offline. It would work, but it wouldn't be as smooth as if I could get DNS requests to resolve to the cached page.
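
Something along these lines is what I have in mind; just a sketch with a placeholder site and depth, so I'd still need to check the exact options against the HTTrack docs:

    # mirror a site into Apache's document root for offline browsing (sketch)
    httrack "http://www.example.com/" -O /var/www/html/example -r3 "+*.example.com/*"
    #   -O   output directory under Apache's DocumentRoot
    #   -r3  limit the mirror depth to three links
    #   +    filter that keeps the mirror inside the example.com domain

Then I'd just browse to http://server/example/ when the connection is down.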