I Want URL Logging, Complete Web History, Router/Hardware Level if Possible?

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
1. I am very particular about web history. I visit c.500 websites/day and I like to have a complete, detailed, chronological history of my own web browsing.

2. To wit, I use addons to save full HTML copies of every page I visit.

3. But I would also like a network/hardware/router-based URL logging solution that produces a file containing the URL of every single page I visit, regardless of browser (I use 6 browsers simultaneously).

4. I am quite confident this is possible, and I have read around it a bit, but I'm wondering what the 'best'/'easiest' way to accomplish this might be.

Some Keywords I've picked up reading about it: Tomato, Good Router (I have a terrible ISP issued one), WallWatcher, URLSnarf, DNS-level logging

5. I'm not sure how all these fit in, or fit together. I've been warned that apparently sometimes this information is encrypted before it hits the router.

That said, I would be really quite flabbergasted that in an era where so much is traceable and recordable, I can't get this simple log that I want. Trivially, I could copy and paste the URL of every page I visit on every browser into a text file and have what I want... I just want that process automated.

6. I would like a DETAILED URL LOG: if I visit www.anandtech.com, then www.anandtech.com/forum/42, and then www.anandtech.com/forum/43, I would like the log to contain the URL of every one of those pages.

I know some network logging information just lists the domain; that would NOT meet my objective.

7. While I would like to learn more about this from a hardware/networking perspective, I believe there is software that does what I want as well:

http://www.pcpandora.com/monitored-activities/websites-visited/

8. Finally, I was told to investigate PFSense and perhaps a hardware firewall - would this enable URL logging?

9. I'm sort of frustrated at this point as apparently this is not very easy to do, and I'm sort of shocked by that (as I often am when I find things I consider trivial to be very complex viz. computing).

10. To those who will ask, I just like having a complete record of my web history, and I don't consider browser histories (I also need to collect multiple) reliable for various reasons.

Thanks in advance to anyone willing to offer information!
 

Elixer

Lifer
May 7, 2002
10,376
762
126
Yes, you can use pfSense and some add-ons (Squid...) to get what you want.

Or, you could ask google / NSA / FSB / and so on for your history... ;)

P.S. 500 sites a day? Do you even sleep?
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
Hello, thanks for your reply.

A. Given that I've been repeatedly told on other forums that the URL information is 'encrypted' by the time it reaches the router, and that this would be a problem to generate my own history... how is it that Google/others/etc can so easily obtain it?

i.e. If some 3rd party agent can get my web history... but I'm actually the one visiting the websites and typing in the URLs and I CANNOT easily get my OWN web history... how does this make any sense?

B. Would you be able to point me in the direction/offer some specifics about how I might use PFsense and maybe a hardware firewall to keep records of my own web history?

Would this be complicated, or as simple as... buy a PFSense hardware appliance sold on their site... and click "URL logging" or... more involved?

C. I do a lot of research, reading, on a lot of different topics so I have a large turnover (lots of repeat sites, but different pages of course). And I'm sort of an insomniac as it happens :)
 

Elixer

Lifer
May 7, 2002
10,376
762
126
Your title said: "I Want URL Logging, Complete Web History, Router/Hardware Level if Possible?"

The answer to that is yes.

If you encrypt everything before the router gets it, then obviously, you must decrypt it if you expect to see the full url instead of just the domain.
I am assuming this is a VPN connection or something? If so, it can still be done, but it is a lot more complicated, and it belongs on another site that deals with security, not here.
Oh, and also, there are many ways to "see" what you are doing, but again, this topic is better suited to other, security related sites.
 

matricks

Member
Nov 19, 2014
194
0
0
That said, I would be really quite flabbergasted that in an era where so much is traceable and recordable, I can't get this simple log that I want. Trivially, I could copy and paste the URL of every page I visit on every browser into a text file and have what I want... I just want that process automated.

You are taking the wrong approach, which is why you find this so difficult. Consumer-grade routers and firewalls aren't designed to do this kind of thing. Since you want the full URL, you need to log at the HTTP level; logging DNS queries won't do. An HTTP proxy is designed to do this: you direct all traffic through the proxy, and configure the proxy to do whatever you like with that traffic - cache it to save bandwidth, block it, log it, redirect it, flip pictures upside down, anything you can imagine.

Privoxy is designed as a privacy tool, but it should be able to do this. You just install it on your computer, direct all your browsers to use 127.0.0.1:8118 as an HTTP proxy, and set debug 1 in your Privoxy configuration (read the manual). Debug level 1 logs all accepted URL requests, which is exactly what you are asking for.

HTTP proxies can run on embedded devices like routers too, if you can install software on them. Privoxy runs on Linux (which most routers use), and there are other proxies too: Tinyproxy, Squid, Polipo; search for more. This is a fairly basic task that any proxy should be able to do; you just need to read the documentation to find the right logging settings.

An HTTP proxy won't be able to read HTTPS traffic out of the box, since it is encrypted before it leaves the browser. There are ways around this, but they break with good security practice and take some messing around to implement. With browsers shipping preloaded HSTS lists and pinned certificates, you may run into a lot of warnings on HTTPS sites. Look for info on intercepting SSL/TLS/HTTPS with the software you decide to use.
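
To illustrate the HTTP vs. HTTPS difference described above, here is a minimal Python sketch (not taken from any of the proxies named here) of what a forward proxy can learn from the first line a browser sends it. The function name loggable_target is made up for this example. For plain HTTP the browser sends the absolute URL; for HTTPS it sends a CONNECT request that names only the host and port, and the path travels inside the encrypted tunnel.

```python
def loggable_target(request_line: str) -> str:
    """Return what a forward proxy can log from the first line of a request.

    Hypothetical helper for illustration only; real proxies do much more.
    """
    method, target, _version = request_line.split(" ", 2)
    if method.upper() == "CONNECT":
        # HTTPS: the browser asks for a tunnel to host:port. The page path
        # is sent later, inside the encrypted tunnel, so it is not visible.
        return f"{target} (HTTPS tunnel, path not visible)"
    # Plain HTTP through a proxy: the request line carries the full URL.
    return target


if __name__ == "__main__":
    print(loggable_target("GET http://www.anandtech.com/forum/42 HTTP/1.1"))
    print(loggable_target("CONNECT www.youtube.com:443 HTTP/1.1"))
```

This is why a proxy log without HTTPS interception ends up with full URLs for HTTP pages but only domains for HTTPS ones.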
 

lody2mk

Junior Member
Jun 17, 2015
4
0
0
yes right "Elixer"
If you encrypt everything before the router gets it, then obviously, you must decrypt it if you expect to see the full url instead of just the domain.
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
@Matricks - thanks for all that information!

I looked into Privoxy a bit, and started reading about the concept. Of course, as you mentioned HTTPS could prove a problem. I need to think about whether an incomplete log would be helpful or not before investigating further.

This program seems to be quite promising, whether run on a proxy server, or just directly on my machine as it's intended. It seems to be a virtual proxy server if I understand correctly? Apparently it builds a local cached version of one's entire web history, which is essentially precisely what I would like (URL logging too, but obviously a fully cached version is actually better for my purposes).

http://www.proxy-offline-browser.com/

This seems to operate along the same lines as the Proxy-in-the-middle type of setup you were referring to?
 

matricks

Member
Nov 19, 2014
194
0
0
I don't know what makes a proxy server virtual, I suppose some might use that to refer to a proxy not running on dedicated hardware. Most proxies would be virtual then, since there really isn't dedicated proxy hardware, it's a secondary feature of servers, firewalls and enterprise routers.

That looks like a proxy more tailored to your use case. As I mentioned, Privoxy is tailored as a privacy tool to filter out unwanted ads, tracking information and so on. Its logging just happened to look like what you needed. This would work for you as well, but I would look up if it will log a simple list of all visited URLs, since I didn't see that from a quick glimpse at the front page.
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
A. It's interesting that the Proxy-Offline-Browser is essentially set up like a web proxy would be - I assume it needs to be to make sure it can 'intercept' the web traffic and build the offline browser with it.

B. Last year when I was trying to set this up, I considered using a web crawler of some sort and setting it loose daily on my web history/URL list. At the time I didn't know how to generate that list, and the browser histories are SQLite files which I didn't know how to use with the web crawler, so I gave up. The idea was that the crawler would build a complete offline version of my (already visited) 'private web'. It seems like this program does exactly that (and allows one to specify the 'crawled' depth in terms of links, which is really cool I think).

C. ***During my research I also discovered this HTTP proxy/monitor: Charles.

http://www.charlesproxy.com/overview/

I'm assuming this is just another program like Privoxy already mentioned above. It seems to say it 'supports' SSL in plaintext - what does that mean?


*Also - I'm going to try all of these to see which works best for my needs, but also to post for others. I know it seems esoteric, but I do know a good number of people who like to keep detailed records of their web history.
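
On point B, the browser history files can be read directly with a few lines of Python, since SQLite support ships in the standard library. The sketch below is only an illustration for Firefox: it assumes the moz_places and moz_historyvisits tables and the microsecond timestamps that Firefox's places.sqlite has used in recent versions, and the function names are made up here. Other browsers use different files and schemas.

```python
import sqlite3
from datetime import datetime, timezone


def read_firefox_history(db_path):
    """Return (visit_time_utc, url) pairs, oldest first.

    Assumes the Firefox places.sqlite layout: one row per visit in
    moz_historyvisits, pointing at a URL row in moz_places.
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT v.visit_date, p.url "
            "FROM moz_historyvisits AS v "
            "JOIN moz_places AS p ON p.id = v.place_id "
            "ORDER BY v.visit_date"
        ).fetchall()
    finally:
        conn.close()
    # visit_date is stored as microseconds since the Unix epoch.
    return [
        (datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc), url)
        for ts, url in rows
    ]


def write_url_list(entries, out_path):
    """Write one 'timestamp<TAB>url' line per visit, e.g. to feed a crawler."""
    with open(out_path, "w", encoding="utf-8") as f:
        for when, url in entries:
            f.write(f"{when.isoformat()}\t{url}\n")
```

Firefox keeps the file locked while it is running, so it is safer to point this at a copy of places.sqlite rather than the live file.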
 

Bock

Senior member
Mar 28, 2013
319
0
0
Sounds like you're trying to spy on people/employees. There are routers that do this. They cost quite a bit. Also, some of your employees are smart enough to tether (i.e. you can't snoop on that).
 

RadiclDreamer

Diamond Member
Aug 8, 2004
8,622
40
91
Another vote for Squid. It's free and does very detailed logging. It is NOT easy to set up if you aren't a Unix/Linux guy. There are tons of other ways to do this but they tend to cost money. Do you have a budget?
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
Hello New Posters - thank you so much for the input!

1. I am absolutely NOT trying to 'spy' on people, and I have no employees. I'm actually very opposed to surveillance of any form as a matter of ideological principle. I also just don't care what websites people visit.

2. I actually just want detailed logs of my browsing history. It helps to understand how I (perhaps esoterically) use web browsers.

At any given moment I have 6 browsers open with maybe 700 tabs open in each. I use addons to 'hang' them where possible (OneTab, etc.) to try to keep them RAM-stable, which is extremely difficult and frustrating even with 32GB RAM; the programs leak memory and just aren't designed for this usage in general.

I have this many tabs for a variety of reasons:

A. I use the web A LOT. I visit hundreds of pages on dozens of different topics/ongoing research, reading, work, etc, etc etc.

B. I use the open tabs as an open workspace of sorts.

E.g. I wanted to buy a backup HDD. I had about 15 tabs open for that, investigating failure rates for different companies, different prices, user reviews, technical documentation, HDD vendor websites. Having to constantly reverse, forward re-link, etc navigation-wise is just way too slow and frustrating for me. I'd rather just open lots of tabs. Now I didn't FINISH doing that investigation in one go.

(Perhaps that's my problem - I open a LOT of 'threads' and don't close them very quickly, I understand most people probably don't work like that - I do a lot of parallel processing, basically.)

For about 4-5 days those 15 tabs or so were open; eventually they were closed. But of course by then I had opened 30 tabs with research articles on them, 25 wikipedia pages on military history, 10 forum posts I was following, 15 news articles I wanted to read but didn't have time to (a very common reason I leave tabs open, (yes I know about "read it later" but I use the tab to REMIND me to read it later...)), about 50 tabs worth of things I was supposed to take care of, banking, payments, signup deadlines, etc., and a whole slew of other random minutiae all in ADDITION to my work, which is about 1000 tabs all on its own open at any given time, mostly web research.

Even THIS tab for instance, stays open while this thread is active. It's quite difficult to find it otherwise (I believe the forum software has an email feature, which is nice, haven't activated it yet, but frankly I find that annoying since I have to link-out from my email program when I could just keep it live).

C. So all that said - I get VERY upset when my browsers crash (they do this a lot), and all those tabs are either lost, or buried in history somewhere, or otherwise just not recorded even if I closed them and one day want to remember that I had been looking at them.

D. Hence, I want a robust way to keep a history.

E. In fact, I actually want more than that, but with Proxy-Offline-Browser (haven't tested it yet), I think I may have found the right tool.

Previously I used a Firefox addon called Shelve (amazing really), which auto-saved the complete HTML of every single page the browser visited (no performance issues) and essentially built my own local copy of every page I had ever visited, whether the page was still active or not. I loved it, but a few issues:

- Firefox only (I would use Fx only if I could but 4000+ tabs it cannot support so I need other browsers, not to mention we all know browsers can be buggy and sometimes you just need another one, in which case my record is incomplete).

- It saved the HTML files as their usual mess of images, etc in separate files. I configured it to save as the special Mozilla Archive Web Format which is a great single file webpage archive, very clean, beautiful, but unfortunately this led to a major performance delay on EVERY page, so I couldn't use it.

- I think ProxyOfflineBrowser could do this better and browser agnostically. Ultimately this is what I REALLY want. But I consider a URL log an additional record which is nice and cleaner to have. Also - it's easier to revisit the live page because the complete web archive is not easily searchable by URL.

F. Finally @RadiclDreamer (sorry you had to read through that), I am willing to spend for this, so sure I have a budget - Squid probably isn't a good idea given I'm not much good with UNIX and further, would rather spend money than learn to solve this particular problem.
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
I tried Proxy-offline-browser and it's sort of useless. It crashes, slows everything down, needs Java virtual environment to run, and is very dated.
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,101
126
I think you can use Fiddler. It logs just about everything. There will be tons of web history, so you might have to define filters.

Fiddler can be downloaded here. http://www.telerik.com/fiddler

It works on IE, Chrome and Firefox. It will install a FiddlerHook plugin in Firefox.
Make sure you don't have other proxies that redirect traffic, or you probably won't see any activity logged by Fiddler.

I used Fiddler once a long time ago and I'm no expert on this.

The way you open tons of tabs is just absurd, however. It will eat lots of memory and every tab could load Flash Player with it. And I know Flash crashes constantly, really.

I use the uBlock plugin for Firefox https://addons.mozilla.org/en-US/firefox/addon/ublock/ to block unwanted ads; it's still at a preliminary stage but works well for me.

There are some tutorials on Youtube for Fiddler, you might want to take a look.
 

CommissarMo

Junior Member
Feb 3, 2010
13
0
0
@mxnerd - While it's not exactly what I wanted (which was basically just a clean URL list like I can get in each browser history), Fiddler is pretty much the answer to my question!

1. I tried it and it's obviously quite sophisticated and complex to configure, and it logs a huge amount of traffic. I haven't figured out how to just get 'visited' URLs from it, though it does usually highlight them in blue, which is nice.

2. Of course I need to keep this running always, but it looks pretty lightweight. Presumably if I had a firewall appliance, I could put this on it and have it run external to my computer? I don't know if that's possible, but because now I'm interested I'll investigate that.

3. Fiddler seems a lot simpler than Untangle, though Untangle actually has a hardware appliance offering (much more expensive than Fiddler (free)).

4. Some quirks I've noted - Fiddler definitely does strip away HTTPS which was referenced above - this is cool.

OTOH, it seems a bit incomplete - e.g. I tested youtube.com which is HTTPS. It logs that easily (along with ads and other junk traffic).

But when I click on a video, it doesn't detect THAT specific URL (the browser history does and nicely creates a link with the title of that page), so I'm getting suspicious about its utility as a URL logger for my purposes anyway. Obviously THAT's the link I want, not all that other stuff it captures.

NOW - I also tested a hunch - when I go up to the URL bar and hit enter, which loads the specific HTTPS link that YouTube assigns to a specific video, Fiddler immediately logs it and highlights it as a webpage.

5. (Another completely bizarre quirk I cannot figure out at all... visit a youtube video, Fiddler doesn't log (I waited thinking maybe decrypting HTTPS takes awhile but doesn't seem to). BUT when I then visit the web history page, it immediately logs that specific open link. - I have NO idea what's going on there).

6. I will definitely be using Fiddler as part of my logging/web history/archive arsenal - I've realized that this project of logging websites will likely be a patchwork effort just creating a huge amount of log material and archived websites, which I'm fine with. I have lots of space to store it lol.

*I had actually tried Fiddler a few days ago before you had posted here, seeing it recommended on StackExchange for someone who wanted to solve a similar problem, but I originally used the tiny WebCap applet they have instead of the full Fiddler program - when you recommended it, I downloaded that and it's MUCH better than the applet, so I changed my mind. Thanks for the recommendation!

7. What's great about Fiddler also is that it captures every browser and tags the browser which visited the URL - that's exactly what I wanted as far as that goes since I have so many browsers.

8. I've also come to another realization as per my overarching interest in saving an HTML copy of EVERY website I visit as part of my personal web history/archive.

I think that perhaps manipulating the browser's web history, if possible, might be better place to start with that rather than using these other programs like Web crawlers and Proxy off-line browser.

Ultimately, I really just want the web histories the browser itself generates, I like them, and I just want the computer to auto-save an HTML copy of every page on that list. I have to imagine (if I knew how) one could write a plugin that does this.

*I'm being a little facetious since Shelve AddOn for Firefox which I've used a lot does EXACTLY this but it saves the webpage live as you visit it (which is I suppose even better than using the web history since they can be a bit incomplete sometimes), and it doesn't slow the browser down at all. Shelve is a bit messy though, and doesn't create the nice folder of single-file HTML/MAFF/MHTML files that I could create manually by clicking 'save as' on EVERY page I visit by hand (which is what I want).

Then of course main problem is Shelve is Firefox only. I suppose I could just use ONLY Firefox (can't do that lol), or... keep looking.

9. This is off-topic now since Fiddler is the answer to the forum question, but as for archiving, I've considered exporting the web histories and feeding them to the crawlers (HTTrack is quite nice crawler actually), and while this does work, it's a LOT of manual work (maybe in theory the steps can be automated with a task software). It also winds up saving a LOT more stuff than just the pages visited when you set the depth of crawl to 1 (you can't set it to zero because many sites post images and stuff as links so they get missed at crawl depth zero).

10. Ultimately, I suspect using crawlers will be more trouble than they're worth.

11. I think I need to explore the browsers themselves more (perhaps learn how to code add-ons, or pay someone to build one for me lol, or perhaps modify Shelve to be more friendly - the source code is available for it on SourceForge).

12. Presumably if it can be done on Firefox, addons that do the same thing could be written for Chrome and Opera at least.
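
On point 1 above (getting only the 'visited' URLs out of everything a proxy captures), one rough approach is to export the captured sessions and keep only the responses that look like pages. The Python sketch below is a heuristic under stated assumptions, not a Fiddler feature: the record layout (dicts with url, status, content_type and client keys) and the name visited_pages are hypothetical, standing in for whatever export format the proxy actually provides.

```python
def visited_pages(entries):
    """Keep only captured entries that look like pages a person opened.

    `entries` is a hypothetical list of dicts exported from a proxy log,
    each with 'url', 'status', 'content_type' and optionally 'client'.
    """
    pages = []
    for entry in entries:
        # "text/html; charset=utf-8" -> "text/html"
        content_type = entry.get("content_type", "").split(";")[0].strip().lower()
        if entry.get("status") == 200 and content_type == "text/html":
            pages.append((entry.get("client", "unknown"), entry["url"]))
    return pages


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"url": "http://www.anandtech.com/forum/42", "status": 200,
         "content_type": "text/html; charset=utf-8", "client": "firefox"},
        {"url": "http://www.anandtech.com/logo.png", "status": 200,
         "content_type": "image/png", "client": "firefox"},
        {"url": "http://www.anandtech.com/old", "status": 404,
         "content_type": "text/html", "client": "chrome"},
    ]
    for client, url in visited_pages(sample):
        print(client, url)
```

It will not be perfect, since embedded ad frames are also served as text/html, but it should cut out images, scripts and most other junk traffic.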
 

mxnerd

Diamond Member
Jul 6, 2007
6,799
1,101
126
Fiddler does have an option to run on another machine, I think. Under the Tools menu, Fiddler Options, Connections, there is an option "Allow remote computers to connect".
 

sayerlep

Junior Member
Aug 6, 2017
1
0
66
CommissarMo, did you figure out a solution to your problem? I'm looking for the exact same thing.