Web Proxy Question (Java)

imported_nautique

Senior member
Jul 14, 2004
346
0
0
I have to create a logging web proxy that implements URL-based filtering and logging. The sender will run a basic java command with a specified port number.

Anytime a browser connects and sends a GET, HEAD, or POST HTTP request I have to check to see if the URL is in the request log.

My question is: to begin, do I need to implement a client-side class that creates a socket, and then use the logging that Java provides?

As you can probably tell, I'm a little lost on where to begin, so any suggestions would be great!

Thanks.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
You're just implementing a standard web browsing proxy? You probably don't have to create a client, as a browser will do that just fine. While you're testing you could also use telnet (which is really just a plain TCP pipe), though a custom client may come in handy for automating tests.

For logging I don't think jdk logging is what you want as it isn't meant to be read back by the application. For the big solution, a database might be most appropriate for storing records of what's been requested but an in-memory structure might do fine if you're not worried about being fancy.
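To make the in-memory option concrete, here is a minimal sketch. The class name `RequestLog` and its methods are made up for illustration, and it assumes the proxy only needs to know whether a URL has been seen and how often.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory request log keyed by URL (illustration only). */
public class RequestLog {
    // URL -> number of times it has been requested; safe for concurrent handlers.
    private final Map<String, AtomicInteger> hits = new ConcurrentHashMap<>();

    /** Records one request for the URL and returns how many times it has now been seen. */
    public int record(String url) {
        return hits.computeIfAbsent(url, k -> new AtomicInteger()).incrementAndGet();
    }

    /** Returns true if the URL has been recorded at least once. */
    public boolean contains(String url) {
        return hits.containsKey(url);
    }

    public static void main(String[] args) {
        RequestLog log = new RequestLog();
        System.out.println(log.contains("http://example.com/")); // not seen yet
        log.record("http://example.com/");
        System.out.println(log.contains("http://example.com/")); // seen once
    }
}
```

Everything is lost when the proxy exits, which is the trade-off against a database.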

The place to start, though, is figuring out how to handle your HTTP requests. If I was starting from scratch and had no limitations on what I was allowed to use, I'd maybe start by using Tomcat to handle all the nitty-gritty and just write servlets to do the actual work. Or you could write your own server architecture (deal with sockets and such) and import some kind of HTTP handling library from somewhere. Or you could get reading and implement your own HTTP protocol handler. Any way you do it, learning HTTP is a good idea. I'd recommend the LiveHTTPHeaders extension for Firefox if you want to see how it works in everyday scenarios.
 

znaps

Senior member
Jan 15, 2004
414
0
0
As Kamper said, I think it'll just involve a Servlet, shouldn't be too hard - I don't see why you'll need a Java client as any browser can test it.
 

imported_nautique

Ok I now realize that I am not writing the client side but just using the browser as my client. I am starting to write a server side java file as the proxy.

All I want to do first is just get to the point where I print out all of the header information about the website in DOS. I have tried writing the code but for some reason it will not print out to the command prompt.

I create a BufferedReader object and a PrintWriter. Then I have a while loop, but I am not sure how to break out of it. Maybe that is my problem?
 

imported_nautique

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class myProxy_1 {
    public static void main(String[] args) throws IOException
    {
        ServerSocket serverSocket = null;
        Socket clientSocket = null;

        try
        {
            serverSocket = new ServerSocket(8080); // bind the proxy to port 8080
        }
        catch(IOException e)
        {
            System.err.println("Could not listen on port: 8080.");
            System.exit(1);
        }

        try
        {
            clientSocket = serverSocket.accept(); // block until a client connects
        }
        catch(IOException e)
        {
            System.err.println("Accept Failed.");
            System.exit(1);
        }

        // Both streams belong to the client socket.
        PrintWriter output = new PrintWriter(clientSocket.getOutputStream(), true);
        BufferedReader input = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));

        String inputLine, outputLine;

        // readLine() returns null only when the client closes the connection.
        while((inputLine = input.readLine()) != null)
        {
            outputLine = inputLine;
            output.println(outputLine);
        }

        output.close();
        input.close();
        clientSocket.close();
        serverSocket.close();
    }
}
 

kamper

Well, first of all, you're not seeing anything on the command line because you're not printing anything to the command line. You're just writing it back to the socket.
 

imported_nautique

I thought that the output.println(outputLine); prints it to the command prompt.

And if I don't check for null, what do I check for? The server sends back the header information, right?
 

znaps

The output.println sends it back to the browser (which might be fine actually since you still get to see the output).

How about sticking some debugging System.out calls so you can work out for yourself what info is being sent?
 

imported_nautique

Everywhere throughout the code, or just in the while loop? That was a great idea too, thanks. I will try that and let you know what I come up with.
 

kamper

Originally posted by: znaps
The output.println sends it back to the browser (which might be fine actually since you still get to see the output).
I doubt you'd see anything. The browser should barf because there'd be no proper response headers.
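For comparison, this is roughly what a response with proper headers looks like. It's only a sketch: the class name `MinimalResponse` is made up, and it assumes a plain-text body sent as HTTP/1.0 so the connection can simply be closed afterwards.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a tiny but well-formed HTTP response (illustration only). */
public class MinimalResponse {
    /** Returns status line, headers, blank line, and body as raw bytes. */
    public static byte[] build(String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        // Every header line ends with CRLF; an empty line separates headers from body.
        String head = "HTTP/1.0 200 OK\r\n"
                + "Content-Type: text/plain; charset=utf-8\r\n"
                + "Content-Length: " + payload.length + "\r\n"
                + "\r\n";
        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[headBytes.length + payload.length];
        System.arraycopy(headBytes, 0, out, 0, headBytes.length);
        System.arraycopy(payload, 0, out, headBytes.length, payload.length);
        return out;
    }

    public static void main(String[] args) {
        // Print the response so the line structure is visible on the console.
        System.out.print(new String(build("hello"), StandardCharsets.UTF_8));
    }
}
```

In the proxy it would be written with something like `clientSocket.getOutputStream().write(MinimalResponse.build("hello"))`.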
 

kamper

Originally posted by: nautique
I thought that the output.println(outputLine); prints it to the command prompt.
System.out.println(outputLine);
(as znaps said) You can have as many PrintWriters as you like but only one of them prints to the console.

Edit: whoops, didn't see that you had picked up on that advice already...
 

kamper

Originally posted by: nautique
I thought that the output.println(outputLine); prints it to the command prompt.

And if I don't check for null, what do I check for? The server sends back the header information, right?
This is the difficulty of writing an HTTP server. You have to rely on the client sending the right stuff (i.e. properly ending the header section with a blank line, which is \r\n\r\n on the wire, and correctly specifying Content-Length when there is a body). If it doesn't, you have to include error-handling code to recover, and pretty much the only way to do that is to have your socket time out after a while.

That's why I recommended using a prebuilt HTTP library if you can.
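If you do go the hand-rolled route, the two guards mentioned above might look something like this. It's a sketch with made-up names (`RequestGuards`, `contentLength`, `firstLineOrNull`); the timeout value is whatever you pick, and a malformed Content-Length value is simply allowed to throw NumberFormatException here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical guards for a hand-written HTTP reader (illustration only). */
public class RequestGuards {
    /** Returns the declared body length, or 0 if no Content-Length header is present. */
    public static int contentLength(List<String> headerLines) {
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            // Header names are case-insensitive.
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    /** Reads the first line, or returns null if the client closes or stalls past the timeout. */
    public static String firstLineOrNull(Socket client, int timeoutMillis) throws IOException {
        client.setSoTimeout(timeoutMillis); // blocking reads now fail after this many ms
        BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream(), StandardCharsets.ISO_8859_1));
        try {
            return in.readLine();
        } catch (SocketTimeoutException e) {
            return null; // the client went quiet; the caller should close the socket
        }
    }
}
```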
 

kamper

Originally posted by: znaps
No, it works, at least on Firefox. I tried it myself out of curiosity.
Interesting. So you just get it on the screen as if it were the contents of a text file? I'd be curious to see what LiveHttpHeaders has to say about that.
 

znaps

Yep. I just realized that in Firefox when you do a View Source, it attempts to reconnect to the server to grab the source....weird.
 

kamper

I've never really figured out firefox's view source. I've tried to do "View Selection Source"s on pages that I got to via POST and it tries to redo the whole POST (very bad!) just to show me the source. It's almost like it doesn't have the source available after it's rendered it.
 

znaps

I know..that's dumb - it's not like holding a page of text uses up much of any extra memory.
 

imported_nautique

OK, I was able to get it working by changing the print statement to System.out.println, and it now prints to the command prompt. But what must I do to exit the while loop correctly? I was thinking of some kind of if statement with a break, but I am not sure what to check for. Any help would be nice.
 

kamper

Originally posted by: nautique
OK, I was able to get it working by changing the print statement to System.out.println, and it now prints to the command prompt. But what must I do to exit the while loop correctly? I was thinking of some kind of if statement with a break, but I am not sure what to check for. Any help would be nice.
That's the trick: you have to understand the HTTP as it comes in to know when to stop. In the case of a simple GET, the request is finished when you see a blank line (two successive line endings, \r\n\r\n on the wire, which readLine() hands you as an empty string), but it's different for POST, where a body follows the blank line.
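A loop that stops at the blank line could be sketched like this. `HeadReader` and `readHead` are made-up names, and it only collects the request line and headers; any POST body after the blank line is left unread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the request line and headers (illustration only). */
public class HeadReader {
    /** Collects lines until the first empty line or the end of the stream. */
    public static List<String> readHead(BufferedReader in) {
        List<String> lines = new ArrayList<>();
        try {
            String line;
            // null means the client closed; an empty string is the blank line ending the head.
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // A canned request stands in for the socket's input stream.
        String request = "GET http://example.com/ HTTP/1.1\r\nHost: example.com\r\n\r\n";
        for (String line : readHead(new BufferedReader(new StringReader(request)))) {
            System.out.println(line);
        }
    }
}
```

In the proxy you would pass the BufferedReader wrapped around clientSocket.getInputStream() instead of the StringReader.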
 

imported_nautique

Now that I am able to read the message going out from the client to the proxy, how do I open up port 80 so that the proxy turns into the client and asks the real server for the web page? I now need to get the actual web page to display.
 

kamper

Well you could use sockets (in a similar manner to what you are doing right now) and that would give you the most control. Just implement the client side of the protocol like you're doing the server side already. You could potentially even just spit out the http request that the client sent you (I think).
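A rough sketch of the socket approach follows. The names `Upstream`, `target`, and `relay` are made up. It assumes the browser sent an absolute URL in the request line (as browsers do when talking to a proxy), that the request has no body, and that asking the origin for "Connection: close" is acceptable so the copy loop knows when to stop.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper that relays one request head to the origin server (illustration only). */
public class Upstream {
    /** Extracts host and port from a proxy-style request line; port defaults to 80. */
    public static InetSocketAddress target(String requestLine) {
        String[] parts = requestLine.split(" "); // method, absolute URL, version
        URI uri = URI.create(parts[1]);
        int port = (uri.getPort() == -1) ? 80 : uri.getPort();
        return InetSocketAddress.createUnresolved(uri.getHost(), port);
    }

    /** Sends the head to the origin and copies the reply to the browser until the origin closes. */
    public static void relay(List<String> head, OutputStream toBrowser) throws IOException {
        InetSocketAddress addr = target(head.get(0));
        try (Socket origin = new Socket(addr.getHostString(), addr.getPort())) {
            StringBuilder request = new StringBuilder();
            for (String line : head) {
                String lower = line.toLowerCase(Locale.ROOT);
                // Drop keep-alive hints so the origin closes the connection when it is done.
                if (lower.startsWith("connection:") || lower.startsWith("proxy-connection:")) {
                    continue;
                }
                request.append(line).append("\r\n");
            }
            request.append("Connection: close\r\n\r\n");
            OutputStream toOrigin = origin.getOutputStream();
            toOrigin.write(request.toString().getBytes(StandardCharsets.ISO_8859_1));
            toOrigin.flush();

            // Copy raw bytes so images and other binary content survive.
            InputStream fromOrigin = origin.getInputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = fromOrigin.read(buffer)) != -1) {
                toBrowser.write(buffer, 0, n);
            }
            toBrowser.flush();
        }
    }
}
```

Note the byte-level copy: a PrintWriter/BufferedReader pair is fine for headers but would mangle anything that isn't text.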

I believe sun has an http client library that actually comes with the jdk but I believe it's hidden behind a few layers of stuff so you may not be able to exercise enough control over it. It might also not be a standard part of the jdk so that might make relying on it unportable :(. Worth a look anyways.
 

imported_nautique

To implement the client side of the protocol, I can do that in the same file where I have already written the server-side proxy, correct?