Send post-data with wget only once

grumm3t

Member
Oct 22, 2001
114
0
0
I'm trying to mirror a site with a login using wget and post-data, but I'm receiving error 405 codes...

The site doesn't use cookies to handle logging in, so I have to pass my username/password with the --post-data option to the login script. This seems to work fine and it starts mirroring the page, but then ends abruptly.

Here's an example retrieval from wget with the headers:

--00:54:05-- http://www.somesite.com/some/file.gif
=> `www.somesite.com/some/file.gif'
Connecting to www.somesite.com[xx.xx.xx.xx]:80... connected.
HTTP request sent, awaiting response...
1 HTTP/1.1 405 Method Not Allowed
2 Date: Mon, 18 Apr 2005 06:16:02 GMT
3 Server: Apache/1.3.31 (Unix) mod_ssl/2.8.20 OpenSSL/0.9.7a mod_perl/1.29 PHP/4.3.8
4 Allow: GET, HEAD, OPTIONS, TRACE
5 Connection: close
6 Content-Type: text/html; charset=iso-8859-1
00:54:06 ERROR 405: Method Not Allowed.
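
For reference, the Allow header in that response lists the methods the server will accept for this file, and POST is not one of them. A minimal Python sketch of that check (the function name allowed_methods is made up for illustration, it is not part of wget):

```python
def allowed_methods(allow_header):
    """Split an HTTP Allow header value into a set of upper-case method names."""
    return {m.strip().upper() for m in allow_header.split(",") if m.strip()}


# Header value copied from the 405 response above.
methods = allowed_methods("GET, HEAD, OPTIONS, TRACE")
print("POST" in methods)  # False
print("GET" in methods)   # True
```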

Via google I found the problem was probably related to sending post data on every hit. So here's my question:

Is there a way to have wget send the post-data string only on the first page (login script) retrieval?

Edit: clarity
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Forgive my lack of knowledge about wget, but I'll try to help. How did you get wget to do a POST in the first place?
It looks like static files can only be retrieved with GETs so if you have a list of such files you can just get wget to do a GET for those (sorry, couldn't resist :p).
 

grumm3t

Wget works the same way your web browser would work. It initiates a handshake with a web server to send and receive data. All the information you fill out on a site in your web browser is submitted to the server the same way wget submits post-data (at least as far as the web server can see).

The bits and pieces sent to and from the server are nearly indistinguishable from program to program; the differences arise in the way each program manipulates the data.

Edit: spelling
 

kamper

So wget actually parses an html file and submits a form based on the info in the html?

The only experience I have with it is one-offs from the commandline. Like:
sends this out:
GET /i/authorsicons/crab.gif HTTP/1.0
User-Agent: Wget/1.8.2
Host: forums.anandtech.com
Accept: */*
Connection: Keep-Alive

I'm just trying to understand how you're using wget differently to cause it to use a POST to grab an image...
 

manly

Lifer
Jan 25, 2000
13,086
3,850
136
Originally posted by: grumm3t
Just stores the info on a server side cookie with my session id, I assume.
Now you're contradicting yourself. :p I'm pretty sure you can submit cookies using wget. You can "borrow" cookies from your regular browser.
 

grumm3t

It stores a server-side cookie, not a client-side one. But I guess you can't really call it a server-side cookie. It's just the session info made by PHP.

Edit: Missed this post sorry...

Originally posted by: kamper
I'm just trying to understand how you're using wget differently to cause it to use a POST to grab an image...

Well, I viewed the source and got the field names from the login form: memberid for the username, password, and action (which is set to login; it's a hidden field of the kind commonly used in server-side scripts). Then I want it to log in with my user details by "faking" a form submit to the page that processes logins. The login page forwards me to my member page, and I'm given access to the for-pay parts of the website.
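
To make the "faked" form submit concrete, here is a rough Python sketch of the request it amounts to. The field names memberid, password and action=login are the ones from the form; the login URL and the credentials are placeholders, since the real ones aren't given here, and build_login_request is just an illustrative name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_login_request(login_url, member_id, password):
    """Build (but do not send) the POST a browser would make for the login form."""
    fields = {
        "memberid": member_id,   # username field from the form
        "password": password,
        "action": "login",       # hidden field from the form
    }
    # urlencode produces the same kind of string you would pass to --post-data.
    body = urlencode(fields).encode("ascii")
    return Request(login_url, data=body, method="POST")


# Placeholder URL and credentials, for illustration only.
req = build_login_request("http://www.somesite.com/login.php", "alice", "secret")
print(req.get_method())  # POST
print(req.data)          # b'memberid=alice&password=secret&action=login'
```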

Now I want wget to mirror everything it can find on the page [including member sections], but it only seems to do 1/20 of the content and then says it finishes.

I think the problem is that when I tell wget to submit the form data to the login page, it does so, but then it also submits the login details to every other page it tries to grab from the site. As a security measure the owners blocked POSTing data to pages that don't need it, so wget is denied access to most of the stuff.
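
What I'm after, sketched in Python rather than wget: POST to the login script exactly once, then fetch everything else with plain GETs while holding on to whatever session cookie the login sets. This assumes the PHP session id travels in a cookie (PHP can also carry it in the URL, in which case this wouldn't be enough). The names plan_requests and run and the URLs in the comments are placeholders of mine.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def plan_requests(login_url, form_fields, page_urls):
    """Return one POST for the login script followed by GETs for the pages."""
    body = urlencode(form_fields).encode("ascii")
    requests = [Request(login_url, data=body, method="POST")]
    # Everything after the login is an explicit GET with no request body.
    requests += [Request(url, method="GET") for url in page_urls]
    return requests


def run(requests):
    """Send the requests in order through one opener that remembers cookies."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    statuses = []
    for req in requests:
        # opener.open raises urllib.error.HTTPError on responses such as 405.
        with opener.open(req) as resp:
            resp.read()
            statuses.append(resp.status)
    return statuses


# Example (not executed here, the URLs are placeholders):
# run(plan_requests("http://www.somesite.com/login.php",
#                   {"memberid": "alice", "password": "secret", "action": "login"},
#                   ["http://www.somesite.com/some/file.gif"]))
```

Only the first request carries the form data, so the static files never see a POST.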
 

kamper

Well, that would mean that you probably don't need to be logged in to get at the static data (like images). If you have a list of the files you want, can't you just put them in a text file and then have wget loop through and retrieve them all (or write a script to make wget do one at a time)?
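
Something along these lines is what I have in mind for the script version, as a rough Python sketch. The function names and the "mirror" output directory are my own placeholders, and it assumes the files really are reachable without being logged in.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def read_url_list(text):
    """Return the non-blank lines of a URL list, one URL per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def local_path_for(url, root="mirror"):
    """Map a URL to root/host/path, using index.html for directory URLs."""
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    if not rel or rel.endswith("/"):
        rel += "index.html"
    return Path(root) / parts.netloc / rel


def fetch_all(urls, root="mirror"):
    """GET each URL (urlopen sends a GET when no data is given) and save it."""
    for url in urls:
        target = local_path_for(url, root)
        target.parent.mkdir(parents=True, exist_ok=True)
        with urlopen(url) as resp:
            target.write_bytes(resp.read())


# Example (not executed here): fetch_all(read_url_list(Path("urls.txt").read_text()))
```

wget can also read such a list itself with its -i (--input-file) option, which would avoid the script entirely.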