OK, here's what I need to do, and I'm wondering if you guys can help me figure out the best way to do it. I have limited experience in Perl and Python, and good experience in C++. I feel comfortable learning languages quickly, though.
I need to access a particular URL of the form www.domain.com/query?page=i, then follow every link on that page that has the format "www.domain.com/directory/page", where domain and directory are constant but page is different every time.
Once I get each linked page, I need to extract a particular table and place the information in the tr and th tags into an Excel (or CSV) file.
Once I finish this, I need to increment i, retrieve the next query page, and start the whole process over again.
I'm thinking the best way to do this is in Perl, as parsing the files shouldn't be too hard, but if I go this route, can someone recommend a way to retrieve/spider the pages?
If there is an easier way to do this than Perl, I'm definitely open to ideas, as I'm largely going to have to relearn most of the Perl syntax anyway.
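To make the steps concrete, here is a rough sketch of the loop in Python, using only the standard library. It is a sketch under assumptions, not a finished spider: the http:// scheme, the names QUERY_URL, DETAIL_PREFIX, extract_links, extract_rows, fetch and crawl are all made up for illustration, "a particular table" is taken to mean the first table on the page, td cells are collected along with th cells, and the markup is assumed to be reasonably well formed.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed placeholders; the real scheme, domain and directory may differ.
QUERY_URL = "http://www.domain.com/query"
DETAIL_PREFIX = "http://www.domain.com/directory/"


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


class TableParser(HTMLParser):
    """Collect th/td cell text, row by row, from the first table only."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.row = None    # cells of the <tr> being read, or None
        self.cell = None   # text pieces of the cell being read, or None
        self.tables_seen = 0
        self.in_table = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables_seen += 1
            self.in_table = self.tables_seen == 1
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.row is not None and tag in ("th", "td"):
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            # Join the text pieces and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.rows.append(self.row)
            self.row = None
            self.cell = None
        elif tag == "table":
            self.in_table = False


def extract_links(html, base_url, prefix):
    """Return unique absolute links on the page that start with prefix."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urljoin(base_url, href)  # resolve relative links
        if url.startswith(prefix) and url not in links:
            links.append(url)
    return links


def extract_rows(html):
    """Return the first table as a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(html)
    return parser.rows


def fetch(url):
    """Download a page and decode it as UTF-8 (encoding is an assumption)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(first, last, out, fetch=fetch):
    """For page=i with i in first..last, follow matching links, append rows as CSV."""
    writer = csv.writer(out)
    for i in range(first, last + 1):
        index_url = f"{QUERY_URL}?page={i}"
        for link in extract_links(fetch(index_url), index_url, DETAIL_PREFIX):
            writer.writerows(extract_rows(fetch(link)))
```

Something like crawl(1, 10, f), with f opened via open("out.csv", "w", newline=""), would then write everything to one CSV file that Excel can open. The fetch argument is only there so the loop can be tried against saved pages before hitting the real site.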
Merry Christmas
-Bob
