Possible Networking Program Design :: Winsock

kuphryn

Senior member
Jan 7, 2001
Hello.

I have an idea for a program I would like to design and implement in C++. The program will produce a list of websites based on what the user wants to search for. For example, let's say the user enters "c++ programming." The program will connect to Google.com and run a search for "c++ programming." It will browse all of the result pages and save every website into a text file (one URL per line). Once done, the user will have a list of relevant websites.


From an implementation perspective, is the program above simple enough to write using C++ and Winsock? I should be able to set up a socket to connect to a search engine such as Google and/or Yahoo. That is about all I know right now. I do not know how to gather the information after I have connected to, say, a website. Please include a possible implementation using Winsock if you know of one.

Thanks,
Kuphryn
 

Pakaderm

Senior member
Mar 8, 2001
This would be very easy to do. Look into using the WinInet library. It's a level of abstraction higher than Winsock.

-Pak
 

DaveSimmons

Elite Member
Aug 12, 2001
This is known as "scraping", writing code to extract info from web server pages.

Basically you do:

Http POST to fill out search box
buffer and process response page 1 (sets #pages or isMore flag)

while (isMore)
{
    Http POST or GET to move to next result page
    buffer and process response page X
}

You'll also want a hard limit of, say, 500 results, since there could be hundreds or thousands of pages of results.
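
The loop above, with the result cap added, might look something like this in C++. This is only a sketch: ResultPage, FetchPage and collectResults are names I made up, and the callback that actually downloads and parses a page is left to you (Winsock, WinInet, whatever you end up using).

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One parsed result page: the URLs found on it, and whether another page follows.
struct ResultPage {
    std::vector<std::string> urls;
    bool isMore = false;
};

// Hypothetical callback: downloads and parses result page number `page` (1-based).
using FetchPage = std::function<ResultPage(int page)>;

// Walks the result pages until there are no more, or until maxResults URLs are collected.
std::vector<std::string> collectResults(const FetchPage& fetchPage,
                                        std::size_t maxResults = 500) {
    std::vector<std::string> all;
    int page = 1;
    bool isMore = true;
    while (isMore && all.size() < maxResults) {
        ResultPage r = fetchPage(page++);
        for (const std::string& url : r.urls) {
            if (all.size() >= maxResults) break;  // stop exactly at the cap
            all.push_back(url);
        }
        isMore = r.isMore;
    }
    return all;
}
```

Keeping the network code behind a callback means you can test the paging logic with canned pages before touching a real server.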

Use "view source" in a browser to figure out what to send to google and whether to use POST or GET for moving to the next page.
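
One thing to watch when you build the request: a search string like "c++ programming" contains characters that cannot go into a URL or form body as-is. A rough sketch of form-style encoding follows; urlEncode is my own helper name, not a library function, and the parameter name the search engine expects is something you would still have to read off the page source.

```cpp
#include <cctype>
#include <string>

// Form-style URL encoding: letters, digits and -_.~ pass through,
// a space becomes '+', and every other byte becomes %XX.
std::string urlEncode(const std::string& text) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : text) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);
        } else if (c == ' ') {
            out += '+';
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}
```

With this, "c++ programming" is sent as c%2B%2B+programming, so the server does not mistake the literal plus signs for spaces.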

I think Google has a more direct API, but the above technique would work for any search engine, including ones for programming sites like CodeGuru.
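
For the "process response page" step, a crude first pass is to scan the buffered HTML for href="..." attributes. The sketch below assumes double-quoted attributes and keeps only absolute http/https links; extractLinks is a made-up name, and real result pages will need more filtering (ads, navigation links, and so on).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns every double-quoted href value in `html` that starts with "http".
std::vector<std::string> extractLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string key = "href=\"";
    std::size_t pos = 0;
    while ((pos = html.find(key, pos)) != std::string::npos) {
        pos += key.size();                      // first character of the URL
        std::size_t end = html.find('"', pos);  // closing quote
        if (end == std::string::npos) break;    // truncated attribute, stop
        std::string url = html.substr(pos, end - pos);
        if (url.compare(0, 4, "http") == 0) {   // skip relative links
            links.push_back(url);
        }
        pos = end + 1;
    }
    return links;
}
```

Writing each returned string to a file, one per line, gives the URL list the original post asks for.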

 

kuphryn

Thanks.

Another member at CodeProject and Pakaderm here both mentioned WinInet. Does the technique you described with HTTP POST make use of Winsock or WinInet?

If it is WinInet, I have never heard of it. Is it part of MFC and IE? The authors of the (Windows) network programming book I am reading have not mentioned WinInet.

Kuphryn
 

DaveSimmons

POST and GET are standard HTTP request methods, the plain-text "formats" for transferring info to and from servers on the web.

The simplest GET is to connect to a web server on port 80 and send this with Winsock, followed by a blank line to end the request:

GET / HTTP/1.0

which would request "/" (the index page) from the server.

Connect to the Anandtech main site on port 80, then send this to request page 1 of the 845 motherboard roundup:

GET /mb/showdoc.html?i=1567 HTTP/1.0
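
In code, the request is just a string you assemble and hand to send() on the connected socket. Here is a minimal sketch; buildGetRequest is a made-up helper, and the Host header is my addition (HTTP/1.0 does not require it, but many servers expect it).

```cpp
#include <string>

// Builds a minimal HTTP/1.0 GET request for `path` on `host`.
// Every line ends in CRLF, and an empty line marks the end of the request.
std::string buildGetRequest(const std::string& host, const std::string& path) {
    std::string req = "GET " + path + " HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";  // assumption: not required by HTTP/1.0
    req += "\r\n";                    // blank line terminates the request
    return req;
}
```

After connect() succeeds, you would pass the string's data and size to send(), then call recv() in a loop until it returns 0 to buffer the whole response.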


WinInet (as msdn.com will tell you) is a layer on top of Winsock that has functions to do some of this for you, such as doing a GET by just passing the server and "/mb/showdoc.html?i=1567" to a function.