It's almost impossible to describe to a client any of what makes data on a given page easier or harder to get. They can see the data, and everything beyond that is spoken in Klingon.
Hm, this might be of use. Maybe ask them to put tags like that around anything they'd use for a Microsoft Word form letter.
trust issues suck.
Unfortunately this particular gig is uninvited scraping, so we have to work with what we have. The data can be had, but obviously every site is different. Right now I'm dealing with one where I have to generate an ASP postback in order to navigate pages. You try describing this stuff to the client when the page looks just like the page you easily scraped yesterday, and their eyes just glaze over.
"you can't make calls to the endpoint"
Why not? Can't you install MS Fiddler, use it to figure out the codes being sent and received via AJAX, and replicate them yourself? Or are they too complex? Or do they change format too frequently?
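For what it's worth, here is roughly what "replicate them yourself" could look like once a proxy like Fiddler has shown you the request. This is only a sketch under assumptions: the URL, the JSON payload shape, and the helper name `build_replay_request` are all made up for illustration, and a real site may also want cookies or tokens copied from the capture.

```python
import json
import urllib.request


def build_replay_request(url, payload, extra_headers=None):
    """Build a POST that mimics an AJAX call seen in a proxy capture.

    Hypothetical helper: the header set below is a common default for
    JSON AJAX calls, not something taken from any particular site.
    """
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "X-Requested-With": "XMLHttpRequest",
    }
    # Cookies, tokens, or other headers copied from the capture go here.
    headers.update(extra_headers or {})
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


# Assumed endpoint and payload, purely illustrative.
req = build_replay_request("https://example.com/api/prices", {"page": 2})

# Sending is left commented out so the sketch runs offline:
# with urllib.request.urlopen(req) as resp:
#     rows = json.loads(resp.read())
```

Whether this works depends entirely on the questions above: if the payload is opaque or changes format often, replaying it gets painful fast.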
I have some experience with this in automated testing with HP LoadRunner. Last I checked (which was several years ago) the Virtual User Generator was free, and could occasionally be helpful in parsing data being sent back and forth as well.
But the specific example I was thinking of is a single-page JSF app, and the endpoints return "faces," whatever the hell they are. I have to dig into it and see if we can get something out of them.
Hm, LMGTFY.
"The Java EE 7 Tutorial: Using Ajax with JavaServer Faces Technology"
Looks like this might fit the example of "too complex".
You know, I shouldn't give the wrong impression, because this client is actually pretty easygoing. It's more a recognition that in a lot of areas you can describe the challenging parts to clients in terms they understand, but when it comes to scraping it's frustrating, because on the surface it all looks the same, and right under the surface it all goes to hell. All of a sudden you're trying to explain how to trigger page navigation in an ASP.NET Web Forms site using a simulated postback, or you're explaining that the nice number the client can see on the page is actually pulled from an AJAX call, and since you can't make calls to the endpoint you have to install PhantomJS and run the JavaScript in a headless browser. Of course... then you have to explain endpoint, JavaScript, and headless.
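To make "simulated postback" a bit more concrete, here is a minimal sketch of the idea, assuming a typical Web Forms page: you collect the hidden fields the server sent (`__VIEWSTATE`, `__EVENTVALIDATION`, and friends), set `__EVENTTARGET` and `__EVENTARGUMENT` to what the pager link would have passed to `__doPostBack`, and POST the lot back. The sample HTML, the control name `ctl00$Grid`, and the helper names are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Return a form-encoded body that imitates __doPostBack(target, argument)."""
    parser = HiddenFieldParser()
    parser.feed(html)
    form = dict(parser.fields)
    # These two fields tell the server which control "fired" the event.
    form["__EVENTTARGET"] = event_target
    form["__EVENTARGUMENT"] = event_argument
    return urlencode(form)


# Made-up page fragment standing in for the real response.
SAMPLE_PAGE = """
<form method="post" action="Results.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="AbC+/9" />
  <a href="javascript:__doPostBack('ctl00$Grid','Page$2')">2</a>
</form>
"""

body = build_postback(SAMPLE_PAGE, "ctl00$Grid", "Page$2")
```

The resulting `body` is what you would POST to the form's action URL to ask for page 2, then repeat with the fresh hidden fields from each response. Good luck putting that in a quote.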
They just tell me what page needs to be done and I tell them a price. They don't need to know details.
Depends on whether they question you about the price. Especially if one page costs $1,000 and the other costs $10,000.