Website scraping question

Sureshot324

Diamond Member
I want to write a scraper for the Battlefield: Bad Company 2 stats page, www.badcompany2.ea.com/leaderboards. Basically I want to make my own personal stats page. The problem is I can't figure out how to see the correct HTML source.

When you first load the page, it shows the top 20 PS3 users. From there you can switch to PC or Xbox 360, or search for a specific user, which loads the same page plus that user at the top of the list. The problem is that if I do this and choose 'view source' in Firefox or IE, it just shows the source for the original page with the top 20 PS3 users.

I can't figure out how it's doing this. I can be looking at a page with the top 20 PC or Xbox 360 users, but if I view the source it just shows the PS3 users. Shouldn't it show me the HTML for the page I'm looking at right now?

Any web programming experts out there want to look at this page and figure out what's going on?
 
Did some reading and I think the reason it's doing that is because it's an AJAX site. Still trying to figure out how to parse it.
 
You have to find out what web service it's calling and what arguments it passes, then parse the results.
 
The sample parameters I provided are JSON. (But that is largely irrelevant, considering everything basically gets passed as GET parameters.)

It's not clear what point you're trying to make.

No, it's not irrelevant. If the site returns JSON then you don't need to scrape anything. You just need to parse the JSON which is returned.
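
To make "just parse the JSON" concrete, here is a minimal Python sketch. The thread never shows the real response, so the payload shape and the field names (`players`, `name`, `persona`, `score`) are invented for illustration only, as is the helper name `find_persona`.

```python
import json

# Hypothetical response body: the real service's structure and field names
# are NOT shown in the thread, so this sample is purely illustrative.
SAMPLE_RESPONSE = (
    '{"players": [{"name": "ExamplePlayer", "persona": 123456, "score": 1000}]}'
)


def find_persona(response_text, username):
    """Return the persona id for username, or None if the user is not listed."""
    data = json.loads(response_text)
    for player in data.get("players", []):
        # Compare case-insensitively; whether the real search does so is unknown.
        if player.get("name", "").lower() == username.lower():
            return player.get("persona")
    return None
```

If the real response turns out to be an HTML fragment rather than JSON, the same function boundary still works; only the parsing inside it changes.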
 
Oh, I see what happened. You quoted me but you meant to quote someone else. Nowhere in my post did I mention scraping or AJAX being needed.
 
OK, so I am able to get the persona (user ID) from the username by requesting this URL:

"http://www.badcompany2.ea.com/leaderboards/ajax?platform=" . $platform . "&sort=score&start=1&search=" . $username

Once I have the persona, I can get a player's stats page from this link:

http://www.badcompany2.ea.com/stats?persona=234354084&platform=pc
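
The two URLs above can be assembled with proper query-string escaping, which matters if a username contains spaces or symbols. A rough sketch in Python, assuming the parameter names exactly as quoted in this thread; the function names are made up here, and nothing is known about what the server accepts beyond these examples.

```python
from urllib.parse import urlencode

# Host and paths are copied from the URLs quoted in the thread.
BASE = "http://www.badcompany2.ea.com"


def leaderboard_search_url(platform, username):
    """Build the leaderboard AJAX URL used to look up a user's persona."""
    query = urlencode(
        {"platform": platform, "sort": "score", "start": 1, "search": username}
    )
    return f"{BASE}/leaderboards/ajax?{query}"


def stats_url(persona, platform):
    """Build the per-player stats page URL from a persona id."""
    query = urlencode({"persona": persona, "platform": platform})
    return f"{BASE}/stats?{query}"
```

Calling `stats_url(234354084, "pc")` reproduces the stats link quoted above.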

This page has most of the stats, but to get your stats for each individual gun, you have to click the gun and the page loads them dynamically. More AJAX, I'm guessing, and I'm stumped on how to get that data. What I really don't understand is that when I look at the source, the link for each gun is <a href="#".... How can every link point to just # yet they all load the stats for different guns?
 
Notice the other link properties?

<a href="#" id="ul_m416" class="weapon">

..., for example. They probably added JavaScript to be called by the OnClick event for every link of class "weapon".

Also notice the stuff inside the <span>s? It looks like it parses to something; probably the HTML on the right side, but I'm not entirely sure.
 
You are correct, all the stats are in that span tag. That's going to be annoying to parse. I didn't know you could set JavaScript to run on everything with a certain class. Would this be in the CSS file?
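
Pulling the span contents out need not be too painful with Python's standard `html.parser`. The sketch below assumes the spans sit inside each `<a class="weapon">` link, which the thread does not confirm, and the sample markup and span text are invented. It only collects the raw text per weapon id; decoding that text is left open because its format is not shown here.

```python
from html.parser import HTMLParser


class WeaponSpanParser(HTMLParser):
    """Collect the text of <span> elements found inside <a class="weapon"> links."""

    def __init__(self):
        super().__init__()
        self.current_id = None  # id of the weapon link being read, if any
        self.in_span = False
        self.spans = {}  # weapon link id -> list of span text strings

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "weapon" in (attrs.get("class") or "").split():
            self.current_id = attrs.get("id")
            self.spans.setdefault(self.current_id, [])
        elif tag == "span" and self.current_id is not None:
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False
        elif tag == "a":
            self.current_id = None

    def handle_data(self, data):
        # Text outside a span (such as the link label) is ignored.
        if self.in_span and data.strip():
            self.spans[self.current_id].append(data.strip())


def extract_weapon_spans(html):
    """Return a dict mapping each weapon link id to its span texts."""
    parser = WeaponSpanParser()
    parser.feed(html)
    parser.close()
    return parser.spans


# Invented sample markup; only the <a> attributes come from the thread.
SAMPLE_HTML = (
    '<a href="#" id="ul_m416" class="weapon">M416'
    "<span>kills: 120</span><span>accuracy: 0.21</span></a>"
)
```

If the spans turn out to live elsewhere in the page, the same parser approach applies, but the condition in `handle_starttag` would need to match whatever element actually wraps them.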
 
Um, no, it's in JavaScript, not CSS. ^_^

I've seen jQuery commands that could do it. At a basic JavaScript level, I imagine it's just a case of iterating over all objects with a certain class and adding that event handler to each object.
 