Multiplayer GPU Benchmarking

VulgarDisplay · Feb 1, 2014

I didn't want this to get lost in the Mantle thread so I decided to start another topic.

With all the discussion about how multiplayer (MP) testing is hard to reproduce, and can't be trusted because of this, I thought about the problems and how reviewers could easily get around them.

You would need two identical test setups: same CPU, motherboard, RAM, etc. The trick is to only test two GPUs that are in the same price category, say $100-150, $150-200, $200-250, and so on up to the ultra high-end cards.

Then you would need 62 volunteers to join a server with explicit instructions to NOT SHOOT THESE TWO PLAYERS. The two benchmark players then go to a spot on the map, stand right next to each other, and just watch the action. That would ensure that cards at the same price level always get exactly the same benchmark run, and it would fix the variance issue. It would cost more money for a second test setup, but it would be the proper way to do the benchmark with no variance.

Thoughts? Suggestions?

Paul98 · Feb 1, 2014

VulgarDisplay said:
I didn't want this to get lost in the Mantle thread so I decided to start another topic.

With all the discussion about how multiplayer (MP) testing is hard to reproduce, and can't be trusted because of this, I thought about the problems and how reviewers could easily get around them.

You would need two identical test setups: same CPU, motherboard, RAM, etc. The trick is to only test two GPUs that are in the same price category, say $100-150, $150-200, $200-250, and so on up to the ultra high-end cards.

Then you would need 62 volunteers to join a server with explicit instructions to NOT SHOOT THESE TWO PLAYERS. The two benchmark players then go to a spot on the map, stand right next to each other, and just watch the action. That would ensure that cards at the same price level always get exactly the same benchmark run, and it would fix the variance issue. It would cost more money for a second test setup, but it would be the proper way to do the benchmark with no variance.

Thoughts? Suggestions?

I had the same sort of thought myself. Or maybe you could use spectator mode, so you could go to the exact same spot and even move through the map if you wanted. I would have to check how demanding spectator mode is compared with normal play.

BrightCandle · Feb 1, 2014

It's not sufficient to be genuinely reproducible. The players themselves are doing different things, so different actions and effects get drawn on the test machines, which produces differences. It's possible that high-ping players could affect this more or less than low-ping ones as well. The exact details of who is where, with what vehicle and weapon, how big the firefight is and how occluded it is all play into it.

The strategy you have described is probably roughly how they test now: they go somewhere safe, look in the same direction, and hope that this is sufficient to be reasonably repeatable.

Actually, the only repeatable way to do this is to record the actions of the remote players. You join the game with your test client, then run the script for all the remote player movements and capture the FPS as it happens. Effectively you are completely simulating the server's network communication to fool the client into thinking it's playing a game, so it produces reliable results. It's the only way to get a genuinely reproducible test that is also multiplayer, with no extra variance added by the game playing out differently.

The problem with the approach that would actually work is that it's impossible without the developers' help. They would need to build record and playback functionality that tricks their game into believing it's networked and playing within a particular match. Beyond that, ideally the game would also record the user's own movements, so you could play a multiplayer scene once and then use it as a benchmark from then on, with shooting, movement and everything else being part of the test. But it can't be done without the developers making it possible.
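A minimal Python sketch of that record-and-playback idea, assuming a hypothetical game that advances in fixed simulation ticks and applies remote players' inputs as discrete events. `InputEvent`, `record` and `replay` are invented names for illustration, not any game's real API, and the approach only yields identical runs if the game's simulation is itself deterministic for a given input stream.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InputEvent:
    tick: int       # simulation tick at which the input arrived (hypothetical)
    player_id: int  # which remote player sent it
    action: str     # e.g. "fire", "move_forward" (hypothetical action names)


def record(events: list[InputEvent]) -> str:
    """Serialize a captured session so it can be replayed later."""
    return json.dumps([asdict(e) for e in events])


def replay(log: str, num_ticks: int):
    """Yield (tick, inputs) pairs in tick order from a recorded log.

    Inputs that arrived on the same tick keep their recorded order.
    Ticks with no recorded input yield an empty list.
    """
    by_tick: dict[int, list[InputEvent]] = {}
    for raw in json.loads(log):
        event = InputEvent(**raw)
        by_tick.setdefault(event.tick, []).append(event)
    for tick in range(num_ticks):
        yield tick, by_tick.get(tick, [])


session = [
    InputEvent(0, 1, "fire"),
    InputEvent(2, 7, "move_forward"),
    InputEvent(2, 1, "reload"),
]
log = record(session)
run_a = list(replay(log, 4))
run_b = list(replay(log, 4))
```

Feeding the replayed inputs to the client in place of live server traffic is what would make every benchmark run see the same remote-player behaviour; here `run_a` and `run_b` come out identical because they are generated from the same log.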

Erenhardt · Feb 1, 2014

Heavily scripted server-side bots that do exactly the same things each round. Then you stand in the same spot, looking in the same direction. The bots need to ignore the player.
But that would require dev help or quite a lot of modding.

blackened23 · Feb 1, 2014

Sounds like a logistical nightmare waiting to happen. I think the real issue here is trying to use multiplayer to create a CPU-limited gaming scenario.

To that end, you don't need multiplayer to create a CPU-limited scenario. It is entirely possible to do that in single player as well.

I'm not saying it isn't a question worth asking. But I'm just saying... get 60+ players all on the same page to act like robot A.I.? Good luck with that. It would be incredibly tough to organize and even harder to execute. Nearly impossible. That said, many websites *do* test multiplayer, but they take an extended gameplay sample of 1+ hours so that everything, generally speaking, averages out. Is that a valid testing method? I don't know. Either way, it's not ideal. It's a tough question, really, and I don't know what the answer is. Replication is darn near impossible in MP runs, especially those involving that many people.

DiogoDX · Feb 1, 2014

2 PCs > enter spectator mode > first-person view of the same player > benchmark

DarkKnightDude · Feb 1, 2014

DiogoDX said:
2 PCs > enter spectator mode > first-person view of the same player > benchmark

The problem is that if different things are happening in front of the spectated players, they'll get different results.

Grooveriding · Feb 1, 2014

There is actually a fairly consistent way to do this. Make sure you are on a full server, and at the start of the match spawn in a vehicle and do a circuit of the map. I found a tank works best in BF. Shoot at some objects as you go around, and make sure you take note of everything you do. Repeat this for each run.

Most of the load in a 64-player match doesn't come from direct on-screen contact with other players. A big explosion or firefight on screen will of course have a big impact when it happens, but mostly it's that your system is constantly taking in information on where every player is, what they are doing, and how that affects the environment, physics calculations, etc. None of this has to be in your FOV; it's all still being sent to and processed by your machine.

If you take care to be consistent in your path, and do it on a fresh map each run, you can get consistent results.

I did this with BF3 when I went from X58 to X79. It worked quite well, as the results showed consistent rises and falls in framerate at the same times during the respective runs.
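Consistency here was judged by eye, from dips and peaks lining up across runs. One way to put a number on that, assuming each run is logged as FPS sampled at a fixed interval from the same starting point, is the Pearson correlation between the two traces: values near 1 mean the rises and falls line up even if one platform is faster overall. The function name and the sample traces below are illustrative, not measured data.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equally long, time-aligned FPS traces."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("traces must be the same length (at least 2 samples)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant trace has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Illustrative only: a faster platform that shifts every sample up by 10 FPS
# still correlates perfectly with the slower run.
run_old = [55.0, 48.0, 62.0, 40.0, 58.0]
run_new = [x + 10.0 for x in run_old]
```

Correlation ignores the overall offset between the runs, which is the point: it checks that both runs saw the same sequence of load, while the average FPS difference is what tells you which platform is faster.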

http://forums.anandtech.com/showpost.php?p=32662392&postcount=15

There are sites that make a point of doing good multiplayer runs, some even provide videos of what they benched.

VulgarDisplay · Feb 1, 2014

I should have clarified that the two players would stand shoulder to shoulder right next to each other and stare at the exact same spot, watching the action. At no point would these two machines receive any input during the benchmark run. They would both just watch the same match unfold.

PPB · Feb 1, 2014

The problem is: which spot do you pick?

Some spots are more CPU-demanding, others are GPU-limited. Biased reviewers could easily choose the spot that best fits their agenda.

My opinion is simple: the greater the sample, the less dispersion. Just play the damn game for 2 hours straight on both setups on a server running the same map 24/7. In BF4's case, just decide to stick to one role (people who fly get far higher FPS than people on the ground, for example) and be done with it.
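The "bigger sample" argument is really about the uncertainty of the *average*: per-second FPS keeps varying however long you play, but the standard error of the mean shrinks roughly as 1/sqrt(n). A minimal sketch, assuming per-second FPS samples and treating them as independent; consecutive seconds in a real match are correlated, so the true uncertainty would be larger than this estimate. The sample data is synthetic.

```python
import math
import statistics


def mean_with_error(fps_samples: list[float]) -> tuple[float, float]:
    """Return (mean FPS, standard error of the mean).

    Assumes independent samples; autocorrelated samples make this
    an underestimate of the real uncertainty.
    """
    n = len(fps_samples)
    mean = statistics.fmean(fps_samples)
    stdev = statistics.stdev(fps_samples)  # sample standard deviation (n - 1)
    return mean, stdev / math.sqrt(n)


# Synthetic data with the same spread: four times as many samples
# roughly halves the standard error.
short_run = [50.0, 60.0] * 50    # 100 samples
long_run = [50.0, 60.0] * 200    # 400 samples
short_mean, short_se = mean_with_error(short_run)
long_mean, long_se = mean_with_error(long_run)
```

A rough 95% interval for the average is mean ± 1.96 × standard error, which gives a way to say whether two cards' averages actually differ or are within the noise.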

VulgarDisplay · Feb 1, 2014

Honestly, I think in BF4 the best place to test would be to put the two test machines on the RCB in Siege of Shanghai. Park it just outside the area where the skyscraper falls. Then get your 62 volunteers to fight solely around the middle flag where the skyscraper is, with instructions to bring the skyscraper down during the course of the fight.

Point the boat away from the skyscraper. Have the two test machines turn the minigun turrets back towards the fight and aim just at the shore. Both machines would hold down the fire button on the minigun turrets, aimed at one spot (to add some more stress to the test), and keep firing throughout. Just have all 62 players basically play TDM in the skyscraper area, with tanks and choppers fighting there as well.

62 players all fighting in this area in front of the test machines would stress every aspect of the game, especially when the skyscraper comes down.

Replication is not necessary because you will have an Nvidia rig and an AMD rig from the same price point watching the exact same match. You could still do more runs to get an average if you wanted.

I think you could easily get 62 volunteers to help with this for a site like anandtech.
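Because both rigs would watch the same match, their logs could be compared interval by interval rather than only as two overall averages. A sketch, assuming both machines log FPS once per second from a synchronized start; how that synchronization would be done is an assumption the post doesn't address, and `paired_comparison` is a hypothetical helper.

```python
import statistics


def paired_comparison(rig_a: list[float], rig_b: list[float]) -> dict[str, float]:
    """Compare two time-aligned FPS logs captured from the same match."""
    if len(rig_a) != len(rig_b) or len(rig_a) < 2:
        raise ValueError("logs must cover the same intervals (at least 2)")
    diffs = [a - b for a, b in zip(rig_a, rig_b)]
    return {
        # Average per-interval advantage of rig A over rig B, in FPS.
        "mean_diff_fps": statistics.fmean(diffs),
        # Spread of that advantage across the match.
        "diff_stdev": statistics.stdev(diffs),
        # Fraction of intervals in which rig A was strictly faster.
        "share_a_faster": sum(d > 0 for d in diffs) / len(diffs),
    }


result = paired_comparison([60.0, 70.0, 80.0], [55.0, 66.0, 71.0])
```

The benefit of pairing is that scene-to-scene swings (a skyscraper coming down, a quiet stretch) hit both rigs at once and largely cancel in the differences, so the spread of the differences is usually much smaller than the spread of either rig's raw FPS.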

sandorski · Feb 1, 2014

Capture the input (keyboard/mouse) of X people playing the same map at the same time. For the test, stream that input data to the server the tester is spectating.

VulgarDisplay · Feb 1, 2014

Spectating could also work, but you would need a third test setup to actually play on, as a baseline to check whether the spectating machines see the same stresses.

DiogoDX · Feb 1, 2014

VulgarDisplay said:
Spectating could also work, but you would need a third test setup to actually play on, as a baseline to check whether the spectating machines see the same stresses.

I think the servers support 4 spectator slots, so in theory it can be done.

BTRY B 529th FA BN · Feb 1, 2014

How can you claim anything from one point of view when almost nobody else will go through the same scenario? It's not even good stats for a general idea of performance.

PPB · Feb 1, 2014

VulgarDisplay said:
Honestly, I think in BF4 the best place to test would be to put the two test machines on the RCB in Siege of Shanghai. Park it just outside the area where the skyscraper falls. Then get your 62 volunteers to fight solely around the middle flag where the skyscraper is, with instructions to bring the skyscraper down during the course of the fight.

Point the boat away from the skyscraper. Have the two test machines turn the minigun turrets back towards the fight and aim just at the shore. Both machines would hold down the fire button on the minigun turrets, aimed at one spot (to add some more stress to the test), and keep firing throughout. Just have all 62 players basically play TDM in the skyscraper area, with tanks and choppers fighting there as well.

62 players all fighting in this area in front of the test machines would stress every aspect of the game, especially when the skyscraper comes down.

Replication is not necessary because you will have an Nvidia rig and an AMD rig from the same price point watching the exact same match. You could still do more runs to get an average if you wanted.

I think you could easily get 62 volunteers to help with this for a site like anandtech.

I think you forget that an important aspect of a benchmark is to replicate the experience a user would get, with its maximum, minimum and average framerates/frametimes.

What you propose doesn't hold any value in that sense: the spot you watch from becomes arbitrary. For example, the most CPU-demanding view on that map is looking at the D node across the north street. With your kind of test you would miss that scenario entirely.

The gameplay experience (unless you are one of those [Redacted] snipers) involves stuff blowing up in your face and things crashing right next to you, sometimes along with your FPS. Heck, if you have played the game enough, you can tell that even server performance can make your FPS drop, for some reason that is beyond me. It's the first game I have ever seen where server performance affects client performance, FPS-wise: you can get random periods of low FPS on some servers, while on others you don't.

There are so many variables, and you need to understand that those variables will also affect the people who play this game, so your bench should reflect them too. That's why I propose bigger samples when benching MP, at least in BF4. But always by playing the game, not spectating it.
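Reporting maximum, minimum and average figures from a long run starts from a per-frame log of frametimes. A minimal sketch, assuming frametimes in milliseconds; average FPS is computed as total frames over total time (not the mean of per-frame FPS), and the "1% low" here uses one common convention, the average of the slowest 1% of frames expressed as FPS. Other tools define it differently, so treat that metric as an assumption.

```python
def frametime_summary(frametimes_ms: list[float]) -> dict[str, float]:
    """Summarize a per-frame log of render times in milliseconds."""
    if not frametimes_ms:
        raise ValueError("empty frametime log")
    total_s = sum(frametimes_ms) / 1000.0
    ordered = sorted(frametimes_ms)
    # Slowest 1% of frames (at least one frame), averaged, then shown as FPS.
    k = max(1, len(ordered) // 100)
    worst = ordered[-k:]
    return {
        "avg_fps": len(frametimes_ms) / total_s,
        "min_frametime_ms": ordered[0],
        "max_frametime_ms": ordered[-1],
        "one_percent_low_fps": 1000.0 / (sum(worst) / len(worst)),
    }


# Illustrative log: 99 smooth frames at 10 ms plus one 50 ms stutter.
summary = frametime_summary([10.0] * 99 + [50.0])
```

In this example the single stutter barely moves the average (about 96 FPS) but drops the 1% low to 20 FPS, which is exactly the kind of spike a short or cherry-picked run can miss.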

Deders · Feb 1, 2014

You'd have to rule out ping and make it a best-case scenario that accounts for people's in-game actions. Maybe macro recording on a 64-player server, which replays players' macros as instructions to get the same result every time.

parvadomus · Feb 2, 2014

Servers should be able to record a demo (the clients' inputs over time) and then use it on the test system.
This gives you a consistent way to benchmark it, and you take network timing out of the equation.

cmdrdredd · Feb 2, 2014

The best way would be, as was mentioned, to record a fully loaded game for a couple of minutes, then add it to a batch file that does a run-through and reports the performance, like they used to have for Quake back in the day.
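Quake's `timedemo` played a recorded demo back as fast as possible and reported the frame count, elapsed time and average FPS. A harness in that spirit might look like the sketch below; `timedemo` here is a hypothetical Python function, and the `render` callback and frame objects are placeholders, since a real game would do the playback inside its engine.

```python
import time
from typing import Callable, Iterable


def timedemo(frames: Iterable[object], render: Callable[[object], None]) -> str:
    """Play back recorded frames as fast as possible and report throughput.

    `render` stands in for the engine drawing one recorded frame
    (hypothetical; nothing here talks to a real game).
    """
    count = 0
    start = time.perf_counter()
    for frame in frames:
        render(frame)
        count += 1
    elapsed = time.perf_counter() - start
    fps = count / elapsed if elapsed > 0 else float("inf")
    return f"{count} frames {elapsed:.1f} seconds {fps:.1f} fps"


# Illustrative run with a do-nothing renderer; real numbers would come
# from the engine rendering each recorded frame.
report = timedemo(range(1000), lambda frame: None)
```

Because the demo is fixed and playback is unthrottled, the only thing that changes between runs is the hardware, which is what made this style of test repeatable.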

VulgarDisplay · Feb 2, 2014

I really don't think people understand what I'm saying...

You would put the two benchmark players RIGHT in the middle of all the action and have everyone ignore them. They stand shoulder to shoulder and just watch 62 people going nuts directly in front of them. The test setups would be staring at the exact same spot and would not be moved at all until the benchmark was done.

BrightCandle · Feb 2, 2014

But the action would be different every time, resulting in different performance levels. It's not sufficient to make a repeatable test.

VulgarDisplay · Feb 2, 2014

BrightCandle said:
But the action would be different every time, resulting in different performance levels. It's not sufficient to make a repeatable test.

That's why you would only benchmark two cards at the same price level against each other. I would do GTX 780 Ti vs 290X, 290X vs GTX 780, and 290 vs GTX 780 on the high end, and then it gets easier across the other price points from there. The 780 Ti is off in its own little world when Radeon prices aren't inflated.

This type of testing would only be done within each price point, to give buyers in that price range the information they want. It really doesn't matter if a GTX 780 or 290X got a different benchmark run than a 7750 or GTX 650. They are for completely different buyers.

BrightCandle · Feb 2, 2014

But they are getting different data. Similar data, yes, but they are not standing at exactly the same spot or getting the data in the same way. It might be good enough within a margin of error, but it's not repeatable, and I think losing comparisons to other cards is a bit of an issue. Typically people look at 780/780 Ti/290/290X as a bracket to see if the upgrade is worth it, while others look at 270/280/760/770. It's better than what they currently do, but it's also more expensive (two identical rigs are needed) and less accurate than the ideal.
