(Discussion) Futuremark 3DMark Time Spy DirectX 12 Benchmark


Thala

Golden Member
Nov 12, 2014
1,355
653
136
Ok, then we'll make it straightforward: out of the 52% gain from OpenGL to Vulkan that computerbase.de showed with the Fury X, roughly how much of that gain is from async, how much from intrinsic shaders, how much from multi-core rendering, etc.?

Oh, and most importantly: how much is from Vulkan vs. OpenGL itself (in particular, from no longer using AMD's OpenGL driver, which seems not to be particularly well optimized)?

Check this comparison:
https://www.youtube.com/watch?v=P_I8an8jXuM

Apparently there is much more to OpenGL vs. Vulkan/DX12 that impacts performance than just "intrinsic shaders".

I'd say it is impossible to break down the sources of the performance gains, and anyone who claims the contrary is just spreading FUD. (Aside, of course, from id Software, which most likely knows the breakdown more precisely.)
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Ok, then we'll make it straightforward: out of the 52% gain from OpenGL to Vulkan that computerbase.de showed with the Fury X, roughly how much of that gain is from async, how much from intrinsic shaders, how much from multi-core rendering, etc.?

IIRC, ~10% from async compute. Maybe more for the Fury and Fury X.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Lots of in-depth discussion here: http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/500

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.
 

Riek

Senior member
Dec 16, 2008
409
14
76
IIRC, ~10% from async compute. Maybe more for the Fury and Fury X.

How do you come by these numbers?

If you go by the difference between SMAA and TSSAA, you should also take the initial difference between those two into account.

On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%
 

Hitman928

Diamond Member
Apr 15, 2012
5,182
7,632
136
Lots of in-depth discussion here: http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/500

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.

I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does. Again, every benchmark is useless if you don't know what you're testing. This benchmark is fine (IMO), but you need to understand what it's showing you rather than just "X score is bigger than Y score".

I am hoping that FM is working on a test that actually uses parallel graphics+compute work, as that is already being used in next-gen games, but I understand why they limited their approach with this test, and I think it's fine. Although, I will state again, I do think they need to be a little more transparent in their support docs.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Lots of in-depth discussion here:

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.

All I see is a lot of noise and not much information.

Can you point out such confirmation?


How do you come by these numbers?

If you go by the difference between SMAA and TSSAA, you should also take the initial difference between those two into account.

On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%

I thought TSSAA was more or less equivalent to SMAA in cost.

Btw, I said IIRC, not "I tested such and such and came up with these numbers".
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,949
504
126
I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does.
This does not in any way prove that Time Spy is unbiased. Neither does it prove it is biased, but what does suggest bias is how the benchmark approaches async compute.

This has been posted before but is an excellent explanation of what is actually going on.
https://i.imgur.com/W01dMG6.png
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%

Wouldn't this make the difference >20%? Example:

SMAA 100% -> TSSAA 90% -> async TSSAA 110%. Overall async gain: 110/90 - 1 = 22%
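
Spelled out as code, the same back-of-the-envelope math (a minimal sketch; the numbers are the normalized figures from this thread, not new measurements):

```cpp
#include <cstdio>

int main() {
    // Normalize SMAA without async to 100.
    const double tssaa_no_async = 90.0;  // TSSAA ~10% slower than SMAA (OpenGL)
    const double tssaa_async = 110.0;    // TSSAA ~10% faster than SMAA (Vulkan)

    // Gain of async TSSAA relative to non-async TSSAA.
    const double gain = tssaa_async / tssaa_no_async - 1.0;
    std::printf("async gain: %.1f%%\n", gain * 100.0);  // prints 22.2%
    return 0;
}
```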
 

trinibwoy

Senior member
Apr 29, 2005
317
3
81
I agree with you. The problem that I, and from what I understand others, have is with what exactly the objective of the Time Spy benchmark and its AC implementation is. If the objective is to test the hardware's ability to execute graphics and compute queues concurrently and in parallel fashion, then it obviously failed in that mission. So what is the point of Time Spy's AC, then, if one vendor serializes those queues, which is basically the same as turning it off in the custom settings?

If that was not the objective, then what was? That's pretty much it, I guess.

All benchmarks measure performance and the validity of the result. They submit work via standard APIs and expect a correct result. They do not measure or dictate "how" that result should be achieved.

Also, folks are arguing that async isn't valid unless tasks are run concurrently. That is completely false. Async is a logical separation of work. If AMD can run tasks in parallel, their performance will go up. If Nvidia can't, theirs won't.

It's that simple. Everything else is either due to people having a very poor understanding of the topic or intentional misinformation.
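
To make that concrete, here is a toy model (invented numbers, not real GPU code): the submitted work is identical either way; only the hardware's ability to overlap the queues changes the frame time.

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame queue busy times, in milliseconds.
    const double graphics_ms = 10.0;
    const double compute_ms = 3.0;

    // A GPU that serializes the queues still returns a correct result...
    const double serialized = graphics_ms + compute_ms;
    // ...while one that overlaps them hides the compute work under graphics.
    const double overlapped = std::max(graphics_ms, compute_ms);

    std::printf("serialized: %.1f ms, overlapped: %.1f ms (%.0f%% faster)\n",
                serialized, overlapped, (serialized / overlapped - 1.0) * 100.0);
    return 0;
}
```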
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does. Again, every benchmark is useless if you don't know what you're testing. This benchmark is fine (IMO), but you need to understand what it's showing you rather than just "X score is bigger than Y score".

I am hoping that FM is working on a test that actually uses parallel graphics+compute work, as that is already being used in next-gen games, but I understand why they limited their approach with this test, and I think it's fine. Although, I will state again, I do think they need to be a little more transparent in their support docs.

No, because it makes Polaris/GCN look like they perform similarly to Pascal, when GCN has much better "true" async compute potential.

All I see is a lot of noise and not much information.

Can you point out such confirmation?

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/470#post_25357883

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/490#post_25358182

are a few, worth reading the whole thing.

As I pointed out in the PDF from Nvidia's developer portal, they recommend having two different code paths, because otherwise you can't optimize per architecture, which is exactly what DX12 allows you to do. Time Spy doesn't do that.
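
For illustration, an IHV-specific path switch could be as simple as branching on the adapter's PCI vendor ID at startup (a hypothetical sketch; the path names are made up, the vendor IDs are the real PCI IDs):

```cpp
#include <cstdint>

enum class RenderPath { GenericDx12, AmdAsyncHeavy, NvPreemptionFriendly };

// Hypothetical: pick a vendor-tuned render path from the adapter's vendor ID
// (as reported by e.g. DXGI_ADAPTER_DESC::VendorId).
RenderPath SelectRenderPath(uint32_t pciVendorId) {
    switch (pciVendorId) {
        case 0x1002: return RenderPath::AmdAsyncHeavy;        // AMD
        case 0x10DE: return RenderPath::NvPreemptionFriendly; // NVIDIA
        default:     return RenderPath::GenericDx12;          // anything else
    }
}
```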
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
No, because it makes Polaris/GCN look like they perform similarly to Pascal, when GCN has much better "true" async compute potential.

Sure, the benchmark does not reflect that as clearly as it could. On the other hand, even taking Time Spy into consideration, it still shows GCN gaining about double the amount from async compute compared with Pascal. In fact, Pascal's gains are in the low single-digit range. In summary, even if you take Time Spy as a reference, it does not suddenly turn Pascal into a good DX12 performer.
 

Riek

Senior member
Dec 16, 2008
409
14
76
Wouldn't this make the difference >20%? Example:

SMAA 100% -> TSSAA 90% -> async TSSAA 110%. Overall async gain: 110/90 - 1 = 22%

Yes, but the 10% was rounded upwards ("almost 10%"). It's actually more like 8%, and to allow some error margin I'd rather say >15% than assume a much better ideal situation.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Yes, but the 10% was rounded upwards ("almost 10%"). It's actually more like 8%, and to allow some error margin I'd rather say >15% than assume a much better ideal situation.

Even calculating with 8% would make the performance gain very close to 20% (110/92 - 1 = 19.6%).

In addition, from what I see, the 10% gain for SMAA without async vs. TSSAA with async is already generously rounded down.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
Didn't Futuremark do exactly this in the past, "bolting on" PhysX into the benchmark?

If you were using an Nvidia card, you had the option to choose whether you wanted PhysX to run on the CPU or on an NV graphics card.

Something which boosted scores for a specific vendor.

Meanwhile, if you were using an AMD card, your only option was to run it on the CPU (or use the AMD+Nvidia PhysX hack).

The whole thing happened mostly before PhysX was owned by Nvidia (Futuremark was working with AGEIA before it was acquired by Nvidia). Lots of games were using PhysX at that point (mostly CPU-only, with a few using accelerated PhysX via the AGEIA PPU cards). With later updates, I also think Vantage changed from having the PPU enabled by default to off, so in the end the default settings were CPU-only for any GPU.
 

Det0x

Golden Member
Sep 11, 2014
1,027
2,953
136
The whole thing happened mostly before PhysX was owned by Nvidia (Futuremark was working with AGEIA before it was acquired by Nvidia). Lots of games were using PhysX at that point (mostly CPU-only, with a few using accelerated PhysX via the AGEIA PPU cards). With later updates, I also think Vantage changed from having the PPU enabled by default to off, so in the end the default settings were CPU-only for any GPU.

I will copy some posts from another forum :whistle:

That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.

This is exactly the dilemma I expected from them. There's no way to have a single render path in a DX12 benchmark without optimizing it for the lowest common denominator and punishing the silicon with extra features.

"Impartial" benchmarking has become an oxymoron with DX12. You have to optimize for each vendor or you're unfairly punishing one of them. It just about makes the whole concept of "benchmark" meaningless.

They had no problem doing this with tessellation. Now suddenly they've got morals?

I'd say there's a difference between doing the same workload (serially vs. in parallel) and actively reducing the amount of workload with tessellation (geometry), is there not? Or am I not understanding this correctly?

I get you, but DX12 is not a one-size-fits-all API. Arguably DX11 was, but AMD suffered with high tessellation and had driver optimizations to keep such punishment within architectural limits. These driver optimizations became invalid within 3DMark, so they were left competing one-for-one with Nvidia.

OK 3DMark, that's fine if you want to look neutral, but now with DX12 AMD isn't allowed to shine with its parallel hardware; it must remain on a level playing field with an NV-optimized render path. It's not an indication of game performance, unless that game is specifically NV-optimized and has very few, if any, AMD async shader optimizations.

See the theme here? The last 3DMark was NV-optimized in its tessellation levels. The limitation was on the AMD side, and the fix was ignored/bypassed. This 3DMark is NV-optimized in its avoidance of async compute + graphics, aka async shaders. The limitation is on the Nvidia side, and the fix is honored.

It's a valid benchmark as long as AMD knows its place.

This sums up my views pretty much :)

*edit*



With the given evidence, we can say that the Time Spy benchmark, intentionally or not, fits by design the capabilities of Pascal perfectly; other Nvidia architectures are not capable of async compute at all, and most AMD architectures are in theory left with spare room for much heavier async compute loads.

It's as if tessellation loads had been designed to fit the inferior AMD capabilities back in the day. There is a clear pattern to Futuremark controversies, regardless of who's right or wrong: they always favor Nvidia.

BTW, 3DMark Time Spy was demonstrated for the first time at the GOC Asia Nvidia event.

https://www.youtube.com/watch?v=kOsxV4-oRNA
 

dogen1

Senior member
Oct 14, 2014
739
40
91
With the given evidence, we can say that the Time Spy benchmark, intentionally or not, fits by design the capabilities of Pascal perfectly

Can you explain in specific detail what the capabilities of Pascal are in this area, and exactly how this benchmark "fits them perfectly"?
 

Det0x

Golden Member
Sep 11, 2014
1,027
2,953
136
Can you explain in specific detail what the capabilities of Pascal are in this area, and exactly how this benchmark "fits them perfectly"?

You can read from page 48 in this thread:

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/470

From what I understand based on Doothe's post, Time Spy is basically just doing that new thing Pascal can do: it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.

So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, *and the manner in which it does its "async" is friendly to Pascal hardware. I don't think it's necessarily "optimized" for Nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.

* = read: preemption to suit the newest NV hardware, instead of truly asynchronous shaders

Compute queues as a % of total run time:

Doom: 43.70%
AOTS: 90.45%
Time Spy: 21.38%

It does look that way compared to AOTS and Doom. I don't have ROTR, Hitman, or any other DX12/Vulkan titles to test this theory against. In the two other games, GPUView shows two rectangles (compute queues) stacked on top of each other. Time Spy never needs to process more than one at a time.
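
A toy contrast of the two behaviors being described (all numbers invented for illustration):

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame timings, in milliseconds.
    const double graphics_ms = 10.0;
    const double compute_ms = 2.0;
    const double context_switch_ms = 0.1;

    // Preemption-based "async": pause 3D, run compute, switch back.
    const double preempted = graphics_ms + compute_ms + 2.0 * context_switch_ms;

    // Parallel async shaders: compute fills idle units under the 3D work.
    const double parallel = std::max(graphics_ms, compute_ms);

    std::printf("preemption: %.1f ms, parallel async: %.1f ms\n",
                preempted, parallel);
    return 0;
}
```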

  • Minimize the use of barriers and fences
  • We have seen redundant barriers and associated wait for idle operations as a major performance problem for DX11 to DX12 ports
  • The DX11 driver is doing a great job of reducing barriers – now under DX12 you need to do it
  • Any barrier or fence can limit parallelism
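
The batching advice in those bullets translates directly to code: submit several transitions in one ResourceBarrier call instead of one call each (a minimal D3D12 sketch; the function and resource names are invented, CD3DX12_RESOURCE_BARRIER is the stock helper from d3dx12.h):

```cpp
#include <d3d12.h>
#include "d3dx12.h"  // CD3DX12_RESOURCE_BARRIER helper

// Batch both transitions into a single ResourceBarrier call; issuing them
// one by one risks paying a wait-for-idle per barrier.
void TransitionTargetsForRead(ID3D12GraphicsCommandList* cmdList,
                              ID3D12Resource* color,
                              ID3D12Resource* depth)
{
    const D3D12_RESOURCE_BARRIER barriers[2] = {
        CD3DX12_RESOURCE_BARRIER::Transition(
            color,
            D3D12_RESOURCE_STATE_RENDER_TARGET,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
        CD3DX12_RESOURCE_BARRIER::Transition(
            depth,
            D3D12_RESOURCE_STATE_DEPTH_WRITE,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
    };
    cmdList->ResourceBarrier(2, barriers);  // one call, two transitions
}
```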

 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
So we had to start from somewhere. This will not be the "last 3DMark ever". FL12 is definitely interesting, but games are not yet using it, so it is more of a 2017 thing.

Pretty much how I feel. By the time we do get to robust DX12 games we'll be well into new GPU series by both companies.

Makes it easier to go Pascal and just wait for Navi to revisit AMD. I have a nagging suspicion Vega is going to be another letdown for the majority of games on the market at its launch.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.

Negative, the matter will sort itself out once a new benchmark comes out from Futuremark that makes use of shader model 6.0 ...

The only things AMD should focus on are the games and their partnership with Microsoft along with their microarchitecture ...
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Negative, the matter will sort itself out once a new benchmark comes out from Futuremark that makes use of shader model 6.0 ...

The only things AMD should focus on are the games and their partnership with Microsoft along with their microarchitecture ...

Hopefully those games come out plentiful and fast, because I get the feeling this benchmark is going to show up in GPU reviews soon enough.

It's going to create a lot of forum fights when 480 AIBs lose to GTX 1060 AIBs in an async compute benchmark.
 
Feb 19, 2009
10,457
10
76
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.

They can't. It's already been explained earlier: Time Spy is designed to target the lowest-hanging fruit in terms of FL11 DX12 capabilities, to ensure it runs well on all the GPUs out there. This means it can't go proper FL12 or heavy/real async compute.

NV also dominates PC gaming market share; you can't make a PC gaming benchmark that makes the leader look like total crap with a performance regression.

What this bench shows is that Pascal can gain from light compute workloads on the async queue: because it finally has preemption with fast context switching, it's able to fill its idle shaders with these light compute workloads.

Ultimately, as some of you said, it doesn't matter how the performance is obtained, as long as it's good performance.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
You cannot make a fair benchmark if you start bolting on vendor-specific, or even generation-specific, architecture-centered optimizations.

DX12 is a standard. We made a benchmark according to the spec; it is up to the graphics card vendors how their products implement the spec (if they do not follow it, MS won't certify the drivers, so they do follow it).

Beyond that, we will be publishing an official clarification on this issue, probably later today or tomorrow. I fear it won't placate all the people who are going nuts over this with their claims, but we'll do our best.

Engine Considerations
Need IHV specific paths
● Use DX11 if you can’t do this

http://www.gdcvault.com/play/1023128/Advanced-Graphics-Techniques-Tutorial-Day

Slide 4

Presentation by Nvidia and AMD.
 
Feb 19, 2009
10,457
10
76
Hopefully those games come out plentiful and fast, because I get the feeling this benchmark is going to show up in GPU reviews soon enough.

It's going to create a lot of forum fights when 480 AIBs lose to GTX 1060 AIBs in an async compute benchmark.

The next wave of big DX12 games is due in a few months; there will not be a major shift in time for the 1060's launch review, which will mostly be DX11-tested.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
They can't. It's already been explained earlier: Time Spy is designed to target the lowest-hanging fruit in terms of FL11 DX12 capabilities, to ensure it runs well on all the GPUs out there. This means it can't go proper FL12 or heavy/real async compute.

NV also dominates PC gaming market share; you can't make a PC gaming benchmark that makes the leader look like total crap with a performance regression.

What this bench shows is that Pascal can gain from light compute workloads on the async queue: because it finally has preemption with fast context switching, it's able to fill its idle shaders with these light compute workloads.

Ultimately, as some of you said, it doesn't matter how the performance is obtained, as long as it's good performance.

I know. I've been saying this since GCN/Mantle started to become the big talk here. I don't get why anyone is surprised Nvidia is on top again. I don't particularly like Nvidia's business tactics, but they seem to have better support for more games at their hardware's release. As an ex-AMD user, the lack of support in some titles I enjoyed just got irritating.

It's just ironic seeing people argue that AMD is going to target the mainstream, the more frugal buyers, and call that a success, but then refuse to accept that most game devs target the lowest-cost option for their games. The logic around here baffles me sometimes.

Time Spy is basically NV's iron grip on the industry. By the time we do get proper DX12 games and benchmarks, NV will probably be riding AMD's coattails, cashing in while AMD continues to do all the work and maybe breaks even.