(Discussion) Futuremark 3DMark Time Spy DirectX 12 Benchmark


Thala

Golden Member
Nov 12, 2014
1,355
653
136
Ok, then we'll make it straightforward: out of the 52% gain from OpenGL to Vulkan that computerbase.de showed with the Fury X, roughly how much of that gain is from async, how much from intrinsic shaders, how much from multi-core rendering, etc.?

Oh, and most importantly: how much is from Vulkan vs. OpenGL itself (in particular, from no longer using AMD's OpenGL driver, which seems not to be particularly well optimized)?

Check this comparison:
https://www.youtube.com/watch?v=P_I8an8jXuM

Apparently there is much more to OpenGL vs. Vulkan/DX12 that impacts performance than just "intrinsic shaders".

I'd say it is impossible to break down the sources of the performance gains, and anyone who claims the contrary is just spreading FUD. (Aside, of course, from id Software, which most likely knows the breakdown more precisely.)
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Ok, then we'll make it straightforward: out of the 52% gain from OpenGL to Vulkan that computerbase.de showed with the Fury X, roughly how much of that gain is from async, how much from intrinsic shaders, how much from multi-core rendering, etc.?

IIRC, ~10% from async compute. Maybe more for the Fury and Fury X.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Lots of in-depth discussion here: http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/500

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.
 

Riek

Senior member
Dec 16, 2008
409
14
76
IIRC, ~10% from async compute. Maybe more for the Fury and Fury X.

How do you come by these numbers?

If you go by the difference between SMAA and TSSAA, you should also take the initial difference between those two into account.

On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%
 

Hitman928

Diamond Member
Apr 15, 2012
5,182
7,632
136
Lots of in-depth discussion here: http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/500

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.

I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does. Again, every benchmark is useless if you don't know what you're testing. This benchmark is fine (IMO), but you need to understand what it's showing you rather than just "X score is bigger than Y score".

I am hoping that FM is working on a test that actually uses parallel graphics+compute work, as that is already being used in next-gen games, but I understand why they limited their approach with this test, and I think it's fine. Although, I will state again, I do think they need to be a little more transparent in their support docs.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
Lots of in-depth discussion here:

Looks like people have confirmed with GPUView that the "async compute" is nothing more than preemption-based and tailored to NV hardware, not AMD. Coupled with the remarks from FM saying they don't have different paths (even though DX12 needs specific ones, because drivers aren't as important), that means it is a pretty heavily NV-biased test :(. Hopefully we hear some more info or they add separate paths; otherwise Ashes is still a better benchmark of true DX12 engines.

All I see is a lot of noise and not much information.

Can you point out such confirmation?


How do you come by these numbers?

If you go by the difference between SMAA and TSSAA, you should also take the initial difference between those two into account.

On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%

I thought TSSAA was more or less equivalent to SMAA in cost.

Btw, I said IIRC, not "I tested such and such and came up with these numbers".
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,949
504
126
I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does.
This does not in any way prove that Time Spy is unbiased. Neither does it prove it is biased, but what does suggest bias is how the benchmark approaches async compute.

This has been posted before but is an excellent explanation of what is actually going on.
https://i.imgur.com/W01dMG6.png
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
On OpenGL, TSSAA is almost 10% slower than SMAA.

In Vulkan, it is almost 10% faster than SMAA.

So the difference is > 15%

Wouldn't this make the difference >20%? Example:

SMAA 100% -> TSSAA 90% -> async TSSAA 110%. Overall async gain: 110/90 - 1 = 22%
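
Spelled out as code, the same back-of-the-envelope math (a minimal sketch; the numbers are the normalized figures from this thread, not new measurements):

```cpp
#include <cstdio>

int main() {
    // Normalize SMAA without async to 100.
    const double tssaa_no_async = 90.0;  // TSSAA ~10% slower than SMAA (OpenGL)
    const double tssaa_async = 110.0;    // TSSAA ~10% faster than SMAA (Vulkan)

    // Gain of async TSSAA relative to non-async TSSAA.
    const double gain = tssaa_async / tssaa_no_async - 1.0;
    std::printf("async gain: %.1f%%\n", gain * 100.0);  // prints 22.2%
    return 0;
}
```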
 

trinibwoy

Senior member
Apr 29, 2005
317
3
81
I agree with you. The problem that I, and from what I understand others, have is with what exactly the objective of the Time Spy benchmark and its AC implementation is. If the objective is to test the hardware's ability to execute graphics and compute queues concurrently and in parallel fashion, then it obviously failed in that mission. So what is the point of Time Spy's AC, then, if one vendor serializes those queues, which is basically the same as turning it off in the custom settings?

If that was not the objective, then what was? That's pretty much it, I guess.

All benchmarks measure performance and the validity of the result. They submit work via standard APIs and expect a correct result. They do not measure or dictate "how" that result should be achieved.

Also, folks are arguing that async isn't valid unless tasks are run concurrently. That is completely false. Async is a logical separation of work. If AMD can run tasks in parallel, their performance will go up. If Nvidia can't, theirs won't.

It's that simple. Everything else is either due to people having a very poor understanding of the topic or intentional misinformation.
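
To make that concrete, here is a toy model (invented numbers, not real GPU code): the submitted work is identical either way; only the hardware's ability to overlap the queues changes the frame time.

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame queue busy times, in milliseconds.
    const double graphics_ms = 10.0;
    const double compute_ms = 3.0;

    // A GPU that serializes the queues still returns a correct result...
    const double serialized = graphics_ms + compute_ms;
    // ...while one that overlaps them hides the compute work under graphics.
    const double overlapped = std::max(graphics_ms, compute_ms);

    std::printf("serialized: %.1f ms, overlapped: %.1f ms (%.0f%% faster)\n",
                serialized, overlapped, (serialized / overlapped - 1.0) * 100.0);
    return 0;
}
```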
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
I don't think it's NV-biased at all. AMD still benefits from what they're doing, and actually benefits more than NV does. Again, every benchmark is useless if you don't know what you're testing. This benchmark is fine (IMO), but you need to understand what it's showing you rather than just "X score is bigger than Y score".

I am hoping that FM is working on a test that actually uses parallel graphics+compute work, as that is already being used in next-gen games, but I understand why they limited their approach with this test, and I think it's fine. Although, I will state again, I do think they need to be a little more transparent in their support docs.

No, because it makes Polaris/GCN look like they perform similarly to Pascal, when GCN has much better "true" async compute potential.

All I see is a lot of noise and not much information.

Can you point out such confirmation?

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/470#post_25357883

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/490#post_25358182

are a few, worth reading the whole thing.

As I pointed out in the PDF from Nvidia's developer portal, they recommend having two different code paths, because otherwise you can't optimize per architecture, which is exactly what DX12 allows you to do. Time Spy doesn't do that.
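
For illustration, an IHV-specific path switch could be as simple as branching on the adapter's PCI vendor ID at startup (a hypothetical sketch; the path names are made up, the vendor IDs are the real PCI IDs):

```cpp
#include <cstdint>

enum class RenderPath { GenericDx12, AmdAsyncHeavy, NvPreemptionFriendly };

// Hypothetical: pick a vendor-tuned render path from the adapter's vendor ID
// (as reported by e.g. DXGI_ADAPTER_DESC::VendorId).
RenderPath SelectRenderPath(uint32_t pciVendorId) {
    switch (pciVendorId) {
        case 0x1002: return RenderPath::AmdAsyncHeavy;        // AMD
        case 0x10DE: return RenderPath::NvPreemptionFriendly; // NVIDIA
        default:     return RenderPath::GenericDx12;          // anything else
    }
}
```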
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
No, because it makes Polaris/GCN look like they perform similarly to Pascal, when GCN has much better "true" async compute potential.

Sure, the benchmark does not reflect that as clearly as it could. On the other hand, even taking Time Spy into consideration, it still shows GCN gaining about double the amount from async compute compared with Pascal. In fact, Pascal's gains are in the low single-digit range. In summary, even if you take Time Spy as a reference, it does not suddenly turn Pascal into a good DX12 performer.
 

Riek

Senior member
Dec 16, 2008
409
14
76
Wouldn't this make the difference >20%? Example:

SMAA 100% -> TSSAA 90% -> async TSSAA 110%. Overall async gain: 110/90 - 1 = 22%

Yes, but the 10% was rounded upwards ("almost 10%"). It's actually more like 8%, and to allow some error margin I'd rather say >15% than assume a much better ideal situation.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Yes, but the 10% was rounded upwards ("almost 10%"). It's actually more like 8%, and to allow some error margin I'd rather say >15% than assume a much better ideal situation.

Even calculating with 8% would make the performance gain very close to 20% (110/92 - 1 = 19.6%).

In addition, from what I see, the 10% gain for SMAA without async vs. TSSAA with async is already generously rounded down.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
Didn't Futuremark do exactly this in the past, "bolting on" PhysX into the benchmark?

If you were using an Nvidia card, you had the option to choose whether you wanted PhysX to run on the CPU or on an NV graphics card.

Something which boosted scores for a specific vendor.

Meanwhile, if you were using an AMD card, your only option was to run it on the CPU (or use the AMD+Nvidia PhysX hack).

The whole thing happened mostly before PhysX was owned by Nvidia (Futuremark was working with AGEIA before it was acquired by Nvidia). Lots of games were using PhysX at that point (mostly CPU-only, with a few using accelerated PhysX via the AGEIA PPU cards). With later updates, I also think Vantage changed from having the PPU enabled by default to off, so in the end the default settings were CPU-only for any GPU.
 

Det0x

Golden Member
Sep 11, 2014
1,027
2,953
136
The whole thing happened mostly before PhysX was owned by Nvidia (Futuremark was working with AGEIA before it was acquired by Nvidia). Lots of games were using PhysX at that point (mostly CPU-only, with a few using accelerated PhysX via the AGEIA PPU cards). With later updates, I also think Vantage changed from having the PPU enabled by default to off, so in the end the default settings were CPU-only for any GPU.

I will copy some posts from another forum :whistle:

That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.

This is exactly the dilemma I expected from them. There's no way to have a single render path in a DX12 benchmark without optimizing it for the lowest common denominator and punishing the silicon with extra features.

"Impartial" benchmarking has become an oxymoron with DX12. You have to optimize for each vendor or you're unfairly punishing one of them. It just about makes the whole concept of "benchmark" meaningless.

They had no problem doing this with tessellation. Now suddenly they've got morals?

I'd say there's a difference between doing the same workload (serially vs. in parallel) and actively reducing the amount of workload with tessellation (geometry), is there not? Or am I not understanding this correctly?

I get you, but DX12 is not a one-size-fits-all API. Arguably DX11 was, but AMD suffered with high tessellation and had driver optimizations to keep such punishment within architectural limits. These driver optimizations became invalid within 3DMark, so they were left competing one-for-one with Nvidia.

OK 3DMark, that's fine if you want to look neutral, but now with DX12 AMD isn't allowed to shine with its parallel hardware; it must remain on a level playing field with an NV-optimized render path. It's not an indication of game performance, unless that game is specifically NV-optimized and has very few, if any, AMD async shader optimizations.

See the theme here? The last 3DMark was NV-optimized in its tessellation levels. The limitation was on the AMD side, and the fix was ignored/bypassed. This 3DMark is NV-optimized in its avoidance of async compute + graphics, aka async shaders. The limitation is on the Nvidia side, and the fix is honored.

It's a valid benchmark as long as AMD knows its place.

This sums up my views pretty much :)

*edit*



With the given evidence, we can say that the Time Spy benchmark, intentionally or not, fits by design the capabilities of Pascal perfectly; other Nvidia architectures are not capable of async compute at all, and most AMD architectures are in theory left with spare room for much heavier async compute loads.

It's as if tessellation loads had been designed to fit the inferior AMD capabilities back in the day. There is a clear pattern to Futuremark controversies, regardless of who's right or wrong: they always favor Nvidia.

BTW, 3DMark Time Spy was demonstrated for the first time at the GOC Asia Nvidia event.

https://www.youtube.com/watch?v=kOsxV4-oRNA
 

dogen1

Senior member
Oct 14, 2014
739
40
91
With the given evidence, we can say that the Time Spy benchmark, intentionally or not, fits by design the capabilities of Pascal perfectly

Can you explain in specific detail what the capabilities of Pascal are in this area, and exactly how this benchmark "fits them perfectly"?
 

Det0x

Golden Member
Sep 11, 2014
1,027
2,953
136
Can you explain in specific detail what the capabilities of Pascal are in this area, and exactly how this benchmark "fits them perfectly"?

You can read from page 48 in this thread:

http://www.overclock.net/t/1605674/computerbase-de-doom-vulkan-benchmarked/470

From what I understand based on Doothe's post, Time Spy is basically just doing that new thing Pascal can do: it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.

So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, *and the manner in which it does its "async" is friendly to Pascal hardware. I don't think it's necessarily "optimized" for Nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.

* = read: preemption to suit the newest NV hardware, instead of truly asynchronous shaders

Compute queues as a % of total run time:

Doom: 43.70%
AOTS: 90.45%
Time Spy: 21.38%

It does look that way compared to AOTS and Doom. I don't have ROTR, Hitman, or any other DX12/Vulkan titles to test this theory against. In the two other games, GPUView shows two rectangles (compute queues) stacked on top of each other. Time Spy never needs to process more than one at a time.
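
A toy contrast of the two behaviors being described (all numbers invented for illustration):

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Hypothetical per-frame timings, in milliseconds.
    const double graphics_ms = 10.0;
    const double compute_ms = 2.0;
    const double context_switch_ms = 0.1;

    // Preemption-based "async": pause 3D, run compute, switch back.
    const double preempted = graphics_ms + compute_ms + 2.0 * context_switch_ms;

    // Parallel async shaders: compute fills idle units under the 3D work.
    const double parallel = std::max(graphics_ms, compute_ms);

    std::printf("preemption: %.1f ms, parallel async: %.1f ms\n",
                preempted, parallel);
    return 0;
}
```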

  • Minimize the use of barriers and fences
  • We have seen redundant barriers and associated wait for idle operations as a major performance problem for DX11 to DX12 ports
  • The DX11 driver is doing a great job of reducing barriers – now under DX12 you need to do it
  • Any barrier or fence can limit parallelism
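
The batching advice in those bullets translates directly to code: submit several transitions in one ResourceBarrier call instead of one call each (a minimal D3D12 sketch; the function and resource names are invented, CD3DX12_RESOURCE_BARRIER is the stock helper from d3dx12.h):

```cpp
#include <d3d12.h>
#include "d3dx12.h"  // CD3DX12_RESOURCE_BARRIER helper

// Batch both transitions into a single ResourceBarrier call; issuing them
// one by one risks paying a wait-for-idle per barrier.
void TransitionTargetsForRead(ID3D12GraphicsCommandList* cmdList,
                              ID3D12Resource* color,
                              ID3D12Resource* depth)
{
    const D3D12_RESOURCE_BARRIER barriers[2] = {
        CD3DX12_RESOURCE_BARRIER::Transition(
            color,
            D3D12_RESOURCE_STATE_RENDER_TARGET,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
        CD3DX12_RESOURCE_BARRIER::Transition(
            depth,
            D3D12_RESOURCE_STATE_DEPTH_WRITE,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE),
    };
    cmdList->ResourceBarrier(2, barriers);  // one call, two transitions
}
```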

 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
So we had to start from somewhere. This will not be the "last 3DMark ever". FL12 is definitely interesting, but games are not yet using it, so it is more of a 2017 thing.

Pretty much how I feel. By the time we do get to robust DX12 games we'll be well into new GPU series by both companies.

Makes it easier to go Pascal and just wait for Navi to revisit AMD. I have a nagging suspicion Vega is going to be another letdown for the majority of games on the market at its launch.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.

Negative, the matter will sort itself out once a new benchmark comes out from Futuremark that makes use of shader model 6.0 ...

The only things AMD should focus on are the games and their partnership with Microsoft along with their microarchitecture ...
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
Negative, the matter will sort itself out once a new benchmark comes out from Futuremark that makes use of shader model 6.0 ...

The only things AMD should focus on are the games and their partnership with Microsoft along with their microarchitecture ...

Hopefully those games come out plentiful and fast, because I get the feeling this benchmark is going to show up in GPU reviews soon enough.

It's going to create a lot of forum fights when 480 AIBs lose to GTX 1060 AIBs in an async compute benchmark.
 
Feb 19, 2009
10,457
10
76
Wow, this benchmark has pretty much started to show up in non-tech forums.

If it's catering to NV as alleged here, it's already doing its damage to AMD's campaign of hardware-focused Multi-Engine.

Interesting times ahead. I still haven't bought it. What was that about a $5 version for license holders?

AMD should send some muscle over to Futuremark and sort this out. Them losing in async compute is not going over so well.

They can't. It's already been explained earlier: Time Spy is designed to target the lowest-hanging fruit in terms of FL11 DX12 capabilities, to ensure it runs well on all the GPUs out there. This means it can't go proper FL12 or heavy/real async compute.

NV also dominates PC gaming market share; you can't make a PC gaming benchmark that makes the leader look like total crap with a performance regression.

What this bench shows is that Pascal can gain from light compute workloads on the async queue: because it finally has preemption with fast context switching, it's able to fill its idle shaders with these light compute workloads.

Ultimately, as some of you said, it doesn't matter how the performance is obtained, as long as it's good performance.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
You cannot make a fair benchmark if you start bolting on vendor-specific, or even generation-specific, architecture-centered optimizations.

DX12 is a standard. We made a benchmark according to the spec; it is up to the graphics card vendors how their products implement the spec (if they do not follow it, MS won't certify the drivers, so they do follow it).

Beyond that, we will be publishing an official clarification on this issue, probably later today or tomorrow. I fear it won't placate all the people who are going nuts over this with their claims, but we'll do our best.

Engine Considerations
Need IHV specific paths
● Use DX11 if you can’t do this

http://www.gdcvault.com/play/1023128/Advanced-Graphics-Techniques-Tutorial-Day

Slide 4

Presentation by Nvidia and AMD.
 
Feb 19, 2009
10,457
10
76
Hopefully those games come out plentiful and fast, because I get the feeling this benchmark is going to show up in GPU reviews soon enough.

It's going to create a lot of forum fights when 480 AIBs lose to GTX 1060 AIBs in an async compute benchmark.

The next wave of big DX12 games is due in a few months; there will not be a major shift in time for the 1060's launch review, which will mostly be DX11-tested.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
They can't. It's already been explained earlier: Time Spy is designed to target the lowest-hanging fruit in terms of FL11 DX12 capabilities, to ensure it runs well on all the GPUs out there. This means it can't go proper FL12 or heavy/real async compute.

NV also dominates PC gaming market share; you can't make a PC gaming benchmark that makes the leader look like total crap with a performance regression.

What this bench shows is that Pascal can gain from light compute workloads on the async queue: because it finally has preemption with fast context switching, it's able to fill its idle shaders with these light compute workloads.

Ultimately, as some of you said, it doesn't matter how the performance is obtained, as long as it's good performance.

I know. I've been saying this since GCN/Mantle started to become the big talk here. I don't get why anyone is surprised Nvidia is on top again. I don't particularly like Nvidia's business tactics, but they seem to have better support for more games at their hardware's release. As an ex-AMD user, the lack of support in some titles I enjoyed just got irritating.

It's just ironic seeing people argue that AMD is going to target the mainstream, the more frugal buyers, and call that a success, but then refuse to accept that most game devs target the lowest-cost option for their games. The logic around here baffles me sometimes.

Time Spy is basically NV's iron grip on the industry. By the time we do get proper DX12 games and benchmarks, NV will probably be riding AMD's coattails, cashing in while AMD continues to do all the work and maybe breaks even.