(Discussion) Futuremark 3DMark Time Spy Directx 12 Benchmark

ThatBuzzkiller · Jul 18, 2016

railven said:
Hopefully those games come out plentiful and fast, because I get this feeling this benchmark is going to show up on GPU reviews soon enough.

Going to create a lot of forum fights when 480 AIBs loses to GTX 1060 AIBs in a async compute benchmark.

No worries, there's at least 6 DX12 games due until the end of this year ...

I don't expect AMD to win in Gears of War 4 since it's built on Unreal Engine 4 ...

railven · Jul 18, 2016

Silverforce11 said:
The next wave of big games on DX12 is due in a few month, there will not be a major shift in time for the 1060's launch review, it will mostly be DX11 tested.

ThatBuzzkiller said:
No worries, there's at least 6 DX12 games due until the end of this year ...

I don't expect AMD to win in Gears of War 4 since it's built on Unreal Engine 4 ...

6 games to the how many DX11 titles still in the pipeline?

Futuremark is right on the money with their prediction.

Silverforce11 · Jul 18, 2016

railven said:
6 games to the how many DX11 titles still in the pipeline?

Futuremark is right on the money with their prediction.

That's new DX12 on top of the current ones. Not 6 in total.

As benchmarks ditch old games and replace it with new ones, it only makes GCN look better, I'm sure you've noticed that at least.

What's going to be games that decides it for most gamers?

Battlefield 1, Deus Ex MD, Watch Dogs 2, Halo Wars 2, Gears of Wars 4, Forza Horizons...

Or:

Crysis 3, ACU, Project Cars, Metro (yeah, lots of sites still use this ancient game lol) etc.

The benchmark landscape will look very different 3 months from now.

ThatBuzzkiller · Jul 18, 2016

railven said:
6 games to the how many DX11 titles still in the pipeline?

Futuremark is right on the money with their prediction.

If we only counted AAA games there is about only 6 or so games left for the rest of this year that are exclusively DX11 ?

That's not half bad all things considered ...

It would be great if AMD featured DX12 for the upcoming Call of Duty too just to make the transition smoother ...

Silverforce11 · Jul 18, 2016

Bacon1 said:
http://www.gdcvault.com/play/1023128/Advanced-Graphics-Techniques-Tutorial-Day

Slide 4

Presentation by Nvidia and AMD.

This is spot on, DX12/Vulkan needs architecture specific paths to optimize it fully.

Using a single rendering path and hoping it runs the best on all hardware doesn't work for these next-gen APis.

As an example, Pascal gets a light Async Compute path so it can use it well with preemption. GCN gets a real parallel Multi-Engine path so it can flex it's power.

-----------------------------------

This was a good move from NVIDIA. Get into Time Spy early on and make sure it looks good on Pascal.

https://www.youtube.com/watch?v=kOsxV4-oRNA

^ From December. NV logo on Time Spy.

Red Hawk · Jul 19, 2016

Bacon1 said:
http://www.gdcvault.com/play/1023128/Advanced-Graphics-Techniques-Tutorial-Day

Slide 4

Presentation by Nvidia and AMD.

And that's the most damming thing to me actually, pretty much in exact contrast to what FM_Jarnis was saying. :\

AnandThenMan · Jul 19, 2016

Here's a direct link to the pdf may not work unless you first click the link Bacon posted. It's very easy to see that a one size fits all is basically impossible under DX12 you are going to have to compromise to the hardware with lesser capabilities.

Bacon1 · Jul 19, 2016

railven said:
6 games to the how many DX11 titles still in the pipeline?

Futuremark is right on the money with their prediction.

Lets see released games we have:

Forza Apex (DX12 only)
Quantum Break (DX12 only)
Gears of War UE (DX12 only)
Hitman
Rise of the Tomb Raider
Total War Warhammer
Ashes of the Singularity
Doom (Vulkan)

Upcoming AAA we have:

Gears of Wars 4 (DX12 only again?)
Forza Horizons (Dx12 only again?)
Battlefield 1
Deus Ex Mankind Divided
Halo Wars 2
Watch Dogs 2

Plus anything else from DICE / EA, Square Enix and others which have released either Mantle, Vulkan or DX12 games now since they've done the heavy lifting with the engine work.

FM_Jarnis · Jul 19, 2016

Red Hawk said:
And that's the most damming thing to me actually, pretty much in exact contrast to what FM_Jarnis was saying. :\

Making games and making fair and unbiased benchmarks are two different things.

We have actually discussed this very subject with the graphics vendors and they are against doing it in 3DMark. Such optimizations almost always inevitably require altering the actual work being performed, and then it would no longer be a common reference point.

With a game it doesn't really matter if you optimize by subtly altering what is being rendered according to the strengths of each architecture, if doing so gives substantial gains in framerate, but in an unbiased benchmark that isn't a good idea.

(and before you drag async back into this; every single card performs the exact same work with Time Spy. Different architectures and drivers choose how exactly they arrange the work and manage the resources of the hardware, but the command queues going to the driver and the final output rendered is identical)

Zstream · Jul 19, 2016

FM_Jarnis said:
Making games and making fair and unbiased benchmarks are two different things.

We have actually discussed this very subject with the graphics vendors and they are against doing it in 3DMark. Such optimizations almost always inevitably require altering the actual work being performed, and then it would no longer be a common reference point.

With a game it doesn't really matter if you optimize by subtly altering what is being rendered according to the strengths of each architecture, if doing so gives substantial gains in framerate, but in an unbiased benchmark that isn't a good idea.

(and before you drag async back into this; every single card performs the exact same work with Time Spy. Different architectures and drivers choose how exactly they arrange the work and manage the resources of the hardware, but the command queues going to the driver and the final output rendered is identical)

How about a car analogy:

You test two cars on a quarter mile track. That's unbiased but also does not show that one of the cars is terrible at going fast around corners, or bottoms out at a quarter mile.

Yes, it's unbiased but also shows a lack of the real world and what you will do on the road.

That's as best as I can say it. It's unfortunate that you won't test real world usage, like driving around corners or longer than a quarter mile.

We want you to do a FULL test and show the realities of each other's architecture, for good or bad.

selni · Jul 19, 2016

Red Hawk said:
And that's the most damming thing to me actually, pretty much in exact contrast to what FM_Jarnis was saying. :\

I mean that's true but it's also pretty damning of every DX12 implementation so far as well isn't it? Who's doing distinct render paths for each architecture?

FM_Jarnis · Jul 19, 2016

Zstream said:
How about a car analogy:

You test two cars on a quarter mile track. That's unbiased but also does not show that one of the cars is terrible at going fast around corners, or bottoms out at a quarter mile.

Yes, it's unbiased but also shows a lack of the real world and what you will do on the road.

That's as best as I can say it. It's unfortunate that you won't test real world usage, like driving around corners or longer than a quarter mile.

We want you to do a FULL test and show the realities of each other's architecture, for good or bad.

If you really want to go there (car analogies... uuuuh), what you are suggesting is that we should have two tracks.

One with lots of curves for the car that is damn good at curves and another with just a few long straights for the thing that can't turn to save it's life.

Both are showing their "best sides", but the track is not the same, so how is this a fair comparison?

(this is a pretty silly discussion...)

Both cars on same track that has all kinds of curves and straights, with both going through the exact same thing, no? Ie. like 3DMark does it. I mean, sure, you can argue all day about how exactly the track should then be laid out, and which bit is better for one car and which bit for the other, but Futuremark has been doing this for almost 20 years and we have AMD, NVIDIA and Intel participating in the "track design", so how could we make it more fair?

DeathReborn · Jul 19, 2016

FM_Jarnis said:
If you really want to go there (car analogies... uuuuh), what you are suggesting is that we should have two tracks.

One with lots of curves for the car that is damn good at curves and another with just a few long straights for the thing that can't turn to save it's life.

Both are showing their "best sides", but the track is not the same, so how is this a fair comparison?

(this is a pretty silly discussion...)

Both cars on same track that has all kinds of curves and straights, with both going through the exact same thing, no? Ie. like 3DMark does it. I mean, sure, you can argue all day about how exactly the track should then be laid out, and which bit is better for one car and which bit for the other, but Futuremark has been doing this for almost 20 years and we have AMD, NVIDIA and Intel participating in the "track design", so how could we make it more fair?

That's how I read it as well, fair workloads to compare different makes. I am fairly sure ALL parties were pushing their own brand of what DX12 is but FM has to make it fair to all participants.

In "car speak" you have in Formula 1 Ferrari, Red Bull & Mercedes who have large budgets (many others don't), work within the same rules (DX12 equivalent) yet all race on the same track, some work better on the straights, some in the fast corners, some in slow corners. Some even like the rain (okay, not really but you get the picture) but they don't have different routes to suit different cars.

I'm sure FM could make a "balls to the wall" benchmark that you can't really use to compare but you just know people will and the point of the benchmark is completely lost.

Thanks to FM_Jarnis for trying to explain.

Bacon1 · Jul 19, 2016

FM_Jarnis said:
If you really want to go there (car analogies... uuuuh), what you are suggesting is that we should have two tracks.

One with lots of curves for the car that is damn good at curves and another with just a few long straights for the thing that can't turn to save it's life.

Both are showing their "best sides", but the track is not the same, so how is this a fair comparison?

(this is a pretty silly discussion...)

Both cars on same track that has all kinds of curves and straights, with both going through the exact same thing, no? Ie. like 3DMark does it. I mean, sure, you can argue all day about how exactly the track should then be laid out, and which bit is better for one car and which bit for the other, but Futuremark has been doing this for almost 20 years and we have AMD, NVIDIA and Intel participating in the "track design", so how could we make it more fair?

So are you saying that there are parts of the test that show compute+graphics+copy all running at the same time with async compute?

Red Hawk · Jul 19, 2016

FM_Jarnis said:
Making games and making fair and unbiased benchmarks are two different things.

We have actually discussed this very subject with the graphics vendors and they are against doing it in 3DMark. Such optimizations almost always inevitably require altering the actual work being performed, and then it would no longer be a common reference point.

With a game it doesn't really matter if you optimize by subtly altering what is being rendered according to the strengths of each architecture, if doing so gives substantial gains in framerate, but in an unbiased benchmark that isn't a good idea.

(and before you drag async back into this; every single card performs the exact same work with Time Spy. Different architectures and drivers choose how exactly they arrange the work and manage the resources of the hardware, but the command queues going to the driver and the final output rendered is identical)

Ok...that does make more sense. I'm trying to give you the benefit of the doubt here. It does make sense that for a benchmark, you would give each graphics card architecture the same workload to render.

Bacon1 said:
Lets see released games we have:

Forza Apex (DX12 only)
Quantum Break (DX12 only)
Gears of War UE (DX12 only)
Hitman
Rise of the Tomb Raider
Total War Warhammer
Ashes of the Singularity
Doom (Vulkan)

Upcoming AAA we have:

Gears of Wars 4 (DX12 only again?)
Forza Horizons (Dx12 only again?)
Battlefield 1
Deus Ex Mankind Divided
Halo Wars 2
Watch Dogs 2

Plus anything else from DICE / EA, Square Enix and others which have released either Mantle, Vulkan or DX12 games now since they've done the heavy lifting with the engine work.

Don't forget Dota 2 and The Talos Principle, which have Vulkan renderers now.

FM_Jarnis · Jul 19, 2016

Bacon1 said:
So are you saying that there are parts of the test that show compute+graphics+copy all running at the same time with async compute?

Compute + graphics running simultaneously. That is what async compute is. Compute queue and Graphics (aka Direct) queue running at the same time. Happens throughout Demo and Graphics Test 1 & 2.

Copy can also run simultaneously, but Time Spy does not use Copy to ensure it is an isolating graphics card benchmark (Graphics tests) - all content is instead loaded to VRAM before the test starts. So as long as you meet the VRAM requirements, there is no traffic to main RAM (if you don't, shared RAM is used, with the usual performance penalty, normal story with iGPUs etc. and then RAM performance matters)

Det0x · Jul 19, 2016

FM_Jarnis said:
Compute + graphics running simultaneously. That is what async compute is. Compute queue and Graphics (aka Direct) queue running at the same time. Happens throughout Demo and Graphics Test 1 & 2.

Copy can also run simultaneously, but Time Spy does not use Copy to ensure it is an isolating graphics card benchmark (Graphics tests) - all content is instead loaded to VRAM before the test starts. So as long as you meet the VRAM requirements, there is no traffic to main RAM (if you don't, shared RAM is used, with the usual performance penalty, normal story with iGPUs etc. and then RAM performance matters)

Dont seem to be the general consensus looking at this screenshot ? :hmm:

Just to quote some of the responses:

That would be a major dilemma. Last thing benchmark program should do is create separate optimized path for each GPU.

This is exactly the dilemma I expected from them. There's no way to have a single render path in a DX12 benchmark without optimizing it for the lowest common denominator and punishing the silicon with extra features.

"Impartial" benchmarking has become an oxymoron with DX12. You have to optimize for each vendor or you're unfairly punishing one of them. It just about makes the whole concept of "benchmark" meaningless.

They had no problem doing this with tessellation. Now suddenly they've got morals?

I'd say there's a difference between doing the same workload (serially vs in parallel) and actively reducing the amount of workload with tessellation (geometry) is there not? Or am I not understanding this correctly?

I get you, but DX12 is not a one-size-fits-all API. Arguably DX11 was, but AMD suffered with high tess and had driver optimizations to keep such punishment within architectural limits. These driver optimizations became invalid within 3dmark, so they were left competing one-for-one with Nvidia.

OK 3dmark, that's fine if you want to look neutral, but now with DX12 AMD isn't allowed to shine with its parallel hardware-- it must remain on a level playing field with an NV-optimized render path. It's not an indication of game performance, unless that game is specifically NV-optimized and has very few if any AMD async shader optimizations.

See the theme here? The last 3dmark was NV-optimized with tessellation levels. The limitation was on the AMD side, and the fix was ignored / bypassed. This 3dmark is NV-optimized in its avoidance of Async Compute + Graphics, aka Async Shaders. The limitation is on the Nvidia side, and the fix is honored.

It's a valid benchmark as long as AMD knows its place.

With the given evidence, we can say that Time Spy benchmark, intentionally or not, by design, fits perfectly for the capabilites of Pascal, other Nvidia architectures are not capable of async computing at all, and most of the AMD architectures in theory are left with spare room to be requested of much heavier async computing loads.

It's like Tessellation loads were designed to fit the inferior AMD capabilities back in the day. There is a clear pattern with Futuremark controversies regardless of who's on the right or wrong, and it's that they always favor Nvidia.

From what I understand based on Doothe's post, Time Spy is basically only just doing that new feature that Pascal has - it preempts some 3D work, quickly switches context to the compute work, then switches back to the 3D.

So it seems to me that Time Spy has a very minimal amount of async compute work compared to Doom and AotS, *and the manner in which it does its "async" is friendly to Pascal hardware. I don't think it's necessarily "optimized" for nvidia, as GCN seems to have no issue with context switching either. It's just not being allowed to take full advantage of GCN hardware.

* = read Pre-emption to suite the newest nv hardware, instead of truly asynchronous shaders

Compute queues as a % of total run time:

Doom: 43.70%
AOTS: 90.45%
Time Spy: 21.38%

It does look that way compared to AOTS, and DOOM. I don't have ROTR, Hitman, or any other DX12/Vulkan titles to test this theory against. In the two other games, GPUView shows two rectangles(compute queues) stacked on top of each other. Time Spy never needs to process more than one at a time.

Minimize the use of barriers and fences

We have seen redundant barriers and associated wait for idle operations as a major performance problem for DX11 to DX12 ports

The DX11 driver is doing a great job of reducing barriers – now under DX12 you need to do it

Any barrier or fence can limit parallelism

If we are misinterpreting the data, please feel free to correct us.

And as you said in the other thread:

FM_Jarnis said:
The goal is to have a benchmark that gives an accurate indication how DX12 games, on average, perform on various graphics cards. To help people make educated purchasing decisions and to serve as an unbiased, neutral "yardstick" on gaming performance of various systems.

There are plenty of games out there that have various colored teams doing super special optimizations that favor one or the other architecture that may or may not show "what hardware is truly capable of". With 3DMark you know that it gives you the real deal *without* those bits that may influence hardware comparisons considerably. This also means it stays valid when a new generation of hardware arrives while those super-special-optimized games may suddenly perform much worse on the latest hardware when their optimizations no longer fit the new architecture.

"Educated purchasing decisions".. Thanks Time Spy, based on your guidance i have come to the conclusion (based on the benchmark scores) that the 970 will essentially be equal to the 300 series cards in DX12/Vulkan titles.

If it turns out that Maxwell cards end up getting slaughtered in the future when real DX12 titles drop, I'm sure 3DMark will accept accountability?

sontin · Jul 19, 2016

Det0x said:
If we are misinterpreting the data, please feel free to correct us.

Sure you are. I mean you dont even know what "preemption" is, yet you are convinced that Futuremark is using it. :awe:

Definition of "preemption":

Preemption (computing), the interruption of a computer process without its cooperation in order to perform another task

https://en.wikipedia.org/wiki/Preemption

Your GPUView screen doesnt show preemption.

dogen1 · Jul 19, 2016

Bacon1 said:
So are you saying that there are parts of the test that show compute+graphics+copy all running at the same time with async compute?

Yes. The guide explicitly says they schedule asynchronous compute shaders(doing a variety of things) to run in parallel with shadow map rendering.

I've already copy pasted parts of the guide before, but you should read it yourself. They even have some nice diagrams how their queues are set up and whatnot.

sontin · Jul 19, 2016

This section wasnt in the technical guide the last few days. :thumbsup:

FM_Jarnis · Jul 19, 2016

Yes, the technical guide already has an updated section on this (which is a good chunk of what our post on this will be). Full post will go up on Futuremark.com within the next two hours (they putting it up to the publishing system now)

http://www.futuremark.com/downloads/3DMark_Technical_Guide.pdf

(Page 27 and onward)

(And if you want to object to something or other, I recommend waiting for the post on futuremark.com, it will have more detail - this is just the "tech" bit of it)

William Gaatjes · Jul 19, 2016

I am just a noob, but i do not get it. DX12 and vulkan gives a developer the chance to access the hardware at a lower level. Being less abstract when compared to opengl and the older dx versions. So to me it is inevitable, that there will be different ways to optimize for to get the best results on different hardware. To create a general way of using hardware (In a sense the developer created an abstraction level ) will either choose one architecture over the other or cripple (reduce performance) both.

I find it strange. :hmm:

There will always have to be architecture specific optimizations. Yes ?

dogen1 · Jul 19, 2016

sontin said:
This section wasnt in the technical guide the last few days. :thumbsup:

Description of async compute shader tasks and that they're run in parallel with shadow maps(provided the hardware and driver will do that) was in there from the beginning.

sontin · Jul 19, 2016

Yes, but not the technical description how Futuremark is doing it.

FM_Jarnis · Jul 19, 2016

Our clarification on how Time Spy works:

http://www.futuremark.com/pressreleases/a-closer-look-at-asynchronous-compute-in-3dmark-time-spy

(Discussion) Futuremark 3DMark Time Spy Directx 12 Benchmark

Golden Member

Diamond Member

Lifer

Golden Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Member

Diamond Member

Senior member

Member

Platinum Member

Diamond Member

Diamond Member

Member

Golden Member

Diamond Member

Senior member

Diamond Member

Member

Lifer

Senior member

Diamond Member

Member