DirectX 12: Futuremark 3DMark Time Spy Benchmarks Thread


dogen1

Senior member
Oct 14, 2014
That's a terrible technical guide; it doesn't even go into the technical aspects.

They don't specify further, just that they use Async Compute to increase GPU utilization.

id Software uses Async Compute both to increase shader utilization with post effects and to actually run rasterizers & DMAs in parallel with shaders via shadow maps & MegaTexture streaming.

"Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering."

"Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue."
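The kind of overlap described above can be sketched with a toy Python timing model; plain threads stand in for the graphics and async compute queues, and the sleeps stand in for GPU work (the function names and durations are made up, and this is not real D3D12 code):

```python
import threading
import time

def shadow_rendering():
    time.sleep(0.1)   # stand-in for the rasterizer churning through shadow maps

def compute_prepass():
    time.sleep(0.1)   # stand-in for light culling, SSAO, etc. on compute

# Serial submission: compute work waits for the graphics work to finish.
t0 = time.perf_counter()
shadow_rendering()
compute_prepass()
serial = time.perf_counter() - t0

# Async submission: compute work goes to a second "queue" and overlaps.
t0 = time.perf_counter()
queue = threading.Thread(target=compute_prepass)
queue.start()
shadow_rendering()
queue.join()
overlapped = time.perf_counter() - t0

print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```

On a real GPU the win comes from idle shader units picking up the async work rather than from CPU threads, but the scheduling picture is the same: independent work overlapped instead of run back to back.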
 

moonbogg

Lifer
Jan 8, 2011
"Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering."

"Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue."

That's exactly what I assumed was happening. What about you guys?
 

Kenmitch

Diamond Member
Oct 10, 1999
"Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering."

"Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue."

Sorry....You lost me at before. :)

Sounds reasonable nevertheless.
 

Red Hawk

Diamond Member
Jan 1, 2011
Ran the test using the 16.7.2 driver package.

Results on default settings:

Graphics: 3910
CPU: 2804

Results on default settings with asynchronous compute disabled:

Graphics: 3382
CPU: 2801

Yay for asynchronous compute! Getting better performance for free. It doesn't seem to make a difference in the CPU test; however, it could still be affecting CPU behavior indirectly. The monitoring graphs showed my CPU frequency fluctuating much more in the graphics tests on the non-async run than with async compute. I should note there was quite a bit of CPU frequency fluctuation with async compute anyway, especially in graphics test 1; it just fluctuated even more without it. Interestingly, both runs recorded near-constant 100% GPU usage, so whatever goes unused without async compute doesn't appear to show up in GPU usage monitoring.
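As a quick sanity check, the async gain implied by those default-preset graphics scores works out to roughly 15-16%:

```python
# Graphics scores from the run above: async compute on vs. off.
graphics_on, graphics_off = 3910, 3382
gain = graphics_on / graphics_off - 1
print(f"Async compute graphics gain: {gain:.1%}")  # ≈ 15.6%
```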

The test really made the three fans on my 290X work for their lunch. Woof. Peak GPU temperature was 76 degrees C by the end of the second graphics test; peak CPU temperature was 59 degrees during the GPU tests and 82 degrees by the end of the CPU test.
 

Hail The Brain Slug

Diamond Member
Oct 10, 2005
Nice score! Especially on that cpu:thumbsup:

Well, I literally spent countless hours dialing in the overclock to make it as fast as possible in every aspect. I'm really surprised I scored 17% higher per core than AdamK47 at only 9% faster clocks.
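For what it's worth, a rough back-of-the-envelope check of that comparison, using only the 17% and 9% figures as stated (the underlying scores aren't in the post):

```python
# 17% more per-core score from only 9% higher clocks suggests the rest
# came from elsewhere (memory OC, tighter timings, etc.).
score_ratio, clock_ratio = 1.17, 1.09
beyond_clocks = score_ratio / clock_ratio - 1
print(f"Gain not explained by clock speed: {beyond_clocks:.1%}")  # ≈ 7.3%
```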
 

Red Hawk

Diamond Member
Jan 1, 2011
Hmm...tried running Time Spy on my brother's system, a Core 2 Quad Q6600 and Radeon 260X. But it won't run at all; it just goes straight to the results screen with the error message "No results produced". Yeah, no duh. It's on the 16.7.2 drivers; I tried running in windowed mode and uninstalling and reinstalling the application. It still won't run, but Fire Strike runs fine, so I'm not sure what the problem could be.
 
Feb 19, 2009
"Before the main illumination passes, asynchronous compute shaders are used to cull lights, evaluate illumination from prebaked environment reflections, compute screen-space ambient occlusion, and calculate unshadowed surface illumination. These tasks are started right after G-buffer rendering has finished and are executed alongside shadow rendering."

"Particles are simulated on the GPU using asynchronous compute queue. Simulation work is submitted to the asynchronous queue while G-buffer and shadow map rendering commands are submitted to the main command queue."

Is this in the technical guide? I didn't see it under the Time Spy section.
 

JustMe21

Senior member
Sep 8, 2011
Just for grins, I ran it on an i7-3770 and Radeon R7 250X aka Radeon 7770 and I got a whopping 596.
 

Red Hawk

Diamond Member
Jan 1, 2011
Tested at 1080p with 16x anisotropic filtering, as that's my monitor resolution and I prefer to bump up the AF in games. No idea why trilinear filtering is the default for Time Spy; AF is pretty basic and doesn't have a high performance cost.

Async Off:

Graphics Score: 4,844
Test 1: 33.17 FPS
Test 2: 26.65 FPS

Async On:

Graphics Score: 5,524
Test 1: 38.37 FPS
Test 2: 30.05 FPS
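From those 1080p graphics scores, the async gain works out to about 14%:

```python
# Graphics scores at 1080p/16x AF from the runs above.
graphics_on, graphics_off = 5524, 4844
gain = graphics_on / graphics_off - 1
print(f"Async compute graphics gain: {gain:.1%}")  # ≈ 14.0%
```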
 

YBS1

Golden Member
May 14, 2000
Jeez...I did something stupid. I loaded up my benchmark settings and ran it at 4.8GHz a couple of times and wondered why my scores had dropped to around 10,000; I finally realized I had G-Sync on and it was set to 60Hz. I'll get back around to a 4.8GHz bench later; for now here is my daily 4.5GHz.


http://www.3dmark.com/spy/50984
 

Hitman928

Diamond Member
Apr 15, 2012
Sure, here you go:

With async:
Graphics score
2,725
Graphics test 1
17.59 FPS
Graphics test 2
15.76 FPS

Without async:
Graphics score
2,633
Graphics test 1
16.97 FPS
Graphics test 2
15.25 FPS

Thanks, looks like the benefit of async on the 1070 drops to about 3.5% at 4k. I'll do the same on my 290 tomorrow to compare.
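Double-checking that ~3.5% figure against the 4K scores quoted above:

```python
# 1070 graphics scores at 4K: async on vs. off.
on, off = 2725, 2633
gain = on / off - 1
print(f"Async gain at 4K: {gain:.1%}")  # ≈ 3.5%, matching the estimate
```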
 

Hitman928

Diamond Member
Apr 15, 2012
Also, I wanted to try the explicit multi-adapter functionality, so I ran it with a 290 and a 280, with disappointing results: it was no faster than a single 280 even though the 290 was being loaded. The 280 was the primary card, so I'll switch them to see if I can get better results.
 

Red Hawk

Diamond Member
Jan 1, 2011
Just for grins, I ran it on an i7-3770 and Radeon R7 250X aka Radeon 7770 and I got a whopping 596.

That's interesting. I just ran the benchmark on my own PC with my brother's 260X swapped out for my 290X. Scored 1,428 total, 1,312 graphics. It's a total slideshow either way, but the 260X (Bonaire) having twice the geometry hardware of the 250X (Cape Verde) probably helped a bunch.

Async compute tests were interesting. I'd heard that the weaker/smaller the chip gets, the less of a difference asynchronous compute makes, because the whole chip is likely being used at any given moment and there are really no spare resources. It could even cause a loss in performance, like with Maxwell chips in Ashes of the Singularity, because async compute introduces latency if it goes unused. I tested at 1440p, 1080p, and 1440x900, with and without async compute. Graphics results were:

1440p, async off: 1,313
1080p, async on: 2,080
1080p, async off: 2,092
1440x900, async on: 2,895
1440x900, async off: 2,905

So yeah, results pretty consistent with what I heard. The 260X actually gets a few extra points without asynchronous compute. The benefit of asynchronous compute is definitely for high-end cards with compute units to spare, not low-end cards which are being used to their max as it is.
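The per-resolution deltas from those numbers bear that out; both come out slightly negative:

```python
# Async compute deltas per resolution from the 260X numbers above
# (negative = async compute cost a little performance).
scores = {                      # resolution: (async on, async off)
    "1080p":    (2080, 2092),
    "1440x900": (2895, 2905),
}
deltas = {res: on / off - 1 for res, (on, off) in scores.items()}
for res, d in deltas.items():
    print(f"{res}: {d:+.2%}")   # both slightly negative, well under 1%
```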

Also, I wanted to try the explicit multi-adapter functionality, so I ran it with a 290 and a 280, with disappointing results: it was no faster than a single 280 even though the 290 was being loaded. The 280 was the primary card, so I'll switch them to see if I can get better results.

...I didn't realize you could try that. I have my 290X, my brother's 260X, and a spare 270X I could try multiadapter with...though I doubt my PSU would appreciate both the 290X and 270X being hooked up to it. The 270X and 260X could be doable though...
 

richaron

Golden Member
Mar 27, 2012
Also, I wanted to try the explicit multi-adapter functionality, so I ran it with a 290 and a 280, with disappointing results: it was no faster than a single 280 even though the 290 was being loaded. The 280 was the primary card, so I'll switch them to see if I can get better results.

Pretty sure I read it's only for matching GPUs.
 

Hitman928

Diamond Member
Apr 15, 2012
Pretty sure I read it's only for matching GPUs.

Reading through it again, I think you might be right, but I'm still not sure what the limits are. In the tech guide and in the slides they talk about being able to combine discrete and integrated solutions, and how through explicit multi-adapter they can have control over any GPUs in the system. But then they drop this at the very end, which I hadn't read before:

"MDA configurations of heterogeneous adapters are not supported"

Edit: Never mind, found the answer. They do say they only support AFR with identical GPUs. It's funny that they have slides and a whole paragraph in the tech doc about how DX12 allows them to harness a heterogeneous GPU system, then pull back and are like, but we're not doing that. Fun while it lasted, lol. I am sure the 290 was being loaded, though; I heard the fans spin up and the temp increased throughout the benchmark. I wonder if it was running but the results were just thrown away or something.
 

Deders

Platinum Member
Oct 14, 2012
You can check with GPU-Z. Open 2 instances and make sure one is set to the secondary card.
 

Red Hawk

Diamond Member
Jan 1, 2011
Ok, I got Time Spy working on my brother's PC; I had to revert to driver 16.3.2 to do it, possibly something to do with AMD using SSE4 (which the Q6600 doesn't support) in DirectX 12 under drivers newer than 16.4. Anyways, I put his PC through some abuse to get numbers (edit: to clarify, this is the PC with a stock Q6600 and Radeon 260X):

Default:
Graphics 1340
CPU 1224
Total 1321

1440x900, async compute on:
Graphics 3043
CPU 1162

1440x900, async compute off:
Graphics 2918
CPU 1140

Over a 100-point improvement at 1440x900 with async on versus async off. So asynchronous compute may in fact have a (minor) benefit even on all-around low-end systems.
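Quick arithmetic check on those 1440x900 numbers:

```python
# Q6600 + 260X graphics scores at 1440x900: async on vs. off.
on, off = 3043, 2918
points = on - off              # the 100+ point improvement
gain = on / off - 1
print(f"+{points} points, async gain: {gain:.1%}")  # ≈ 4.3%
```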

Edit: Just tried running the demo for the first time on my brother's PC along with the tests. The demo is an absolute slideshow even at 1440x900. 1-2 FPS, whether or not async compute is on. Good lord.
 

Kenmitch

Diamond Member
Oct 10, 1999
Edit: Just tried running the demo for the first time on my brother's PC along with the tests. The demo is an absolute slideshow, 1-2 FPS, whether or not async compute is on. Good lord.

Too funny....When watching my rig I thought it was too choppy. :)

Dang Zotac and their aggressive core/boost clocks on the 1070 AMP Edition didn't really leave me much room to OC the core. Memory, on the other hand, I've tested up to 9408 so far. Looks like I get a decent little bump in overall score each time. So far I've only done a measly +20 on the core. I'm kind of thinking they used something similar to bin the chips for the AMP Extreme vs the AMP cards.

Is there some kind of sweet spot for the GDDR5 as far as speed/timings go? I'm getting tired of messing with the RAM speed. Would be nice if it were one of those "9469 nets the best results in the end" kind of things.