Severe CPU overhead of GTX1080Ti with RYZEN CPUs in DX-12 ??

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
Let me tell you a secret, VEGA 64 LC is faster than GTX1080Ti ...... :p

While I was reading Coffee Lake reviews I bumped into the Hardware Unboxed Core i7 8700K review.
They used both the GTX1080Ti and Vega 64 LC, and their results show severe CPU overhead with RYZEN CPUs on the GTX1080Ti, which makes the VEGA 64 LC faster when paired with the AMD CPUs.
There are even some cases where Intel CPUs are faster with the VEGA 64 than with the GTX1080Ti.

The following graphs were made using data from the Hardware Unboxed Core i7 8700K review.

Battlefield 1

VEGA 64 LC is 19% faster than GTX1080Ti when paired with the R5 1500X
VEGA 64 LC is 15% faster than GTX1080Ti when paired with the R5 1600
VEGA 64 LC is 10% faster than GTX1080Ti when paired with the R7 1700
VEGA 64 LC is 12% faster than GTX1080Ti when paired with the R7 1800X

148nntz.jpg



Civilization VI

VEGA 64 LC is 12% faster than GTX1080Ti when paired with the R5 1500X
VEGA 64 LC is 18% faster than GTX1080Ti when paired with the R5 1600
VEGA 64 LC is 23% faster than GTX1080Ti when paired with the R7 1700
VEGA 64 LC is 19% faster than GTX1080Ti when paired with the R7 1800X

20znpfm.jpg


Ashes of the Singularity: Escalation doesn't seem to suffer from CPU overhead on the GTX1080Ti, but again the performance difference between the two GPUs is extremely small. I would really like to see whether the GTX1080Ti suffers the same CPU overhead in more DX-12 titles, which is why for now I will not generalize this behavior and will limit it to the two games above.
 
Reactions: Drazick and psolord

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
It actually shows that the IPC difference between Ryzen CPUs and Intel CPUs is smaller than is commonly thought.

How come? AdoredTV has previously argued in his analysis videos that Nvidia does not have drivers optimized for the Ryzen platform.

Nvidia purposely gimping AMD? No - they simply may never have had time to optimize their drivers for the AMD Ryzen platform, and at this point it may require a lot more work than we think. Also - Nvidia may focus on single-core throughput instead of core count for scheduling of their GPUs, and we know that Vega GPUs do not require static scheduling; they handle it on their own. AMD drivers may "like" higher core counts. Vega + Ryzen and Vega + Intel CPUs show the real difference in IPC between them. This situation will never be apparent in a GPU-bound scenario, for obvious reasons.

This is my take on what is happening here.
 
Reactions: Drazick

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
There will NEVER be a purely CPU-limited or GPU-limited scenario, remember guys. It would be possible if you were running code directly on the hardware. In the current state THERE IS ALWAYS a middle ground, which muddies the view. And it's the software.
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
It actually shows that the IPC difference between Ryzen CPUs and Intel CPUs is smaller than is commonly thought.

How come? AdoredTV has previously argued in his analysis videos that Nvidia does not have drivers optimized for the Ryzen platform.

Nvidia purposely gimping AMD? No - they simply may never have had time to optimize their drivers for the AMD Ryzen platform, and at this point it may require a lot more work than we think. Also - Nvidia may focus on single-core throughput instead of core count for scheduling of their GPUs, and we know that Vega GPUs do not require static scheduling; they handle it on their own. AMD drivers may "like" higher core counts. Vega + Ryzen and Vega + Intel CPUs show the real difference in IPC between them. This situation will never be apparent in a GPU-bound scenario, for obvious reasons.

This is my take on what is happening here.

If you take a look at the Hardware Unboxed review linked above, you will see that in BF1 the Core i7 7700K is faster with the VEGA 64 LC than with the GTX1080Ti at 720p. But on the same CPU at 1080p the GTX1080Ti is faster than the VEGA 64, probably because we are becoming more GPU limited. That also shows that the GTX1080Ti has more CPU overhead than VEGA, and it's not only happening with Ryzen CPUs.
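
To make that reasoning concrete, here is a toy bottleneck model in Python. It is only a sketch: the frame-rate caps are hypothetical numbers picked to reproduce the pattern described (they are not measurements from the review), and the names `delivered_fps`, `cards` and `fastest_card` are made up for this illustration. The assumption is that the delivered frame rate is capped by whichever of the CPU/driver side or the GPU side is slower.

```python
def delivered_fps(cpu_fps_cap, gpu_fps_cap):
    """Frame rate is limited by whichever side finishes its per-frame work last."""
    return min(cpu_fps_cap, gpu_fps_cap)


# Hypothetical caps, chosen only to illustrate the reasoning; NOT measured data.
# "cpu": frames per second the CPU plus driver can prepare (assumed independent
#        of resolution in this toy model)
# "gpu": frames per second the GPU itself can render at each resolution
cards = {
    "GTX1080Ti": {"cpu": 180, "gpu": {"720p": 260, "1080p": 170}},
    "Vega 64 LC": {"cpu": 190, "gpu": {"720p": 220, "1080p": 150}},
}


def fastest_card(resolution):
    """Return the card with the highest delivered frame rate at a resolution."""
    return max(
        cards,
        key=lambda name: delivered_fps(cards[name]["cpu"], cards[name]["gpu"][resolution]),
    )


if __name__ == "__main__":
    for resolution in ("720p", "1080p"):
        print(resolution, "->", fastest_card(resolution))
```

With these made-up caps the card with the lower driver overhead wins at 720p, where both cards are CPU limited, and the card with the stronger GPU wins at 1080p, where the GPU becomes the limit.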
 
Reactions: Drazick

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
If you take a look at the Hardware Unboxed review linked above, you will see that in BF1 the Core i7 7700K is faster with the VEGA 64 LC than with the GTX1080Ti at 720p. But on the same CPU at 1080p the GTX1080Ti is faster than the VEGA 64, probably because we are becoming more GPU limited. That also shows that the GTX1080Ti has more CPU overhead than VEGA, and it's not only happening with Ryzen CPUs.
I was talking about a situation where both GPUs were tested with CPUs from both vendors and the margins were smaller. You focused on GPU comparison. I focused on CPU performance comparison.

99% of reviews in the wild compare Ryzen CPUs with Intel using Nvidia GPUs. Nobody, neither reviewers nor readers, has ever thought: what if it's Nvidia's drivers that are responsible for poor Ryzen performance in games, compared to great performance in content creation, engineering, synthetic stuff?

That's why Vega + Ryzen and Vega + Intel show the real differences between both companies. And the IPC/performance differences, to the huge disappointment of Intel fanboys, are much smaller, making Ryzen a MUCH better product in the end than it is currently perceived to be.
 
Reactions: Drazick

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,619
136
Should be no surprise to anybody who followed the (loosely related) async compute saga. Nvidia is doing command scheduling at the software/driver level while AMD does it in hardware. This allows Nvidia to add support for async compute to their cards afterward (and they even have a well-working software implementation of it in their DX11 driver), all at the cost of using CPU power (which is what the benchmarks in the OP are exposing). AMD's approach is less reliant on the CPU but uses more power in the GPU and requires support at the graphics/compute API level.
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
I was talking about a situation where both GPUs were tested with CPUs from both vendors and the margins were smaller. You focused on GPU comparison. I focused on CPU performance comparison.

99% of reviews in the wild compare Ryzen CPUs with Intel using Nvidia GPUs. Nobody, neither reviewers nor readers, has ever thought: what if it's Nvidia's drivers that are responsible for poor Ryzen performance in games, compared to great performance in content creation, engineering, synthetic stuff?

That's why Vega + Ryzen and Vega + Intel show the real differences between both companies. And the IPC/performance differences, to the huge disappointment of Intel fanboys, are much smaller, making Ryzen a MUCH better product in the end than it is currently perceived to be.

I know what you said, but using Intel CPUs in BF1 [DX-12] @ 720p we can clearly see that the GTX1080Ti is slower than the Vega 64 LC.

BF1 720p ultra quality DX-12

Core i7 7700K + GTX1080Ti = 164fps min / 180fps avg
Core i7 7700K + Vega 64 LC = 175fps min / 190fps avg

Core i7 7740X + GTX1080Ti = 162fps min / 177fps avg
Core i7 7740X + Vega 64 LC = 174fps min / 188fps avg

Only the Quad Core Core i5 7600K is slower with VEGA.

So I don't think it's the NV drivers with the Ryzen CPUs, because we can see the same with Intel CPUs in some cases.
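
For anyone who wants the gaps as percentages, here is a short Python sketch that works them out from the four BF1 720p results listed in this post. The FPS values are copied from the list above; the function names `percent_faster` and `vega_lead` are just illustrative.

```python
# FPS figures (min, avg) copied from this post: BF1, 720p, ultra quality, DX-12.
results = {
    "Core i7 7700K": {"GTX1080Ti": (164, 180), "Vega 64 LC": (175, 190)},
    "Core i7 7740X": {"GTX1080Ti": (162, 177), "Vega 64 LC": (174, 188)},
}


def percent_faster(new_fps, base_fps):
    """Return how much faster new_fps is than base_fps, in percent."""
    return (new_fps / base_fps - 1.0) * 100.0


def vega_lead(data):
    """Map each CPU to Vega's (min, avg) lead over the GTX1080Ti in percent."""
    leads = {}
    for cpu, gpus in data.items():
        nv_min, nv_avg = gpus["GTX1080Ti"]
        amd_min, amd_avg = gpus["Vega 64 LC"]
        leads[cpu] = (
            percent_faster(amd_min, nv_min),
            percent_faster(amd_avg, nv_avg),
        )
    return leads


if __name__ == "__main__":
    for cpu, (min_lead, avg_lead) in vega_lead(results).items():
        print(f"{cpu}: Vega 64 LC leads by {min_lead:.1f}% min / {avg_lead:.1f}% avg")
```

On these numbers the Vega 64 LC comes out roughly 6 to 7% ahead on both Intel CPUs.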
 

Reinvented

Senior member
Oct 5, 2005
489
77
91
I'm not a big fan of BF1 tests, especially using DX12, mostly because it's still pretty unoptimized, and on the current Windows release DX12 still performs poorly in MANY games across the board, with microstuttering.
 

IRobot23

Senior member
Jul 3, 2017
601
183
76
I'm not a big fan of BF1 tests, especially using DX12, mostly because it's still pretty unoptimized, and on the current Windows release DX12 still performs poorly in MANY games across the board, with microstuttering.

True. DX11 runs better on AMD RX.
BF4 Mantle and VULKAN DOOM run great.
 

Reinvented

Senior member
Oct 5, 2005
489
77
91
True. DX11 runs better on AMD RX.
BF4 Mantle and VULKAN DOOM run great.

Not true. DX12 just runs like garbage in BF1. It happens with Intel CPUs too, with microstuttering and dropped frames. And it's not necessarily bad game optimization alone. Part of it is due to the Windows Creators Update.
 

SPBHM

Diamond Member
Sep 12, 2012
5,056
409
126
as far as I know the recommended API for BF1 is DX11
perhaps for this reason the DX12 version/driver on the Nvidia side is poorly optimized
 

Reinvented

Senior member
Oct 5, 2005
489
77
91
as far as I know the recommended API for BF1 is DX11
perhaps for this reason the DX12 version/driver on the Nvidia side is poorly optimized

It's true that it's poorly optimized for Nvidia. I don't have an Nvidia card myself, but lots of users were complaining of dropped frames using DX12 for sure. And it seems AMD fixed it before Nvidia's drivers did. But that still doesn't account for the poor performance when using the Creators Update.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
So... Your post is saying that there is less CPU overhead on the AMD DX12 drivers than the Nvidia ones? Doesn't this really belong in the video card section instead?
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Did they test other games in the review or are you cherry picking?

This basically. BF1 and Civ VI aren't exactly pinnacles of DX12 programming, and have lots of performance issues running in DX12, especially on NVidia hardware.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Should be no surprise to anybody who followed the (loosely related) async compute saga. Nvidia is doing command scheduling at the software/driver level while AMD does it in hardware. This allows Nvidia to add support for async compute to their cards afterward (and they even have a well-working software implementation of it in their DX11 driver), all at the cost of using CPU power (which is what the benchmarks in the OP are exposing). AMD's approach is less reliant on the CPU but uses more power in the GPU and requires support at the graphics/compute API level.

This is a vast oversimplification of the actual issue. While NVidia does instruction scheduling in software, actual resource scheduling is done in hardware; and this includes asynchronous compute! My guess as to why NVidia takes such a big hit with AMD CPUs is that the compiler is hyper-optimized around Intel's Core series architectures and the instruction latencies of those CPUs. With AMD CPUs having lower IPC than their Intel counterparts, it's playing havoc with the compiler's ability to properly optimize the instruction stream for the GPU in a timely manner.
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
This basically. BF1 and Civ VI aren't exactly pinnacles of DX12 programming, and have lots of performance issues running in DX12, especially on NVidia hardware.
Or maybe Nvidia hardware has issues running DX12 software. Do we have any examples of where Nvidia hardware shines? (AMD notwithstanding) Also yes, this belongs in the video card section IMO. Note that I'm not trolling here; my current graphics card is a 1080 Ti. It's a legit issue though. As we move forward, more and more games will offer DX12. The top 3 engines support it out of the box. If Nvidia hardware isn't up to snuff and AMD continues to push their flagships... market share WILL shift. It's best that NVidia pulls in a few AMD CPUs along with Coffee Lake and optimizes better for multi-core.
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,619
136
While NVidia does instruction scheduling in software, actual resource scheduling is done in hardware; and this includes asynchronous compute!
It doesn't without expensive context switches, meaning that whereas AMD cards can essentially work through the command queue out of order, mixing graphics and compute commands, Nvidia cards are much more reliant on prior optimization of the queue (preemption, which got more fine-grained in Pascal compared to the much coarser approach before) to avoid the unnecessary cost of context switches.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Or maybe Nvidia hardware has issues running DX12 software.

NVidia hardware has no issues with DX12. The problem is that DX12 games vary greatly in their level of optimization due to the skill and knowledge required to properly implement it. Another big problem is that beating NVidia's DX11 driver is really hard for developers to do, as the DX11 driver is super optimized and mimics some of the functions of DX12. That's why a game might run slower in DX12 than DX11 for NVidia. With AMD, it's not really a problem because their DX11 driver is nowhere near as efficient as NVidia's.

Do we have any examples of where Nvidia hardware shines? (AMD notwithstanding)

NVidia gains more from DX12 in Hitman than AMD. Other great examples of NVidia performing well with DX12 are Forza Horizon 3, Gears of War 4, The Division, and Sniper Elite 4. Also, Doom runs great on NVidia with Vulkan, a similar low-level API.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
It doesn't without expensive context switches, meaning that whereas AMD cards can essentially work through the command queue out of order, mixing graphics and compute commands, Nvidia cards are much more reliant on prior optimization of the queue (preemption, which got more fine-grained in Pascal compared to the much coarser approach before) to avoid the unnecessary cost of context switches.

Ryan covered all of this in his GTX 1080 review. You are confusing preemption with concurrency, which are two totally different concepts. Asynchronous compute for games uses concurrency to run graphics and compute simultaneously if the resources allow it, so preemption doesn't enter the picture in normal gaming. For gaming, the only scenario where preemption would be used is VR.

NVidia's dynamic load balancing feature doesn't rely on context switches, as the graphics and compute workloads are merely overlapped when running in separate queues. This is done completely in hardware, and not in software. A software solution would not be capable of doing this due to the reaction times required and the enormous latency penalties that would be incurred by using the CPU. Context switching is required if the graphics and compute are running in the same queue, and AMD can't get around this either.
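
As a rough way to picture the difference between the two cases, here is a toy timing model in Python. It is a deliberate simplification and not a description of how any actual GPU schedules work: the function names, the `overlap_fraction` parameter and every number are hypothetical, chosen only to contrast "same queue, pay a context switch" with "separate queues, work overlaps when resources allow".

```python
def same_queue_ms(graphics_ms, compute_ms, context_switch_ms):
    """Graphics and compute share one queue: they run back to back,
    with one context switch between them (toy assumption)."""
    return graphics_ms + compute_ms + context_switch_ms


def separate_queues_ms(graphics_ms, compute_ms, overlap_fraction):
    """Graphics and compute run in separate queues and may overlap.

    overlap_fraction is the share of the shorter workload that fits into
    otherwise idle GPU resources: 1.0 hides it completely behind the longer
    workload, 0.0 means nothing overlaps. No context switch is charged.
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0.0 and 1.0")
    shorter = min(graphics_ms, compute_ms)
    return graphics_ms + compute_ms - overlap_fraction * shorter


if __name__ == "__main__":
    # Hypothetical per-frame costs in milliseconds; NOT measured data.
    graphics, compute, switch = 10.0, 4.0, 0.5
    print("same queue:     ", same_queue_ms(graphics, compute, switch), "ms")
    print("separate queues:", separate_queues_ms(graphics, compute, 1.0), "ms")
```

In this sketch the benefit of concurrency depends entirely on how much idle capacity there is to overlap into, which is the "if the resources allow it" condition.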