Discussion [HWUB] Nvidia has a driver overhead problem. . .

Hitman928

Diamond Member
Apr 15, 2012
5,290
7,920
136
HWUB did an investigation into GPU performance with lower-end CPUs for gaming and published their results. In the comments they mention they tested more games than they showed in the video, all with the same results. As we all (should) know, in DX11 AMD had the low-end CPU performance issue, but it looks like this has flipped for DX12/Vulkan. HWUB mentions they think their findings hold true for DX11 as well, but as far as I can tell they only tested DX12/Vulkan titles, so I don't think they have the data to back up that statement, and I doubt it is true.

Nvidia Has a Driver Overhead Problem, GeForce vs Radeon on Low-End CPUs - YouTube

1615478285887.png
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
I was just watching this video, quite interesting. It also gives another potential reason why the RTX cards do relatively worse against RDNA cards at 1080p and 1440p than at 4K, and raises the question of whether this is something Nvidia can fix.
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
I was just watching this video, quite interesting. It also gives another potential reason why the RTX cards do relatively worse against RDNA cards at 1080p and 1440p than at 4K, and raises the question of whether this is something Nvidia can fix.

The better 1080p and 1440p results with RDNA are more likely a result of Infinity Cache having a better hit rate and boosting performance. There's also some possibility that the AMD cards aren't hitting a resource bottleneck that the RTX cards do hit at those resolutions, one the RTX cards don't suffer at higher resolutions thanks to their massively larger shader counts.

It seems as though the driver overhead issue only shows up when using older CPUs or possibly only on older CPUs that don't have a minimum number of cores. All of the more recent tests with the 3090, 6900XT, etc. would have been done using a newer Intel or AMD CPU with 8 cores, so it's unlikely the results would crop up there as well. It's certainly possible, but without definitive testing I'm leaning towards other explanations for the result.
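To put rough numbers on the cache point, here's a back-of-the-envelope sketch. Every figure in it is a made-up placeholder rather than a measured value for any real card; it just shows why a higher hit rate at lower resolutions raises the effective bandwidth the GPU sees before any CPU limit enters the picture.

```python
# Back-of-the-envelope model: effective bandwidth as a hit-rate-weighted mix
# of cache and VRAM bandwidth. All numbers are illustrative guesses, not
# measured figures for any real card.

def effective_bandwidth(hit_rate, cache_bw_gbs, vram_bw_gbs):
    """Average bandwidth seen by the GPU for a given cache hit rate."""
    return hit_rate * cache_bw_gbs + (1.0 - hit_rate) * vram_bw_gbs

CACHE_BW = 1900.0  # hypothetical on-die cache bandwidth, GB/s
VRAM_BW = 512.0    # hypothetical GDDR6 bandwidth, GB/s

for resolution, hit_rate in [("1080p", 0.75), ("1440p", 0.65), ("4K", 0.45)]:
    bw = effective_bandwidth(hit_rate, CACHE_BW, VRAM_BW)
    print(f"{resolution}: assumed hit rate {hit_rate:.0%} -> ~{bw:.0f} GB/s effective")
```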
 
  • Like
Reactions: Leeea

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
They saw the same effect with the cacheless 5700XT though, so it doesn't seem like it can be explained just by infinity cache.
It would be interesting to see it with a few more CPUs, like the higher core count Comet Lake models.
 
  • Like
Reactions: Tlh97 and Leeea

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
I suppose it could even be a potential problem at 1080p in some games, but most of the testing HUB did showed that unless you ran lower settings, it didn't seem to be an issue. I'm also curious if jumping to an 8-core CPU would alleviate the problem to some degree.

I looked over the TPU results and the games that use DX12/Vulkan and it doesn't look like there's any kind of bottleneck that's immediately apparent at 1080p when using a 9900K as the CPU. All of the Nvidia cards have a performance spread that's expected. There are a few cases where the 6900XT creeps ahead at 1080p that could be the result of the 3090 finally hitting a wall, but it's hard to know for sure just from looking at the results in isolation.
 
  • Like
Reactions: Tlh97 and Leeea

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
A post in a different thread has a link to a site that has AC:Valhalla results for a wide range of CPUs that might help us explore this in better detail: https://gamegpu.com/action-/-fps-/-tps/assassin-s-creed-valhalla-2020-test-gpu-cpu

It's a game that AMD certainly does better in than Nvidia, but we can observe the behavior being described in the HUB video for older CPUs. It has Intel CPUs going back to Haswell and results for AMD going back to the first Zen CPUs as well as processors across the product range. There's even an old Haswell-E in the list so we can see how it affects older 8-core Intel chips before those became standard.
 

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
Back in 2017, a guy posted a highly controversial video on this subject:
and this related reddit thread:

It basically claims that AMD has used hardware GPU scheduling since GCN, while Nvidia has used software CPU scheduling since Kepler, for DirectX command lists.

It then claims that for DX11's large command lists this allows Nvidia to schedule the draw calls optimally to obtain the best performance, while AMD's hardware scheduler was unlikely to achieve the best distribution of draw calls.

It also claims that with DX12's small command lists this problem would go away for AMD, because multiple threads submitting small command lists would keep the queue filled, or something like that.
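To picture the DX11 vs DX12 part of that claim, here's a toy sketch in Python. It isn't real graphics code; time.sleep just stands in for the CPU cost of recording draw calls. The point is that one thread recording a single big command list is gated by one core, while several threads recording small command lists get through the same work in roughly a quarter of the wall-clock time.

```python
# Toy model only: time.sleep stands in for the CPU work of recording draw
# calls into a command list. No real D3D/Vulkan calls are involved.
import time
from concurrent.futures import ThreadPoolExecutor

DRAW_CALLS = 2000
COST_PER_CALL = 0.00001  # pretend 10 microseconds of CPU work per draw call

def record_command_list(num_calls):
    """Pretend to record num_calls draw calls into one command list."""
    time.sleep(num_calls * COST_PER_CALL)

# "DX11-style": one big command list recorded on a single thread.
start = time.perf_counter()
record_command_list(DRAW_CALLS)
print(f"single-thread recording: {time.perf_counter() - start:.3f} s")

# "DX12-style": the same work split into small command lists recorded by a
# pool of worker threads, then (conceptually) submitted together.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    chunk = DRAW_CALLS // 4
    list(pool.map(record_command_list, [chunk] * 4))
print(f"four-thread recording:   {time.perf_counter() - start:.3f} s")
```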

--------------------

This video has been credited and discredited all over the place. I suspect there must be some kernel of truth in it, as this seems to be directly related to what we see now.

If Nvidia is bound by design to software scheduling versus hardware scheduling, it is entirely possible this is unfixable for Nvidia this generation.
 
Last edited:

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
I suppose it could even be a potential problem at 1080p in some games, but most of the testing HUB did showed that unless you ran lower settings, it didn't seem to be an issue. I'm also curious if jumping to an 8-core CPU would alleviate the problem to some degree.

I disagree, check out his graph of:
Watch Dogs Legion 1440p Ultra Quality @ 14:27:
https://youtu.be/JLEIJhunaW8?t=867
The RX 6900 XT loses by 1 fps to the RTX 3090 with the Ryzen 5600X, but
then wins by 15 fps with the Ryzen 2600X
and wins by 16 fps with the i3-10100.

This shows that even at high quality settings, at a resolution common for many of us, this CPU scaling is very relevant for the RX 6000 vs the RTX 3000 series.

-----------------

The above linked graph is also interesting in that the difference in FPS between a Ryzen 2600X, i3-10100, and Ryzen 5600X is less than 9 fps on the RX 6900 XT.

To me, it indicates that owners of the i3-10100 and Ryzen 2600 equivalents can skip upgrading their CPU this generation if they have an AMD graphics card. Even Ryzen 1600 owners will get far more out of an RX 6000 video card upgrade than they will from a CPU upgrade, especially if they are switching from an Nvidia card to an AMD card.
 
Last edited:

DiogoDX

Senior member
Oct 11, 2012
746
277
136
Why is this news? Since the DX12 launch AMD has had lower overhead than Nvidia. Maybe the GPUs are powerful enough now, and we have some DX12-exclusive games, to show this.

EDIT: Yeah, I knew it, this was nothing new.

2017 anandtech article: https://www.anandtech.com/show/11223/quick-look-vulkan-3dmark-api-overhead

86100.png


86101.png


2017 video by NerdTechGasm:

 
Last edited:

Racan

Golden Member
Sep 22, 2012
1,109
1,985
136
Well well well, how the tables have turned... I'm kind of shocked really; lower driver overhead was a big reason for choosing Nvidia in the past for me.
 

Timorous

Golden Member
Oct 27, 2008
1,616
2,775
136
Why is this news? Since the DX12 launch AMD has had lower overhead than Nvidia. Maybe the GPUs are powerful enough now, and we have some DX12-exclusive games, to show this.

EDIT: Yeah, I knew it, this was nothing new.

2017 anandtech article: https://www.anandtech.com/show/11223/quick-look-vulkan-3dmark-api-overhead

86100.png


86101.png


2017 video by NerdTechGasm:


I think its going from a theoretical issue that impacted very few games to a practical issue across a variety of DX12 titles is newsworthy.
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
If it's a big issue for them they can always go back to using a hardware scheduler. It's not like they don't have any experience in designing one. It may even make sense because outside of adding improvements to ray tracing there's nothing else that stands out as an obvious improvement to Ampere. It isn't like they're going to do a 3x FP-32 design or something like that.
 

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
Why is this news? Since the DX12 launch AMD has had lower overhead than Nvidia. Maybe the GPUs are powerful enough now, and we have some DX12-exclusive games, to show this.

It is news now because it is handing a 10% to 30% performance increase to AMD cards for CPUs that are not the latest and greatest.

Back in 2017 it was a theoretical thing.
 
Last edited:
  • Like
Reactions: Tlh97 and Mopetar

GodisanAtheist

Diamond Member
Nov 16, 2006
6,815
7,173
136
Considering I'm running an old CPU, these results are very relevant to me.

Curious if the overhead trickles down into the mid-range 3000 series, or if this only affects the top tier of cards.

Nevertheless, I was always planning to do a full rebuild for this upgrade, given my 6600K has aged pretty badly.
 
  • Like
Reactions: Tlh97 and Leeea

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
I wanted to try to analyze where the bottlenecks start becoming an issue, so I put together a table from the GameGPU data for Assassin's Creed Valhalla. I've created separate tables for Intel and AMD CPUs.

This first table is a general progression of Ryzen CPUs, going from older, low core count parts to higher core counts, and then across generations, which brought IPC increases at roughly the same core counts and clock speeds. The last column is the weakest GPU at which that particular CPU already becomes the bottleneck. In other words, there's no point in pairing it with a better GPU than the one listed in that column.

CPU / GPU | CPU Cores / Clock | RTX 3090 | RTX 3080 | RTX 2080 Ti | RTX 2080 S | RTX 3070 | GTX 1080 Ti | RTX 2080 | RTX 3060 Ti | Bottleneck @
R3 1200 | 4C/4T @ 3.1 GHz | 52 | 52 | 52 | 52 | 52 | 52 | 52 | 52 | GTX 1070
R3 1300X | 4C/4T @ 3.4 GHz | 57 | 57 | 57 | 57 | 57 | 57 | 57 | 57 | GTX 1070
R5 1400 | 4C/8T @ 3.2 GHz | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | GTX 1080
R5 1600X | 6C/12T @ 3.6 GHz | 73 | 73 | 73 | 73 | 73 | 73 | 73 | 71 | RTX 2080
R7 1800X | 8C/16T @ 3.6 GHz | 77 | 77 | 77 | 77 | 76 | 74 | 73 | 71 | RTX 2080 S
R5 2600X | 6C/12T @ 3.6 GHz | 80 | 80 | 80 | 77 | 76 | 74 | 73 | 71 | RTX 2080 Ti
R7 2700X | 8C/16T @ 3.7 GHz | 85 | 85 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3080
R7 3700X | 8C/16T @ 3.6 GHz | 96 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090
R9 5900X | 12C/24T @ 3.7 GHz | 96 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090

Let's look at a similar table for Intel CPUs. I left a lot of the later i7's and i9's off of the table because anything after a Kaby Lake i7 hits a GPU bottleneck, so it's a lot of repeated data.

CPU / GPU | CPU Cores / Clock | RTX 3090 | RTX 3080 | RTX 2080 Ti | RTX 2080 S | RTX 3070 | GTX 1080 Ti | RTX 2080 | RTX 3060 Ti | Bottleneck @
i3 4330 | 2C/4T @ 3.5 GHz | 44 | 44 | 44 | 44 | 44 | 44 | 44 | 44 | GTX 1660 S
i3 7100 | 2C/4T @ 3.9 GHz | 61 | 61 | 61 | 61 | 61 | 61 | 61 | 61 | GTX 1080
i5 4670K | 4C/4T @ 3.4 GHz | 66 | 66 | 66 | 66 | 66 | 66 | 66 | 66 | RTX 2070 S
i7 4770K | 4C/8T @ 3.5 GHz | 75 | 75 | 75 | 75 | 75 | 74 | 73 | 71 | RTX 3070
i7 7700K | 4C/8T @ 4.2 GHz | 95 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090
i5 9600K | 6C/6T @ 3.7 GHz | 95 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090
i7 9700K | 8C/8T @ 3.6 GHz | 96 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090

We can certainly see the issue at play, and both core count (physical or virtual) and clock speed alleviate the problem. If you've got at least a 3700X or a 7700K, you probably won't have to worry too much about your choice of GPU. Granted, this is only a single title, but it is one of the newer major releases, so I don't expect too many other games to show worse results with more recent CPUs.

Also, there are some more recent, low-end CPUs that aren't in the charts above that should be examined.

CPU / GPU | CPU Cores / Clock | RTX 3090 | RTX 3080 | RTX 2080 Ti | RTX 2080 S | RTX 3070 | GTX 1080 Ti | RTX 2080 | RTX 3060 Ti | Bottleneck @
R3 3100 | 4C/8T @ 3.6 GHz | 87 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3080
R3 3300X | 4C/8T @ 3.8 GHz | 94 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090
i3 10100 | 4C/8T @ 3.6 GHz | 96 | 87 | 81 | 77 | 76 | 74 | 73 | 71 | RTX 3090

It seems that older CPUs, particularly those from AMD where the IPC wasn't quite as good as what Intel had achieved with Skylake, run into limitations. A modern 4-core CPU from either AMD or Intel is good enough to pair with a 3080 without bottlenecking it.
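For anyone who wants to sanity-check the "Bottleneck @" column, this rough sketch captures the rule described above: walk from the weakest listed GPU to the strongest and report the weakest one that already reaches the CPU-limited frame rate. (The actual GameGPU charts include weaker GPUs than the columns shown here, which is presumably where entries like the GTX 1070 come from.)

```python
# Sketch of the "Bottleneck @" rule: among the GPUs listed in the table,
# report the weakest one that already reaches the CPU-limited frame rate
# (i.e. a faster GPU adds nothing). FPS values below are the R5 2600X row.

GPUS_FAST_TO_SLOW = ["RTX 3090", "RTX 3080", "RTX 2080 Ti", "RTX 2080 S",
                     "RTX 3070", "GTX 1080 Ti", "RTX 2080", "RTX 3060 Ti"]

def bottleneck_gpu(fps_fast_to_slow):
    ceiling = fps_fast_to_slow[0]  # FPS with the fastest GPU = the CPU limit
    # Walk from the weakest GPU to the strongest and return the first one
    # that already hits the ceiling. Real data may need a small tolerance.
    for name, fps in zip(reversed(GPUS_FAST_TO_SLOW), reversed(fps_fast_to_slow)):
        if fps == ceiling:
            return name
    return GPUS_FAST_TO_SLOW[0]

r5_2600x = [80, 80, 80, 77, 76, 74, 73, 71]
print(bottleneck_gpu(r5_2600x))  # -> RTX 2080 Ti, matching the table
```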
 
Last edited:

scineram

Senior member
Nov 1, 2020
361
283
106
The better 1080p and 1440p results with RDNA are more likely a result of Infinity Cache having a better hit rate and boosting performance. There's also some possibility that the AMD cards aren't hitting a resource bottleneck that the RTX cards do hit at those resolutions, one the RTX cards don't suffer at higher resolutions thanks to their massively larger shader counts.

It seems as though the driver overhead issue only shows up when using older CPUs or possibly only on older CPUs that don't have a minimum number of cores. All of the more recent tests with the 3090, 6900XT, etc. would have been done using a newer Intel or AMD CPU with 8 cores, so it's unlikely the results would crop up there as well. It's certainly possible, but without definitive testing I'm leaning towards other explanations for the result.
Massive cope and BS. LOL
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
Massive cope and BS. LOL

Could you elaborate on what you actually mean?

After looking through the results, it doesn't appear that Nvidia GPUs have additional performance unlocked in some titles once you get past a certain CPU performance level. It's obvious that there were some considerable performance uplifts going from Zen 2 to Zen 3, but an R7 3700X is good enough for this particular title. Maybe a sufficiently overclocked Zen 3 CPU could see some kind of performance gain, but it seems unlikely from what we're looking at.

We also know that the hit rate for Infinity Cache increases at lower resolutions and, because it's a lot faster than going out to main memory, it would provide a potential boost at those resolutions, or at least allow the GPU to stretch a bit farther before it hits a CPU bottleneck. The data is there for the 6900 XT as well, so we can find out whether this is the case.

CPU | 6900 XT FPS | 6900 XT % of Max | 3090 FPS | 3090 % of Max | 3090 as % of 6900 XT
i3 4330 | 58 | 50.8% | 44 | 45.8% | 75.8%
R3 1200 | 68 | 59.6% | 52 | 54.2% | 76.5%
i3 7100 | 69 | 60.5% | 61 | 63.5% | 75.3%
R3 1300X | 74 | 64.9% | 57 | 59.4% | 77.0%
R5 1400 | 81 | 71.1% | 61 | 63.5% | 75.3%
i5 4670K | 84 | 73.7% | 66 | 68.8% | 78.6%
R5 1600X | 91 | 79.8% | 73 | 76.0% | 80.2%
R7 1800X | 92 | 80.7% | 77 | 80.2% | 83.7%
i7 4770K | 95 | 83.3% | 75 | 78.1% | 78.9%
R5 2600X | 99 | 86.8% | 80 | 83.3% | 80.8%
R7 2700X | 101 | 88.6% | 85 | 88.5% | 84.2%
i7 7700K | 109 | 95.6% | 95 | 98.9% | 87.2%
i5 9600K | 109 | 95.6% | 95 | 98.9% | 87.2%
i7 9700K | 109 | 95.6% | 96 | 100% | 88.1%
R7 3700X | 109 | 95.6% | 96 | 100% | 88.1%
i7 10900K | 109 | 95.6% | 96 | 100% | 88.1%
R5 5600X | 114 (SAM) | 100% | 96 | 100% | 88.1%
R9 5900X | 114 (SAM) | 100% | 96 | 100% | 88.1%

From the earlier comparison it appears that the 3090 doesn't see any gains past the 3700X or 7700K. The 6900 XT actually does gain a small bump with Zen 3. I'd need to translate the site and go over the testing methodology to be sure that isn't due to something like SAM being used for those processors (edit: after translating the page, they are using SAM for the Zen 3 CPUs) or anything else that might cause that result, but it is a difference between two GPUs in the same performance class. Both of them hit a wall where they start seeing a CPU bottleneck at around the same point. The Nvidia card does perform relatively better on newer hardware, and in particular it seems to do relatively better with AMD CPUs, though that could just be a coincidence due to the limited number of data points rather than anything real.
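For reference, the derived columns in the table are just simple ratios against each card's best result in this data set; here's a quick sketch of the arithmetic, using two rows copied from above:

```python
# The three derived columns are plain ratios against each card's best
# (CPU-unconstrained) result in this data set. FPS values are from the table.
MAX_6900XT = 114  # best 6900 XT result here (Zen 3 CPUs, with SAM)
MAX_3090 = 96     # best 3090 result here

def scaling_row(cpu, fps_6900xt, fps_3090):
    return (cpu,
            fps_6900xt / MAX_6900XT,   # 6900 XT % of max
            fps_3090 / MAX_3090,       # 3090 % of max
            fps_3090 / fps_6900xt)     # 3090 as % of 6900 XT

for cpu, amd_fps, nv_fps in [("R5 1600X", 91, 73), ("i7 9700K", 109, 96)]:
    name, amd_pct, nv_pct, ratio = scaling_row(cpu, amd_fps, nv_fps)
    print(f"{name}: {amd_pct:.1%} / {nv_pct:.1%} / {ratio:.1%}")
```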

Once the 6700 XT drops we'll be in a much better place to determine the actual answer, because we can look at how it stacks up against the 5700 XT and even compare them clock for clock.
 
Last edited:

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
Massive cope and BS. LOL
Troll.

------------------------------------------

I wanted to try to analyze where the bottlenecks started becoming an issue, so I put together a table from the GameGPU data.
Which game title was this data for?

------------------------------------------

That was quite a bit of work to put together Mopetar, thank you! I was delighted to see how well the i7 4770K was doing, pulling 83% of the potential performance out of that card.
 
Last edited:

Justinus

Diamond Member
Oct 10, 2005
3,174
1,517
136
Just tossing out that it's been apparent since at least 2016 (when I personally noticed) that Nvidia cards have significantly higher CPU utilization in DX12, such as in the Time Spy benchmark.

My RX 480 would use 7-9% CPU while my 980 Ti would use 35-40% CPU, a difference that cannot be explained simply by the 980 Ti being 50% faster.
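Rough arithmetic on that point: if driver CPU cost only scaled with frame rate, a card that's ~50% faster should land around 1.5x the CPU usage, nowhere near the 4-5x difference observed (the numbers below are just the rough utilization figures from above):

```python
# If driver CPU cost scaled only with frames rendered, a ~50% faster card
# should use roughly 1.5x the CPU, not 4-5x. Rough numbers from above.
rx480_cpu = 0.08       # ~7-9% CPU utilization with the RX 480
gtx980ti_cpu = 0.375   # ~35-40% CPU utilization with the 980 Ti
speed_ratio = 1.5      # 980 Ti being ~50% faster

expected = rx480_cpu * speed_ratio
print(f"expected if overhead scaled with FPS: ~{expected:.0%}")
print(f"observed: ~{gtx980ti_cpu:.0%} ({gtx980ti_cpu / rx480_cpu:.1f}x the RX 480)")
```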
 

scineram

Senior member
Nov 1, 2020
361
283
106
Please do explain how 5600 XT beats 3070, even 3090 on some processors other than Nvidia -Redacted- drivers.

Profanity is not allowed in the Tech Forums.

Daveybrat
AT Moderator
 
Last edited by a moderator:
  • Like
Reactions: Tlh97 and Leeea

Leeea

Diamond Member
Apr 3, 2020
3,625
5,368
136
Please do explain how 5600 XT beats 3070, even 3090 on some processors other than Nvidia shitty drivers.

unverified speculation:

I do not believe it has anything to do with Nvidia's drivers. I would argue it is fundamental to the design of Nvidia's GPUs with their software based scheduler.

It is not that Nvidia's drivers are shitty, but rather that Nvidia has committed itself to a design that uses a more flexible software scheduler that requires CPU time to run. They likely made this choice targeting the high end of the market, which is more interested in performance than CPU usage. The CPU is arguably a better scheduler anyway, as long as it has the cycles to spare.

This does not make Nvidia's cards inferior to AMD's, it gives them an advantage in systems that have CPU resources to spare. Typically high end gaming PCs where the $$$s are made.

This does not make AMD's cards inferior to Nvidia's, it gives them an advantage in systems with weak CPUs. Typically consoles, upgrade PCs, and low end gaming PCs.

In both cases, those are the traditional markets of the two companies. Nvidia at the high end, AMD at the midrange and low end.



-------------------------------------



I think Nvidia's software scheduling CPU time requirements are increasing. Specifically, it likely required less CPU to run an RTX 2000 series card because they perhaps had fewer compute units and less processing power per compute unit. Same with the GTX 1000 series: fewer compute units, less complexity, no RTX, no DLSS, etc.

As Nvidia's graphics cards have grown larger, with more features and more complexity, the amount of CPU time needed to run the software scheduler has also increased.
 
Last edited:

CakeMonster

Golden Member
Nov 22, 2012
1,392
498
136
I watched the video, and I've read a bit of the discussion, and my memory might just be bad, but what kind of CPU usage is demanded by Nvidia cards in this regard? Is it more about single threads being maxed out, or extra cores being available, or is it more architecture-dependent, given the much better results with the 5xxx series?
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
unverified speculation:

I do not believe it has anything to do with Nvidia's drivers. I would argue it is fundamental to the design of Nvidia's GPUs with their software based scheduler.

It is not that Nvidia's drivers are shitty, but rather that Nvidia has committed itself to a design that uses a more flexible software scheduler that requires CPU time to run. They likely made this choice targeting the high end of the market, which is more interested in performance than CPU usage. The CPU is arguably a better scheduler anyway, as long as it has the cycles to spare.

This does not make Nvidia's cards inferior to AMD's, it gives them an advantage in systems that have CPU resources to spare. Typically high end gaming PCs where the $$$s are made.

This does not make AMD's cards inferior to Nvidia's, it gives them an advantage in systems with weak CPUs. Typically consoles, upgrade PCs, and low end gaming PCs.

In both cases, those are the traditional markets of the two companies. Nvidia at the high end, AMD at the midrange and low end.



-------------------------------------



I think Nvidia's software scheduling CPU time requirements are increasing. Specifically, it likely required less CPU to run an RTX 2000 series card because they perhaps had fewer compute units and less processing power per compute unit. Same with the GTX 1000 series: fewer compute units, less complexity, no RTX, no DLSS, etc.

As Nvidia's graphics cards have grown larger, with more features and more complexity, the amount of CPU time needed to run the software scheduler has also increased.

This is the correct answer. NVIDIA implements a lot more stuff in the drivers. They even added DX12 support to older GPUs.

Who would use a 3090 on an Intel quad core from 6-8 years ago?