News 2990WX Threadripper Performance Regression FIXED (for certain workloads) on Windows*

IEC

Elite Member
Super Moderator
Jun 10, 2004
*Thread title was originally a copy-paste from video title. Editorial comment added in parentheses.

Per Level1Techs, it appears a Windows kernel bug is behind strange results such as the TR 2990WX losing to the TR 2950X in some tests, including Adobe Premiere, Indigo Renderer, Blender, 7-Zip, etc. It turns out it's mostly not a memory bandwidth issue. It's a Windows scheduler bug that burns CPU cycles unproductively in how it handles more than two NUMA nodes (possibly due to a band-aid/fix for XCC Xeons). Level1Techs demonstrates this by comparing a 2990WX and an Epyc 7551 on Windows and on Linux, and by using Coreprio to manipulate the performance.
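The Linux side of that comparison benefits from the kernel exposing the NUMA topology directly. As a minimal sketch (not the article's or Coreprio's method), assuming a Linux box with sysfs mounted at /sys, this shows how a process could be restricted to one NUMA node's CPUs with Python's `os.sched_setaffinity`. The helper names `parse_cpulist`, `node_cpus`, and `pin_to_node` are hypothetical.

```python
import os


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-15,32-47" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            # Empty list, e.g. a memory-only node with no CPUs.
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node: int) -> set[int]:
    """Read the CPUs that sysfs reports for one NUMA node (Linux only)."""
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node: int, pid: int = 0) -> set[int]:
    """Restrict a process (0 = the calling process) to one node's CPUs.

    Only CPUs the process is already allowed to use are kept, so this
    never tries to widen the affinity set.
    """
    allowed = node_cpus(node) & os.sched_getaffinity(pid)
    if not allowed:
        raise ValueError(f"node {node} has no CPUs usable by this process")
    os.sched_setaffinity(pid, allowed)
    return os.sched_getaffinity(pid)
```

Calling `pin_to_node(0)` would keep the current process on node 0's CPUs. From the shell, `numactl --cpunodebind=0` does the same job without any custom code.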

Full article:
https://level1techs.com/article/unlocking-2990wx-less-numa-aware-apps

Video:

Conclusion:
"The rumors of a memory bandwidth problem, even with 32 cores (at least in these instances), has been greatly exaggerated."

Interpretation:
With server-class CPUs now readily available to consumers, Microsoft has some catching up to do if it wants us to run Windows rather than Linux.

Update 1/14/2019:
AMD Comments on Threadripper 2 Performance and Windows Scheduler (AnandTech article by Ian Cutress)
 

ub4ty

Senior member
Jun 21, 2017
Process affinity? Duh.
This is amateur hour. What the hell are AMD and Microsoft doing that this lasted so long? Was it such a mystery? This is also why I run all of my dev boxes on Linux. Windows is hot garbage for any serious compute task. Still, though, I wouldn't dare touch the 16+ core Threadripper CPUs with that weird configuration where they have no direct I/O. I'd just buy a proper Epyc system. I hope the new chip architecture, with a dedicated I/O chip that all of the core complexes hook into, resolves this wonky foolishness.
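Since process affinity keeps coming up, here is a minimal sketch of the kind of value Windows works with: the Win32 `SetProcessAffinityMask` call takes a bitmask with one bit per logical processor within a processor group. The helper names and the "processors 0-15 form one die" mapping below are hypothetical; this only builds and decodes a mask and does not call the Win32 API.

```python
from collections.abc import Iterable


def affinity_mask(cpus: Iterable[int]) -> int:
    """Build a bitmask where bit i is set when logical processor i is allowed.

    A single Windows processor group holds at most 64 logical processors,
    so only indices 0-63 are accepted here.
    """
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < 64:
            raise ValueError("a processor group holds at most 64 logical processors")
        mask |= 1 << cpu
    return mask


def cpus_from_mask(mask: int) -> list[int]:
    """Inverse of affinity_mask: list the logical processors whose bits are set."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Hypothetical example: allow only logical processors 0-15.
first_die = affinity_mask(range(16))
print(hex(first_die))  # 0xffff
```

Which bits correspond to which die depends on how Windows enumerates the logical processors, so any real use would need to query the topology first rather than assume a fixed mapping.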
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Yeah, noticing that your CPU is missing half of its performance is just as hard as discovering design-related hardware errata that have no symptoms.
I noticed a problem on mine, and just went to linux on that box. Problem solved.

As for AMD pointing something out to MS, I am sure we are not privy to all communications between the two.
 

ub4ty

Senior member
Jun 21, 2017
> I noticed a problem on mine, and just went to linux on that box. Problem solved.
>
> As for AMD pointing something out to MS, I am sure we are not privy to all communications between the two.
Yeah, this is too basic for them not to have known. Any low-level software dev could have discovered this in a day's time, and performance regression bugs of this magnitude are typically given high priority and assigned to an internal tiger team of engineers before the product gets anywhere near shipment. They probably kept it under wraps because it would do nothing but harm sales, knowing that Microsoft is likely looking into and addressing it.

https://developer.amd.com/amd-uprof/
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
Without access to privileged information, you cannot assume one way or another what, if anything, each party knows or has reported to another.

It wasn't uncommon in my time in software for some seemingly simple fixes to be put on the back burner for months or even years because they were triaged as low impact or affecting a low number of users. For context, said software is considered "mission critical".

So rushing to judgment without knowing all the facts is a bit premature.
 

Adawy

Member
Sep 9, 2017

It's the same for Intel as well; the ecosystem is different and Microsoft needs to adapt to the new hardware.
This needs to be addressed sooner rather than later. Even though only a minority of people are buying these flagship CPUs, they aren't exactly cheap, and the core war is only getting more intense.
 

ub4ty

Senior member
Jun 21, 2017
> Without access to privileged information, you cannot assume one way or another what, if anything, each party knows or has reported to another.
>
> It wasn't uncommon in my time in software for some seemingly simple fixes to be put on the back burner for months or even years because they were triaged as low impact or affecting a low number of users. For context, said software is considered "mission critical".
>
> So rushing to judgment without knowing all the facts is a bit premature.
You don't need access to privileged information to decipher this. If you've worked in the industry, you know exactly how this works.

AMD 100% knew about this. It would have been discovered in basic systems testing.
Once AMD found out about this, it should have taken a competent engineer less than a day with basic profiling tools to discover that the root issue resided in Windows. From there comes a Sev 1 (show stopper) that goes all the way up the management chain to very high levels, showing that a brand-new high-performance product is underperforming due to a bug. Microsoft was likely contacted directly through high-level channels and made aware of it, and both parties decided it would be better to keep this under wraps while they sorted out a fix. The bug then gets marked as non-public and is managed internally between AMD and Microsoft. The show-stopper attribute is taken off and the severity is whittled down for political reasons. This is what I've seen time and time again in my experience. The idea that Microsoft and AMD didn't know about this at a high level is laughable, given how straightforward and glaring the issue is and given that common-sense systems testing on Windows and Linux would of course have discovered it before it even shipped.

As an engineer, I don't buy into any foolish PR masking that's done after the fact. There are non-publicly tracked bugs at every tech company in existence that would make people's heads spin. However, the idea that no one knows about them (at a high level) is laughable. Release management knows about all of this stuff. It's their job to de-escalate and mitigate it after the fact for maximum profit.

> Sev 1 (show stopper) comes up in Release management meeting
Performance is impacted? By how much? Does it boot? Take show stopper off. Sev 3 Performance bug. Get our contact at Microsoft on the line. I want a tiger team on this and mark it (non-public). This will not be shared on the public bug portal.

This practice is so commonly known that big customers often have a team whose dedicated job is to tease out the bugs that companies don't tell them about.

That being said, crappy Windows rears its head again. I can imagine someone joked that no one with such compute demands uses Windows anyway.
 

StinkyPinky

Diamond Member
Jul 6, 2002
> Process affinity? Duh.
> This is amateur hour. What the hell are AMD and Microsoft doing that this lasted so long? Was it such a mystery? This is also why I run all of my dev boxes on Linux. Windows is hot garbage for any serious compute task. Still, though, I wouldn't dare touch the 16+ core Threadripper CPUs with that weird configuration where they have no direct I/O. I'd just buy a proper Epyc system. I hope the new chip architecture, with a dedicated I/O chip that all of the core complexes hook into, resolves this wonky foolishness.

Perhaps the simplest explanation is the best one? Maybe it's a hard bug to fix. Could be deep in the core of the OS.
 

ub4ty

Senior member
Jun 21, 2017
> Perhaps the simplest explanation is the best one? Maybe it's a hard bug to fix. Could be deep in the core of the OS.
I said nothing about the complexity of the bug.
Instead, I provided a simple explanation of why AMD and Microsoft were clearly aware of it, which is what was being debated... An intern in QA/Test could find and isolate this bug in systems testing over lunch. There's no question AMD and Microsoft knew about it.
 

ub4ty

Senior member
Jun 21, 2017
> Regardless of the complexity, why is it that the most prevalent and best-funded OS has this bug and free Linux does not? Why does Linux give me 30% more performance on the same hardware?
>
> Microsoft Windows is just a ripoff.
NUMA configuration and full, proper support were never really needed, or didn't have to be fully fleshed out, on the consumer side of things, so they likely just got caught with their pants down, whereas Linux constantly churns through tons of enterprise hardware environments and is at home in them. Windows 10 is great for casual desktop use, but I've never seen any serious enterprise hardware run on it.
 

Mopetar

Diamond Member
Jan 31, 2011
> Yeah, noticing that your CPU is missing half of its performance is just as hard as discovering design-related hardware errata that have no symptoms.

I think it's a confluence of several factors. First, before AMD offered Threadripper, you wouldn't have people running into this bug on Windows. Until the system fails, no one even notices the problem. The other side of it is that even with people discovering it, the priority is probably quite low, since there aren't a lot of affected users and the severity is much lower than that of security-critical bugs. Then it's a matter of fixing the problem in a way that doesn't break something else, which can be tricky if you don't understand the code all that well. Or perhaps it requires tearing out a pretty large chunk of poorly designed code in order to do things properly.

> Regardless of the complexity, why is it that the most prevalent and best-funded OS has this bug and free Linux does not? Why does Linux give me 30% more performance on the same hardware?

That's the wonder of open source. As soon as someone finds the bug, they can create a patch for it. The poor schlub at Microsoft probably has to dig through the bowels of ancient poorly documented code to track down the source of a problem that they don't understand terribly well and probably care about even less. With Linux you can examine and fix the code yourself, or at the very least figure out who introduced the bug and work with them to fix it if they're still active with the project.
 

BigDaveX

Senior member
Jun 12, 2014
> NUMA configuration and full, proper support were never really needed, or didn't have to be fully fleshed out, on the consumer side of things, so they likely just got caught with their pants down, whereas Linux constantly churns through tons of enterprise hardware environments and is at home in them. Windows 10 is great for casual desktop use, but I've never seen any serious enterprise hardware run on it.
While NUMA support has never been a widely needed feature on the desktop, it's not as if Microsoft could have been completely unaware it was a potential issue, since the lack of NUMA support in Windows at the time almost single-handedly killed the Quad FX platform. And I'm pretty sure more than a few enthusiasts were running Socket G34 Opterons around the turn of the decade as well.
 

rvborgh

Member
Apr 16, 2014
It's pretty important on my overclocked quad-socket Opteron 61xx system to have NUMA mode on, and the SRAT table enabled, when running single-threaded games like Balanced Annihilation. Some software, like Cinebench, loves non-NUMA mode (node interleaving), though. Running it in NUMA mode gives almost a 1000 cb deficit (2300 vs 3229).
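Taking the two Cinebench scores above at face value, the NUMA-mode deficit works out as follows (simple arithmetic on the quoted numbers, nothing more):

```latex
\Delta = 3229 - 2300 = 929\ \text{cb}, \qquad
\frac{929}{3229} \approx 0.288\ (\approx 28.8\%\ \text{lower in NUMA mode}), \qquad
\frac{3229}{2300} \approx 1.40
```

So "almost 1000 cb" is about 929 cb, with interleaved mode scoring roughly 1.4 times the NUMA-mode result on this workload.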


> While NUMA support has never been a widely needed feature on the desktop, it's not as if Microsoft could have been completely unaware it was a potential issue, since the lack of NUMA support in Windows at the time almost single-handedly killed the Quad FX platform. And I'm pretty sure more than a few enthusiasts were running Socket G34 Opterons around the turn of the decade as well.
 

kjboughton

Senior member
Dec 19, 2007
But, but, BUT..... it’s nothing to do with NUMA!
You guys are funny.

The title of this thread should really be: "Nerds finally prove true what Kris said six months ago."

How did kjboughton get his to use 72?

Edit: My IPC is almost the same as his, and I am at 4 GHz instead of 3.9, so I should be close to his score or beat it. The only thing slowing me down is the 40% thread usage. Or should I turn off SMT?

Oh, those NUMA nodes strike again!

It has nothing to do with NUMA nodes.