News 2990WX Threadripper Performance Regression FIXED (for certain workloads) on Windows*


IEC

Elite Member
Super Moderator
Jun 10, 2004
14,328
4,913
136
*Thread title was originally a copy-paste from video title. Editorial comment added in parentheses.

Per Level1Techs, it appears there is a Windows kernel bug that has led to strange results like the TR 2990WX losing to the TR 2950X in some tests such as Adobe Premiere, Indigo Renderer, Blender, 7-Zip, etc. It turns out it's mostly not a memory bandwidth issue. It's a Windows scheduler bug that burns CPU cycles unproductively in how it handles >2 NUMA nodes (possibly due to a bandaid/fix for XCC Xeons). He proves it by comparing a 2990WX and Epyc 7551 on Windows and on Linux and using coreprio to manipulate the performance.

Full article:
https://level1techs.com/article/unlocking-2990wx-less-numa-aware-apps

Conclusion:
"The rumors of a memory bandwidth problem, even with 32 cores (at least in these instances), has been greatly exaggerated."

Interpretation:
With server-like CPUs now easily available for consumers, Microsoft has some catching up if they want us to run Windows rather than Linux.

Update 1/14/2019:
AMD Comments on Threadripper 2 Performance and Windows Scheduler (AT article by Ian Cutress)
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
He proves it by comparing a 2990WX and Epyc 7551 on Windows and on Linux and using coreprio to manipulate the performance.

Did he really "prove" it's a bug? Internet randos who think they know more than AMD or MS are a dime a dozen. There's even a guy on this very forum who has claimed he knows more about AMD CPUs than AMD themselves.
 

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,792
136
But, but, BUT..... it’s nothing to do with NUMA!
You guys are funny.

The title of this thread should really be: "Nerds finally prove true what Kris said six months ago."

Context is important. The discussion was about CPU utilization and performance in HandBrake. The reason Mark's CPU usage was hitting less than half of the available threads is that this is all HandBrake will scale to; it doesn't matter whether there are NUMA nodes or not in this instance. You are running a dual-socket board, which means you would have a NUMA node issue as well, so I don't know why you even brought it up. Neither of you was scaling beyond half of your available cores. The main difference was AVX2 support.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
It does seem that Epyc supports a setting as a way around it, meaning the number of users actually having this problem is limited, and those with the problem can choose not to use Windows 10.

Could MS not say: "Well, we don't guarantee full performance with more than 2 NUMA nodes in Windows 10; it's not made for it, you have to accept that."
Does Windows Server make a difference?
Doesn't this same problem also occur on a 4-socket Intel machine?
Then MS, Intel, and AMD would have known about this limitation for a long time.

So I think it's high risk and very low priority (few users, and there exists a way around it).
If you have a 32-core machine and are running Windows 10, then you are not labeled sane ;) (in 2018)
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
It does seem that Epyc supports a setting as a way around it, meaning the number of users actually having this problem is limited, and those with the problem can choose not to use Windows 10.

Could MS not say: "Well, we don't guarantee full performance with more than 2 NUMA nodes in Windows 10; it's not made for it, you have to accept that."
Does Windows Server make a difference?
Doesn't this same problem also occur on a 4-socket Intel machine?
Then MS, Intel, and AMD would have known about this limitation for a long time.

So I think it's high risk and very low priority (few users, and there exists a way around it).
If you have a 32-core machine and are running Windows 10, then you are not labeled sane ;) (in 2018)
Actually, now that you mention it, doesn't Microsoft go from charging per socket to per core for their server-grade OS, with a minimum of 8 cores? Guaranteed there's some ugly, restrictive licensing code in there regarding these high-core-count configs. Virtualization software also works on a similar licensing scheme, and I'm sure they are not fans of losing gobs of money with AMD shoving all of these cores onto one socket and making it available for desktop users. This is the kind of thing where it's deemed a no-fix to preserve profits. Similar to the stunt Nvidia has been pulling over the years once they found out people were using their consumer-level products in professional/enterprise environments without paying the insane tax that comes along with it.
 

ericlp

Diamond Member
Dec 24, 2000
6,133
219
106
Windows is a virus! Quickly get rid of it and install linux. LOL.

More than likely, the reason this is taking so long to get addressed is that most people who buy these chips aren't running Windows to begin with, and for good reason.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
So basically you are saying that Microsoft would be refusing to fix an obvious bug in their OS, pointed out by AMD?
Makes perfect sense, especially when AMD hasn't made any statements regarding it.
The Windows scheduler has been far inferior to what Linux is capable of for much of two decades now, and Microsoft never bothered to fundamentally fix it. Blaming AMD for exposing this is completely disingenuous.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
The Windows scheduler has been far inferior to what Linux is capable of for much of two decades now, and Microsoft never bothered to fundamentally fix it. Blaming AMD for exposing this is completely disingenuous.

Oh, I agree.
NUMA does not belong to desktop or consumer systems.

Nevertheless, Threadripper is marketed as a consumer CPU.

Back in the day, Windows 7 received a hotfix to improve the suboptimal handling of BD compute units.
What's changed?
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,328
4,913
136
We have the word "workstation" for that, and NUMA systems have always been part of it.

Nice Chewbacca defense.

Eh, I think you're missing The Stilt's point. I might disagree with it on the basis that we don't have all the information and that there could be discussion about a fix without us knowing, but to dismiss his argument out of hand and flippantly is decidedly annoying.
 

mattiasnyc

Senior member
Mar 30, 2017
356
337
136
Right in the video title. The thread title is literally a copy-paste from the video title.

I think it would be better if the thread title either made that clear or better reflected the results. The actual wording implies that there was a problem on Windows and that the problem was 100% fixed.... on Windows presumably by MS... which doesn't appear to be the case.

I personally saw that video elsewhere and forgot the title once I got to the end where he says it fixes Indigo, not any and all issues. So reading the thread title gives I think a bit of a misleading impression.
 

daveybrat

Elite Member
Super Moderator
Jan 31, 2000
5,732
949
126
Folks, let's stick to the topic and constructive dialogue. Any further personal insults will be dealt with appropriately.

Thanks!
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,328
4,913
136
I think it would be better if the thread title either made that clear or better reflected the results. The actual wording implies that there was a problem on Windows and that the problem was 100% fixed.... on Windows presumably by MS... which doesn't appear to be the case.

I personally saw that video elsewhere and forgot the title once I got to the end where he says it fixes Indigo, not any and all issues. So reading the thread title gives I think a bit of a misleading impression.

Fair enough, I edited the thread title with an editorial comment.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
Fanboys are a pestilence.
AMD's fanboys unfortunately are more ignorant, toxic and feeble-minded than all of the rest combined.

Sad to see that this site has gone down the drain as well.

Goodbye.

Bummer. See you around somewhere I hope.

Insecure AMD fanboys chase off another great industry resource. No wonder only one actual cpu designer still posts here. AMD fanboys even chased off the AMD employees that unofficially posted here.

Sad, really.

What's sad is you continuing to ignore moderator warnings about keeping the discussion on topic, and your insistence on using phrases here to describe other users such as "fanboys".


AT Mod Usandthem
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Bummer. See you around somewhere I hope.

Insecure AMD fanboys chase off another great industry resource. No wonder only one actual cpu designer still posts here. AMD fanboys even chased off the AMD employees that unofficially posted here.

Sad, really.

It is sad, though I have to say I'm surprised. We've had much, much worse AMD fanboy-ism in the past, when their products were far worse than they are today. Remember the endless FX threads? I do not miss those.

I have a feeling that The_Stilt may have wanted to leave for a while and saw this as an opportunity. Because honestly this thread has been pretty mild.

Please keep the discussion on topic, and do not use insults such as "fanboy" to insult other users.


AT Mod Usandthem
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Eh, I think you're missing The Stilt's point. I might disagree with it on the basis that we don't have all the information and that there could be discussion about a fix without us knowing, but to dismiss his argument out of hand and flippantly is decidedly annoying.
I'm not even sure what point he was trying to push. Threadripper is a workstation chip, and workstations have a history of often using server chips, often multi-processor in a NUMA arrangement. Of course one can still hold the opinion that NUMA does not belong to desktop or consumer systems (and as such defend Microsoft's inactivity in that regard), but I see that as being beside the point for workstation systems. If Microsoft wanted to segregate their systems, with working NUMA schedulers only on server-grade versions of Windows, then they'd communicate that accordingly.

That the same Microsoft pushed a hotfix to improve the suboptimal handling of BD compute units is a nice anecdote (unlike the whole system of NUMA with all its possible configurations, compute units actually were exotic), but it has no bearing on Microsoft's handling of NUMA, or of threads in general.

Let's recall that Microsoft's scheduler was already known to be pushing the worst case for Ryzen. Ryzen 1xxx suffered from higher-than-promised latency (AMD's fault obviously, fixed in Ryzen 2xxx), and that was exacerbated by the Windows scheduler repeatedly and senselessly moving threads around without regard for CCX units or any topology for that matter (at high frequency on consumer versions of Windows and at lower frequency on server versions, whereas Linux showed the whole approach could be done without). In the past Microsoft has essentially made a point of doing nothing itself and instead relying on third-party software to optimize low-level system behavior that should be handled by the OS.
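
For anyone wanting to take the scheduler out of the picture while testing, the usual blunt instrument is CPU affinity: restrict a process to the logical CPUs of one CCX so it cannot be bounced elsewhere. Below is a rough Python sketch of that idea, not anything from the Level1Techs article or coreprio. The assumption that logical CPUs 0-7 form a single CCX is purely hypothetical numbering, so check the real topology first; `affinity_mask` and `pin_to` are made-up helper names.

```python
import os

def affinity_mask(cpus):
    """Build the bitmask used by mask-based affinity APIs: bit i set = logical CPU i allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask

# ASSUMPTION (hypothetical numbering): logical CPUs 0-7 are one 4-core CCX with SMT.
ONE_CCX = range(8)

def pin_to(cpus):
    """Restrict the current process to `cpus` where the OS call is available.

    Returns the set of CPUs actually applied, or an empty set if nothing was changed.
    """
    wanted = set(cpus)
    if hasattr(os, "sched_setaffinity"):              # available on Linux
        usable = wanted & os.sched_getaffinity(0)     # only CPUs we are allowed to use
        if usable:
            os.sched_setaffinity(0, usable)
        return usable
    # On Windows the equivalent is handing affinity_mask(cpus) to the Win32
    # SetProcessAffinityMask call (or setting affinity in Task Manager).
    return set()
```

`affinity_mask(ONE_CCX)` gives `0xFF`, i.e. the lowest eight bits set. Pinning is a diagnostic aid, not a fix: it only shows whether migration is the cost, and it throws away the rest of the chip while it is active.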

Thanks to Linux we know 32c/24c Threadripper's NUMA configuration doesn't need to degrade the performance. In The Stilt's first post in this thread, what I saw as his point, he turned that around and put the blame for the performance regression under Microsoft's Windows solely on AMD.

I'm saddened that he chose that as the hill to die on and deactivate his account. I don't see how my input was any "pestilence, ignorant, toxic and feeble-minded", maybe somebody else can explain.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
that was exacerbated by the Windows scheduler by repeatedly senselessly moving around the threads without regard for CCX units (or any topology for that matter, at high frequency on consumer versions of Windows, at lower frequency in server versions, whereas Linux showed the whole approach could be done without)
Actually, can anybody test this?
In the video he starts to talk about thread migration at 16:00, so can somebody run coreprio but then run something single-threaded and see if it always stays on one core (or CCX)?

The Windows scheduler is not senselessly and repeatedly moving threads around. It does so to prevent stress on the TIM by making the heat distribution more uniform, and it also prevents single cores from degrading over time by running at full turbo all the time. There is also no performance penalty whatsoever for normal CPUs, so... win-win.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Actually, can anybody test this?
In the video he starts to talk about thread migration at 16:00, so can somebody run coreprio but then run something single-threaded and see if it always stays on one core (or CCX)?

The Windows scheduler is not senselessly and repeatedly moving threads around. It does so to prevent stress on the TIM by making the heat distribution more uniform, and it also prevents single cores from degrading over time by running at full turbo all the time. There is also no performance penalty whatsoever for normal CPUs, so... win-win.
And why is that required? Unix/Linux does not do that, is used by most data centers, and is 30% more efficient as an OS. As you may see, I have that CPU. I could do some testing if you gave me specifics on how to do it and links to any software required. I run a Windows and Linux dual-boot, but it runs Linux most of the time, since I get 30% more out of it.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
And why is that required? Unix/Linux does not do that, is used by most data centers, and is 30% more efficient as an OS. As you may see, I have that CPU. I could do some testing if you gave me specifics on how to do it and links to any software required. I run a Windows and Linux dual-boot, but it runs Linux most of the time, since I get 30% more out of it.
Because if this performance loss is due to the latency between the cores/CCXs when swapping threads between them, then it's not a bug in Windows; it's a quirk of the TR/Epyc architecture, and MS will probably not fix it for a mainstream home OS because that's not what Windows is for anyway.


Do exactly what the video/Bitsum page says. Bitsum claims that it only works 50% of the time, so make sure it did "take".

Run Cinebench or whatever else, but single-threaded. In Cinebench you can go to File -> Preferences and select to run only one thread.

Use the Task Manager Performance tab to see which logical cores do the work.
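
If eyeballing Task Manager is too fuzzy, the same check can be scripted. The following is only a sketch under a few assumptions: it uses Python's `ctypes` to call the Win32 `GetCurrentProcessorNumber` function on Windows and glibc's `sched_getcpu` on Linux, and the helper names (`current_cpu`, `sample_busy_thread`, `count_migrations`) are made up for illustration. It spins a single thread and records which logical CPU it finds itself on after each chunk of work.

```python
import ctypes
import sys
import time

def current_cpu():
    """Return the logical CPU the calling thread is running on right now."""
    if sys.platform == "win32":
        # Win32 call; the number is relative to the thread's processor group.
        return ctypes.windll.kernel32.GetCurrentProcessorNumber()
    # ASSUMPTION: Linux with glibc, which exports sched_getcpu().
    return ctypes.CDLL(None).sched_getcpu()

def count_migrations(samples):
    """Count how often consecutive samples landed on different logical CPUs."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)

def sample_busy_thread(duration_s=2.0, spin=200_000):
    """Keep one thread busy for duration_s seconds, sampling its CPU after each chunk."""
    samples = []
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        acc = 0
        for i in range(spin):      # single-threaded busy work
            acc += i * i
        samples.append(current_cpu())
    return samples
```

Running `samples = sample_busy_thread()` and then looking at `sorted(set(samples))` and `count_migrations(samples)` with and without coreprio active should show whether the thread stays put. The sampling is coarse, so it undercounts migrations that happen and reverse within one chunk of work.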
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
Because if this performance loss is due to the latency between the cores/CCXs when swapping threads between them, then it's not a bug in Windows; it's a quirk of the TR/Epyc architecture, and MS will probably not fix it for a mainstream home OS because that's not what Windows is for anyway.


Do exactly what the video/Bitsum page says. Bitsum claims that it only works 50% of the time, so make sure it did "take".

Run Cinebench or whatever else, but single-threaded. In Cinebench you can go to File -> Preferences and select to run only one thread.

Use the Task Manager Performance tab to see which logical cores do the work.
I am deaf, so I don't watch the videos, since I can't hear them. Can you summarize?

And again, Linux does not have this problem, in addition to being 30% more efficient on the same code, so I still blame it on MS/Windows.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
I am deaf, so I don't watch the videos, since I can't hear them. Can you summarize?
He's just saying that half the time is spent on shuffling threads instead of doing the work, which is why the regression is so big.
And again, Linux does not have this problem, in addition to being 30% more efficient on the same code, so I still blame it on MS/Windows.
Linux also doesn't have to worry about liabilities if a core overheats.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,542
14,496
136
He's just saying that half the time is spent on shuffling threads instead of doing the work, which is why the regression is so big.

Linux also doesn't have to worry about liabilities if a core overheats.
My point is, they don't, and I run 24/7 overclocked. I only use an MS computer where required, as Windows is just bad.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
730
126
My point is, they don't, and I run 24/7 overclocked. I only use an MS computer where required, as Windows is just bad.
And you are always using your systems at full utilization anyway, so of course thread migration doesn't make sense for you. As I said before, the mainstream home versions of Windows are not made for this kind of work; it's all battery life and eco as default. That doesn't mean it's broken, it's just targeted at a different market.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
I mean, didn't we already see this with the parking issue when Ryzen first came out? I know that this adds on top of that by bouncing between nodes, but isn't this just more of the same from the Windows scheduler?