News 2990WX Threadripper Performance Regression FIXED (for certain workloads) on Windows*

Mar 13, 2006
10,089
48
126
#26
He proves it by comparing a 2990WX and Epyc 7551 on Windows and on Linux and using coreprio to manipulate the performance.
Did he really "prove" it's a bug? Internet randos who think they know more than AMD or MS are a dime a dozen. There's even a guy on this very forum who has claimed he knows more about AMD CPUs than AMD themselves.
 

Hitman928

Golden Member
Apr 15, 2012
1,601
59
136
#27
But, but, BUT..... it’s nothing to do with NUMA!
You guys are funny.

The title of this thread should really be: "Nerds finally prove true what Kris said six months ago."
Context is important. The discussion was about CPU utilization and performance in HandBrake. Mark's CPU usage was hitting less than half of the available threads because that is as far as HandBrake will scale; it doesn't matter whether there are NUMA nodes or not in this instance. You are running a dual-socket board, which means you would have a NUMA node issue as well, so I don't know why you even brought it up. Neither of you was scaling beyond half of your available cores. The main difference was AVX2 support.
 

Kedas

Junior Member
Dec 6, 2018
20
9
36
#28
It does seem that Epyc supports a setting that works around it, meaning the number of users actually having this problem is limited, and those with the problem can choose not to use Windows 10.

Could MS not say: "well we don't guarantee full performance with more than 2 NUMA nodes in Windows 10, it's not made for it, you have to accept that."
Does Windows server make a difference?
Doesn't this same problem also occur on a 4-socket Intel machine?
Then MS, Intel and AMD would have known about this limitation for a long time.

So I think it's high risk and very low priority (few users, and a workaround exists).
If you have a 32-core machine and are running Windows 10, then you are not labeled sane ;) (in 2018)
 

ub4ty

Senior member
Jun 21, 2017
749
304
96
#29
It does seem that Epyc supports a setting that works around it, meaning the number of users actually having this problem is limited, and those with the problem can choose not to use Windows 10.

Could MS not say: "well we don't guarantee full performance with more than 2 NUMA nodes in Windows 10, it's not made for it, you have to accept that."
Does Windows server make a difference?
Doesn't this same problem also occur on a 4-socket Intel machine?
Then MS, Intel and AMD would have known about this limitation for a long time.

So I think it's high risk and very low priority (few users, and a workaround exists).
If you have a 32-core machine and are running Windows 10, then you are not labeled sane ;) (in 2018)
Actually, now that you mention it, didn't Microsoft go from charging per socket to per core for their server-grade OS, with a minimum of 8 cores? Guaranteed there's some ugly, restrictive licensing code in there regarding these high-core-count configs. Virtualization software also works on a similar licensing scheme, and I'm sure those vendors are not fans of losing gobs of money with AMD shoving all of these cores onto one socket and making it available to desktop users. This is the kind of thing that gets deemed a no-fix to preserve profits, similar to the stunt Nvidia has been pulling over the years since they found out people were using their consumer-level products in professional/enterprise environments without paying the insane tax that comes along with it.
 

ericlp

Diamond Member
Dec 24, 2000
5,947
32
106
#30
Windows is a virus! Quickly get rid of it and install linux. LOL.

More than likely, the reason this is taking so long to get addressed is that most people who buy these chips aren't running Windows to begin with, and for good reason.
 

moinmoin

Senior member
Jun 1, 2017
625
144
96
#31
So basically you are saying that Microsoft would be refusing to fix an obvious bug in their OS, pointed out by AMD?
Makes perfect sense, especially when AMD hasn't made any statements regarding it.
The Windows scheduler has been far inferior to what Linux is capable of for the better part of two decades now, and Microsoft never bothered to fundamentally fix it. Blaming AMD for exposing this is completely disingenuous.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
64
106
#32
The Windows scheduler has been far inferior to what Linux is capable of for the better part of two decades now, and Microsoft never bothered to fundamentally fix it. Blaming AMD for exposing this is completely disingenuous.
Oh, I agree.
NUMA does not belong to desktop or consumer systems.

Nevertheless, Threadripper is marketed as a consumer CPU.

Back in the day, Windows 7 received a hotfix to improve the suboptimal handling of BD compute units.
What's changed?
 

moinmoin

Senior member
Jun 1, 2017
625
144
96
#33
NUMA does not belong to desktop or consumer systems.

Nevertheless, Threadripper is marketed as a consumer CPU.
We have the word "workstation" for that, and NUMA systems have always been part of it.

Back in the day, Windows 7 received a hotfix to improve the suboptimal handling of BD compute units.
What's changed?
Nice Chewbacca defense.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
64
106
#34
Fanboys are a pestilence.
AMD's fanboys unfortunately are more ignorant, toxic and feeble-minded than all of the rest combined.

Sad to see that this site has gone down the drain as well.

Goodbye.
 

IEC

Super Moderator
Super Moderator
Jun 10, 2004
13,446
133
136
#35
We have the word "workstation" for that, and NUMA systems have always been part of it.

Nice Chewbacca defense.
Eh, I think you're missing The Stilt's point. I might disagree with it on the basis that we don't have all the information and that there could be discussion about a fix without us knowing, but to dismiss his argument out of hand and flippantly is decidedly annoying.
 

IEC

Super Moderator
Super Moderator
Jun 10, 2004
13,446
133
136
#36
Where does it say that performance was "FIXED" on Windows? What am I missing?
Right in the video title. The thread title is literally a copy-paste from the video title.
 

mattiasnyc

Senior member
Mar 30, 2017
266
80
76
#37
Right in the video title. The thread title is literally a copy-paste from the video title.
I think it would be better if the thread title either made that clear or better reflected the results. The actual wording implies that there was a problem on Windows and that the problem was 100% fixed.... on Windows presumably by MS... which doesn't appear to be the case.

I personally saw that video elsewhere and forgot the title once I got to the end where he says it fixes Indigo, not any and all issues. So reading the thread title gives I think a bit of a misleading impression.
 

daveybrat

Super Moderator
Super Moderator
Jan 31, 2000
4,894
24
126
#38
Folks, let's stick to the topic and constructive dialogue. Any further personal insults will be dealt with appropriately.

Thanks!
 

IEC

Super Moderator
Super Moderator
Jun 10, 2004
13,446
133
136
#39
I think it would be better if the thread title either made that clear or better reflected the results. The actual wording implies that there was a problem on Windows and that the problem was 100% fixed.... on Windows presumably by MS... which doesn't appear to be the case.

I personally saw that video elsewhere and forgot the title once I got to the end where he says it fixes Indigo, not any and all issues. So reading the thread title gives I think a bit of a misleading impression.
Fair enough, I edited the thread title with an editorial comment.
 
Mar 13, 2006
10,089
48
126
#40
Fanboys are a pestilence.
AMD's fanboys unfortunately are more ignorant, toxic and feeble-minded than all of the rest combined.

Sad to see that this site has gone down the drain as well.

Goodbye.
Bummer. See you around somewhere I hope.

Insecure AMD fanboys chase off another great industry resource. No wonder only one actual cpu designer still posts here. AMD fanboys even chased off the AMD employees that unofficially posted here.

Sad, really.

What's sad is you continuing to ignore moderator warnings about keeping the discussion on topic, and your insistence on using phrases here to describe other users such as "fanboys".


AT Mod Usandthem
 
Apr 27, 2000
10,481
334
126
#41
Bummer. See you around somewhere I hope.

Insecure AMD fanboys chase off another great industry resource. No wonder only one actual cpu designer still posts here. AMD fanboys even chased off the AMD employees that unofficially posted here.

Sad, really.
It is sad, though I have to say I'm surprised. We've had much, much worse AMD fanboy-ism in the past, when their products were far worse than they are today. Remember the endless FX threads? I do not miss those.

I have a feeling that The_Stilt may have wanted to leave for a while and saw this as an opportunity, because honestly this thread has been pretty mild.

Please keep the discussion on topic, and do not use insults such as "fanboy" to insult other users.


AT Mod Usandthem
 

moinmoin

Senior member
Jun 1, 2017
625
144
96
#42
Eh, I think you're missing The Stilt's point. I might disagree with it on the basis that we don't have all the information and that there could be discussion about a fix without us knowing, but to dismiss his argument out of hand and flippantly is decidedly annoying.
I'm not even sure what point he was trying to push. Threadripper is a workstation chip, and workstations have a history of using server chips, often multi-processor in a NUMA arrangement. Of course one can still hold the opinion that NUMA does not belong to desktop or consumer systems (and as such defend Microsoft's inactivity in that regard), but I see that as beside the point for workstation systems. If Microsoft wanted to segregate their systems, with working NUMA schedulers only on server-grade versions of Windows, then they'd communicate that accordingly.

That the same Microsoft pushed a hotfix to improve the suboptimal handling of BD compute units is a nice anecdote (unlike the whole system of NUMA with all its possible configurations, compute units actually were exotic), but it isn't relevant to Microsoft's handling of NUMA, or of threads in general.

Let's recall that Microsoft's scheduler was already known to be pushing the worst case for Ryzen. Ryzen 1xxx suffered from higher-than-promised latency (AMD's fault obviously, fixed in Ryzen 2xxx), and that was exacerbated by the Windows scheduler repeatedly and senselessly moving threads around without regard for CCX units (or any topology for that matter; at high frequency on consumer versions of Windows, at lower frequency on server versions, whereas Linux showed the whole approach could be done without). In the past Microsoft has essentially made a point of doing nothing itself and instead relying on 3rd-party software to optimize low-level system behavior that should be handled by the OS.

Thanks to Linux we know 32c/24c Threadripper's NUMA configuration doesn't need to degrade the performance. In The Stilt's first post in this thread, what I saw as his point, he turned that around and put the blame for the performance regression under Microsoft's Windows solely on AMD.

I'm saddened that he chose that as the hill to die on and deactivate his account. I don't see how my input was any "pestilence, ignorant, toxic and feeble-minded", maybe somebody else can explain.
 

TheELF

Platinum Member
Dec 22, 2012
2,637
44
106
#43
that was exacerbated by the Windows scheduler repeatedly and senselessly moving threads around without regard for CCX units (or any topology for that matter; at high frequency on consumer versions of Windows, at lower frequency on server versions, whereas Linux showed the whole approach could be done without)
Actually can anybody test this?
In the video he starts to talk about thread migration at 16:00, so can somebody run coreprio and then run something single-threaded and see if it always stays on one core (or CCX)?

The Windows scheduler is not senselessly and repeatedly moving threads around; it does so to reduce stress on the TIM by making the heat distribution more uniform. It also keeps single cores from degrading over time from running at full turbo constantly, and there is no performance penalty whatsoever for normal CPUs, so... win-win.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
16,947
359
136
#44
Actually can anybody test this?
In the video he starts to talk about thread migration at 16:00, so can somebody run coreprio and then run something single-threaded and see if it always stays on one core (or CCX)?

The Windows scheduler is not senselessly and repeatedly moving threads around; it does so to reduce stress on the TIM by making the heat distribution more uniform. It also keeps single cores from degrading over time from running at full turbo constantly, and there is no performance penalty whatsoever for normal CPUs, so... win-win.
And why is that required? Unix/Linux does not do that, is used by most data centers, and is 30% more efficient as an OS. As you may see, I have that CPU. I could do some testing if you gave me specifics on how to do it, and links to any software required. I dual-boot Windows and Linux, but the machine runs Linux most of the time, since I get 30% more out of it.
 

TheELF

Platinum Member
Dec 22, 2012
2,637
44
106
#45
And why is that required? Unix/Linux does not do that, is used by most data centers, and is 30% more efficient as an OS. As you may see, I have that CPU. I could do some testing if you gave me specifics on how to do it, and links to any software required. I dual-boot Windows and Linux, but the machine runs Linux most of the time, since I get 30% more out of it.
Because if this performance loss is due to the latency between the cores/CCXs when threads are swapped between them, then it's not a bug in Windows; it's a quirk of the TR/Epyc architecture, and MS will probably not fix it for a mainstream home OS because that's not what Windows is for anyway.


Do exactly what the video/Bitsum page says. Bitsum claims that it only works 50% of the time, so make sure it did "take".

Run Cinebench or whatever else, but single-threaded; in Cinebench you can go to File -> Preferences and select to run only one thread.

Use the task manager performance tab to see which logical cores do the work.
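
For anyone who would rather log this than eyeball Task Manager, here is a rough Python sketch of the same test. It spins on one thread and records every time the OS moves it to a different logical CPU. Treat all of it as an illustration under assumptions, not a verified tool: `current_cpu`, `sample_cpus` and `count_migrations` are made-up helper names, and the constant of 8 logical CPUs per CCX assumes 4 cores per CCX with SMT on and sibling threads numbered next to each other, which may not match how your system enumerates them.

```python
import ctypes
import sys
import time
from typing import List, Optional, Sequence

# ASSUMPTION: 4 cores per CCX, SMT enabled, SMT siblings numbered adjacently,
# so logical CPUs 0-7 share a CCX, 8-15 the next one, and so on.
LOGICAL_CPUS_PER_CCX = 8


def current_cpu() -> Optional[int]:
    """Return the logical CPU the calling thread is running on, or None if unsupported."""
    if sys.platform == "win32":
        # Returns the processor number within the current processor group.
        return int(ctypes.windll.kernel32.GetCurrentProcessorNumber())
    if sys.platform.startswith("linux"):
        return int(ctypes.CDLL(None).sched_getcpu())
    return None


def sample_cpus(duration_s: float = 2.0) -> List[int]:
    """Busy-loop on one thread and record each logical CPU change that is observed."""
    seen: List[int] = []
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        cpu = current_cpu()
        if cpu is None:
            break
        if not seen or seen[-1] != cpu:
            seen.append(cpu)
    return seen


def count_migrations(samples: Sequence[int], group_size: int = 1) -> int:
    """Count moves between groups of `group_size` logical CPUs (1 = per logical CPU)."""
    groups = [cpu // group_size for cpu in samples]
    return sum(1 for prev, cur in zip(groups, groups[1:]) if prev != cur)
```

To use it, call `count_migrations(sample_cpus())` for moves between logical CPUs and `count_migrations(sample_cpus(), LOGICAL_CPUS_PER_CCX)` for moves between CCXs, once with coreprio active and once without, and compare the counts.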
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
16,947
359
136
#46
Because if this performance loss is due to the latency between the cores/CCXs when threads are swapped between them, then it's not a bug in Windows; it's a quirk of the TR/Epyc architecture, and MS will probably not fix it for a mainstream home OS because that's not what Windows is for anyway.


Do exactly what the video/Bitsum page says. Bitsum claims that it only works 50% of the time, so make sure it did "take".

Run Cinebench or whatever else, but single-threaded; in Cinebench you can go to File -> Preferences and select to run only one thread.

Use the task manager performance tab to see which logical cores do the work.
I am deaf, so I don't watch the videos, since I can't hear them. Can you summarize?

And again, Linux does not have this problem, in addition to being 30% more efficient on the same code, so I still blame it on MS/Windows.
 

TheELF

Platinum Member
Dec 22, 2012
2,637
44
106
#47
I am deaf, so I don't watch the videos, since I can't hear them. Can you summarize?
He's just saying that half the time is spent on shuffling threads instead of doing the work, which is why the regression is so big.
And again, Linux does not have this problem, in addition to being 30% more efficient on the same code, so I still blame it on MS/Windows.
Linux also doesn't have to worry about liabilities if a core overheats.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
16,947
359
136
#48
He's just saying that half the time is spent on shuffling threads instead of doing the work, which is why the regression is so big.

Linux also doesn't have to worry about liabilities if a core overheats.
My point is, they don't, and I run 24/7 overclocked. I only use an MS computer where required, as Windows is just bad.
 

TheELF

Platinum Member
Dec 22, 2012
2,637
44
106
#49
My point is, they don't, and I run 24/7 overclocked. I only use an MS computer where required, as Windows is just bad.
And you are always using your systems at full utilization anyway, so of course thread migration doesn't make sense for you. As I said before, the mainstream home versions of Windows are not made for this kind of work; it's all battery life and eco by default. That doesn't mean it's broken, it's just targeted at a different market.
 

Topweasel

Diamond Member
Oct 19, 2000
4,556
172
126
#50
I mean, didn't we already see this with the core parking issue when Ryzen first came out? I know that this adds to that by bouncing between nodes, but isn't this just more of the same from the Windows scheduler?
 

