Kromaatikse
Member
- Mar 4, 2017
The problem is the "Processor group" split, not the "NUMA group" split. If you could create two NUMA groups, without creating two Processor groups, it might give a better result, but it doesn't look like Microsoft is offering that option.
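For reference, the Win32 API does expose the two concepts separately, so you can at least see how they line up on a given machine. A minimal sketch (assuming a Windows SDK toolchain; nothing Ryzen-specific here):

```c
#include <stdio.h>
#include <windows.h>

/* Print processor groups and NUMA nodes side by side. On a stock install
   these typically line up one-to-one, which is exactly the coupling being
   complained about above. */
int main(void)
{
    WORD groups = GetActiveProcessorGroupCount();
    ULONG highest_node = 0;
    GetNumaHighestNodeNumber(&highest_node);

    printf("Processor groups: %u\n", groups);
    printf("NUMA nodes:       %lu\n", highest_node + 1);

    for (USHORT node = 0; node <= (USHORT)highest_node; node++) {
        GROUP_AFFINITY aff;
        if (GetNumaNodeProcessorMaskEx(node, &aff))
            printf("Node %u -> group %u, mask 0x%llx\n",
                   node, aff.Group, (unsigned long long)aff.Mask);
    }
    return 0;
}
```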
Actually, when I tried testing the split on Ryzen this morning, it did not work as it does with other CPUs. It only ended up disabling the cores after the boundary that should have started an additional NUMA node. Perhaps there is some whitelisting / configuration that needs to be coded into Windows to properly support manual NUMA splitting with Ryzen.
Either way, at least we can put the scheduler rumors to bed now that AMD has released a statement confirming the scheduler is not the issue (in Windows 10 and in Windows 7).
Colour me confused, then.
My best interpretation of AMD's statement is: Windows is choosing the cores in the right order for performance (physical before virtual). But this is not implemented by the scheduler - rather, by the core parking algorithm.
They are however saying nothing about Windows' insistence on moving threads around so restlessly. That *is* a property of the scheduler, and it *is* harmful, on both Intel and AMD CPUs.
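On the core parking angle: parking policy is adjustable through the power management API, for anyone who wants to experiment. A hedged sketch - `unpark_all_cores` is a made-up name, and this rewrites the active power scheme system-wide, so treat it as illustrative only:

```c
#include <windows.h>
#include <powrprof.h>   /* link with PowrProf.lib */

/* Illustrative only: require 100% of cores to stay unparked in the active
   power scheme, i.e. effectively turn core parking off. */
static void unpark_all_cores(void)
{
    GUID *scheme = NULL;
    if (PowerGetActiveScheme(NULL, &scheme) != ERROR_SUCCESS)
        return;

    /* "Processor performance core parking min cores" = 100% */
    PowerWriteACValueIndex(NULL, scheme,
                           &GUID_PROCESSOR_SETTINGS_SUBGROUP,
                           &GUID_PROCESSOR_CORE_PARKING_MIN_CORES,
                           100);
    PowerSetActiveScheme(NULL, scheme);  /* re-apply so it takes effect */
    LocalFree(scheme);
}
```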
It is disabled.
"Comme pour au dessus, nous réalisons les tests à 3 GHz, le SMT est désactivé pour limiter la variabilité."
3GHz and SMT disabled
Thanks, I'm testing with both configurations right now. Cache performance seems to be positively abhorrent with random accesses. Even in-page.
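For anyone unfamiliar, the standard way to measure random-access latency is a pointer chase: a dependent chain of loads through a shuffled buffer, so out-of-order execution and prefetchers can't hide anything. A minimal sketch of the technique (buffer size, iteration count and the `wide_rand` helper are arbitrary choices, not anyone's actual harness):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N     (1u << 20)   /* 2^20 indices = 8 MiB of size_t on 64-bit */
#define ITERS (1u << 26)

/* Combine two rand() calls: MSVC's RAND_MAX is only 32767. */
static size_t wide_rand(void)
{
    return ((size_t)rand() << 15) | (size_t)rand();
}

int main(void)
{
    size_t *buf = malloc(N * sizeof *buf);
    if (!buf) return 1;

    /* Sattolo's algorithm: shuffle into one big cycle so every load
       depends on the previous one. */
    for (size_t i = 0; i < N; i++) buf[i] = i;
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = wide_rand() % i;         /* j < i => a single cycle */
        size_t t = buf[i]; buf[i] = buf[j]; buf[j] = t;
    }

    size_t idx = 0;
    clock_t t0 = clock();
    for (size_t k = 0; k < ITERS; k++)
        idx = buf[idx];                     /* serialized random loads */
    clock_t t1 = clock();

    /* Printing idx keeps the chase from being optimized away. */
    printf("idx=%zu  %.2f ns/load\n", idx,
           (double)(t1 - t0) * 1e9 / CLOCKS_PER_SEC / ITERS);

    free(buf);
    return 0;
}
```

Vary N around each cache level (L1/L2/L3/DRAM) and the latency steps show up directly.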
Wait, you are talking about L3, right (since, well, L2 looks alright from what we see)? As in, it is slower than Phenom II? I'll blame memory instead of accepting it.

I will say one thing: Ryzen has the worst cache performance characteristics of any CPU of mine that I have tested (so far) when dealing with random data.
We're talking about THREE TIMES worse... That's the main problem with some of the games.
The justification for moving threads is simple: the scheduler cycles threads across CPUs all the damn time, so trying to keep a thread on the same core is really a gamble - you're betting that the cache stays hot against a whole quantum of the thread's time being wasted. The workaround is basic: force highest priority on threads you do not want to see shuffled that often.
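Concretely, that workaround comes down to a couple of Win32 calls per thread. A sketch - `pin_and_boost` is a made-up name, and the choice of core 0 and THREAD_PRIORITY_HIGHEST is illustrative, not a recommendation:

```c
#include <windows.h>

/* Pin the current thread to one logical processor and raise its priority,
   so the scheduler has less reason (and less opportunity) to shuffle it. */
static void pin_and_boost(void)
{
    /* Soft hint: the scheduler prefers, but may ignore, the ideal CPU. */
    SetThreadIdealProcessor(GetCurrentThread(), 0);

    /* Hard pin: logical processor 0 within the current processor group. */
    SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << 0);

    /* Higher priority makes it less likely the quantum gets stolen. */
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
}
```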
Any guesses as to why?
BTW, found this memory latency scaling graph; wonder if it can get a bit better if they open up access to more timings. Can't quickly find any tests with bclk OC, as those could be helpful.
[image: memory latency scaling graph]
Also, am I the only one who finds the policy of reporting junction temperature offset by 20 degrees on the X SKUs, for nothing but to force the fan to be louder, weird?
So we're back to the stupid manual workaround using CPU affinity. Hooray.
This is actually quite typical Microsoft attitude. As long as they have some convoluted workaround that needs applying on an application-by-application or installation-by-installation basis, they can pad their revenue streams from Support Incidents and training courses for Microsoft Certified Highly Paid Consultants. Indirectly, it even perversely reinforces the lock-in factor, because people are loath to discard all those tweaks they've built up over the years, even if the best alternative doesn't need them.
As for M$, maybe they are upset that Ryzen works on Win 7, since forcing AMD to not officially support it wasn't enough.
setting affinity to every other logical core forces workloads to just two cores
That would work if your threads never had to interact with the OS.

As I noted earlier, this is about threads which have nothing competing with them for the same core. The optimal strategy in that case, from a cache-miss, branch-prediction and context-switch perspective, is always to keep it on that same core.
Their statement clearly implies it is the only temperature reading from the CPU.

If it's the only reported temperature directly from the CPU, then yes it is weird. I would expect to see a true reading alongside the "fan reference" tweaked one.
Wait, wait wait, difference between what?

That's a difference of only 5 ns - or ~15 cycles at the 3 GHz test clock.
I'm seeing differences of 60 cycles on Intel and > 300 cycles on Ryzen.
I'll make charts soon.
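For reference, converting the cycle counts above at the 3 GHz test clock:

$$t = \frac{\text{cycles}}{f}:\qquad \frac{15}{3\,\text{GHz}} = 5\,\text{ns},\qquad \frac{60}{3\,\text{GHz}} = 20\,\text{ns},\qquad \frac{300}{3\,\text{GHz}} = 100\,\text{ns}.$$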
Is that with core parking on or off?
Are they Windows bugs? Did AMD release any info on SMT yet? If it is like IBM's POWER, it might just join the two SMT threads of a core into a super slice, which would be proper and logical behaviour (from the standpoint of someone - me - who has no idea how POWER works).

I've found at least two bugs in Windows related to Ryzen: setting groupsize disables half of Ryzen's cores but acts properly on Intel (Windows 7 - haven't tested with 10 yet), and setting affinity to every other logical core forces workloads onto just two cores (Windows 10, about to test on 7).
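For anyone trying to reproduce the second bug, the mask in question looks like this - a sketch assuming SMT siblings are enumerated adjacently (0/1 on core 0, 2/3 on core 1, ...), with `every_other_cpu_mask` being a made-up name:

```c
#include <windows.h>

/* Build a mask selecting every other logical processor (0, 2, 4, ...),
   i.e. one hardware thread per core under the adjacent-siblings layout.
   Limited to 64 logical processors (one processor group). */
static DWORD_PTR every_other_cpu_mask(void)
{
    DWORD ncpus = GetActiveProcessorCount(ALL_PROCESSOR_GROUPS);
    DWORD_PTR mask = 0;
    for (DWORD i = 0; i < ncpus && i < 64; i += 2)
        mask |= (DWORD_PTR)1 << i;
    return mask;
}

/* Usage: SetProcessAffinityMask(GetCurrentProcess(), every_other_cpu_mask()); */
```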
Wait, wait wait, difference between what?
we are waiting for your results!

Intel has 60 cycles latency in random access... Ryzen has > 300 cycles in the same test.
Ryzen cache latency performance is simply abysmal. It has a narrow window where it's better than Intel for linear accesses, thanks to a larger L2 per core.
I'm currently testing with SMT off, results are only slightly better - and only in a very narrow region.
After this series of tests I will disable a CCX and test once again, since these numbers likely include the CCX penalty.
Then, finally, I will overclock the memory and test with one CCX, no SMT, and the maximum memory overclock (which is a meager 2667... using DDR4-3200 RAM).
Or, AMD's PR arm isn't talking to the engineers...

My guess is that Microsoft plainly said: we aren't going to fix it for you, AMD. So AMD decided to stay in Microsoft's good graces, lest Microsoft further help Intel's monopoly.