Ryzen: Strictly technical

Feb 25, 2017
72
0
41

lolfail9001

Golden Member
Sep 9, 2016
1,056
0
96

sm625

Diamond Member
May 6, 2011
8,176
1
106
I find those histograms incredibly useful.

I think all reviewers should no longer use average fps. I think they should use median (aka 50th percentile) and other percentiles like 1% or even .1%.

Converting the histograms above gives the following for Crysis 3:

1800X
.1% 58 fps
1% 74 fps
10% 98 fps
25% 114 fps
50% 126 fps
75% 138 fps
90% 150 fps
99% 194 fps

7700K
.1% 74 fps
1% 86 fps
10% 106 fps
25% 118 fps
50% 126 fps
75% 142 fps
90% 154 fps
99% 202 fps

In other words, Ryzen has a problem at the lower end. Its numbers at the 25th percentile and up are fine. In fact, both have the same median. It's between .1% and 25% that the problem exists.

GTA V

1800X
.1% 70 fps
1% 74 fps
10% 82 fps
25% 82 fps
50% 90 fps
75% 98 fps
90% 106 fps
99% 126 fps

7700K
.1% 82 fps
1% 90 fps
10% 102 fps
25% 106 fps
50% 110 fps
75% 118 fps
90% 126 fps
99% 138 fps

Here, the 1800X is consistently about 80-85% as fast as the 7700K and the results are far more clustered for both chips.
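The ratio claim above can be checked directly from the GTA V figures listed in the post. A minimal sketch in Python; the dictionary layout and variable names are just for illustration, and the numbers are copied from the two lists above:

```python
# GTA V percentile figures from the post: (1800X fps, 7700K fps)
gta_v = {
    "0.1%": (70, 82),
    "1%": (74, 90),
    "10%": (82, 102),
    "25%": (82, 106),
    "50%": (90, 110),
    "75%": (98, 118),
    "90%": (106, 126),
    "99%": (126, 138),
}

# Ratio of 1800X to 7700K at each percentile
ratios = {pct: amd / intel for pct, (amd, intel) in gta_v.items()}

for pct, ratio in ratios.items():
    print(f"{pct:>5}: {ratio:.0%}")
```

Running it prints one ratio per percentile, which makes it easy to see where the two chips sit closest together and furthest apart.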

Personally, I'd find these kinds of numbers, with .1%, 1%, and 50% and an included histogram, more useful than what we get now. And with Excel, it's not hard to do.

EDIT: I converted the fps to frame time to get more accurate results as antihelten suggested. Then converted back to fps for the post.
This is what Tech Report is doing with their 99th percentile frame times. They focus on the 1% and I think that is the perfect spot to be looking at. They've been doing it for a while, but sadly not many sites have picked up on it. And now with all this YouTube garbage we seem to be regressing in terms of benchmark quality and clarity.
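For anyone who wants to reproduce this kind of percentile figure outside of Excel, here is a rough Python sketch. It works on frame times and converts to fps only at the end, as suggested in the EDIT above. The nearest-rank percentile method, the function names, and the sample data are all assumptions for illustration, not what any review site actually uses:

```python
import math

def frame_time_percentile(frame_times_ms, pct):
    """Nearest-rank percentile of frame times, pct in (0, 100]."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def low_fps(frame_times_ms, pct):
    """FPS level that only about `pct` percent of frames fall below."""
    # The slowest pct% of frames are the top pct% of frame times,
    # so look up the (100 - pct)th percentile frame time.
    slow_ms = frame_time_percentile(frame_times_ms, 100 - pct)
    return 1000.0 / slow_ms

# Hypothetical capture of 100 frames: mostly 8 ms, a few 10 ms, one 20 ms spike
sample_ms = [8.0] * 90 + [10.0] * 9 + [20.0]

print("median fps:", low_fps(sample_ms, 50))
print("1% fps:    ", low_fps(sample_ms, 1))
```

A real capture would have thousands of frames, and a different percentile definition (for example with interpolation) would shift the low-end numbers slightly.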
 

lopri

Elite Member
Jul 27, 2002
12,566
9
106
I don't know if this is useful or not but I will post it. This is a histogram of the frametimes in Battlefield 1 multiplayer, using Cam 1 of Spectator, in a full Amiens (map) 64 player Conquest server, using a 4590 (3.5GHz MC turbo), 8GB DDR3-1600. The CPU usage was always 98%+ and the GPU usage <80%. All GFX settings set to low or off @1080p.

If anyone owning a Ryzen CPU could do the same it would be great.
Does that mean "the bactrian Core i7-4590K exhibits a seemingly-more-bimodal distribution, and its frame delivery is more inconsistent—and our frame-time data bears that out." Clearly not following a Bell curve, either. lol.

P.S. The above is nothing against dfk7677, whose posting I very much appreciate.
 

HurleyBird

Golden Member
Apr 22, 2003
1,797
113
126
A multiplayer session with 64 players isn't exactly going to give you the most consistent (or reproducible) frametimes.
 

sm625

Diamond Member
May 6, 2011
8,176
1
106
To be fair the bimodal nature of the distribution is far more clear in Crysis 3:
This looks like two bell curves superimposed on one another: one with perhaps some unknown delays, and one without. The delays are probably CCX related. I wonder how much time Ryzen spends copying blocks of data from one part of cache to another...
 

looncraz

Senior member
Sep 12, 2011
715
0
136
A multiplayer session with 64 players isn't exactly going to give you the most consistent (or reproducible) frametimes.
I think I manage pretty well:

BF4, R9 Fury (1050MHz / 500MHz, 17.1.1 Driver, Windows 10 x64 10586)
Golmud Railway, 48/64 Players, 60Hz server

2600k@4.5:



2600k @ 3GHz:



Not perfect, but well enough.
 
Nov 27, 2016
1,395
0
96
What do you guys make of this?!

Ryan Shrout (PC Perspective): Win10 scheduler "most assuredly" has no issues with Ryzen, "We'll have story up soon with testing and actual thought."


AMD Ryzen and the Windows 10 Scheduler - No Silver Bullet

Editor's Note: The testing you see here was a response to many days of comments and questions to our team on how and why AMD Ryzen processors are seeing performance gaps in 1080p gaming (and other scenarios) in comparison to Intel Core processors. Several outlets have posited that the culprit is the Windows 10 scheduler and its inability to properly allocate work across the different logical and physical cores of the Zen architecture. As it turns out, we can prove that isn't the case at all. -Ryan Shrout

......

Closing Thoughts

What began as a simple internal discussion about the validity of claims that Windows 10 scheduling might be to blame for some of Ryzen's performance oddities, and that an update from Microsoft and AMD might magically save us all, has turned into a full day with many people chipping in to help put together a great story. The team at PC Perspective believes strongly that the Windows 10 scheduler is not improperly assigning workloads to Ryzen processors because of a lack of architecture knowledge on the structure of the CPU.

In fact, though we are waiting for official comments we can attribute from AMD on the matter, I have been told from high knowledge individuals inside the company that even AMD does not believe the Windows 10 scheduler has anything at all to do with the problems they are investigating on gaming performance.

In the process, we did find a new source of information in our latency testing tool that clearly shows differentiation between Intel's architecture and AMD's Zen architecture for core to core communications. In this way at least, the CCX design of 8-core Ryzen CPUs appears to more closely emulate a 2-socket system. How does this new information affect our expectation of something like Naples that will depend on Infinity Fabric even more directly for AMD's enterprise play?

There is still much to learn and more to investigate as we find the secrets that this new AMD architecture has in store for us. We welcome your discussion, comments, and questions below!



 

looncraz

Senior member
Sep 12, 2011
715
0
136
WRONG!

There is a MAJOR problem with the Windows 10 scheduler and Ryzen.

This is a 4+0 Ryzen config with 8 threads running on it (Cinebench R15) after I set affinity 0, 2, 4, 6:



When I do it with an i7 2600k, the result is as expected: 50% CPU usage:
(take note, I use the classic TaskMGR on Windows 10 for my personal rig - I hate almost every new UI in Windows 10).
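For reference, an affinity of 0, 2, 4, 6 corresponds to a simple bitmask with one bit per logical CPU. A small Python sketch of the arithmetic; the helper name is hypothetical, and whether the even-numbered logical CPUs really are one thread per physical core depends on how the OS enumerates them, which is an assumption here:

```python
def affinity_mask(logical_cpus):
    """Build an affinity bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask

# Logical CPUs 0, 2, 4, 6 as in the test above
mask = affinity_mask([0, 2, 4, 6])
print(bin(mask), hex(mask))

# Share of an 8-logical-CPU system that this mask allows
allowed = bin(mask).count("1")
print(f"{allowed} of 8 logical CPUs = {allowed / 8:.0%}")
```

With 4 of 8 logical CPUs allowed, a fully loaded process would be expected to show about 50% total CPU usage, which matches the 2600k result described above.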

 

piesquared

Golden Member
Oct 16, 2006
1,603
27
136
Haha OMG not this joker again. He always seems to pop up at AMD launches with special tools in hand to do some special testing lol.

I think most people understand that the big portion of performance left on the table is due to lack of game optimizations, which AMD has said are coming.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
0
96
Haha OMG not this joker again. He always seems to pop up at AMD launches with special tools in hand to do some special testing lol.

I think most people understand that the big portion of performance left on the table is due to lack of game optimizations, which AMD has said are coming.
Most people do not understand anything. Some follow looncraz's version; I personally feel that is not all there is to it, because of the draw call test (and NVMe too). It just performs too well outside of gaming for this to be down to performance optimizations, especially since those are not exactly a thing in games.
 

deadhand

Junior Member
Mar 4, 2017
21
0
51
Allyn Malventano from PCPer:

"The C++ apps are incredibly simple and are only creating threads or pinging between cores. If such a simple app must be rewritten with workarounds just for AMD processors, we have a serious problem."

Unfortunately they're completely missing scenarios where fixing certain bugs (in games and other applications) could yield great performance improvements on Intel systems, but even greater improvements on Ryzen systems. I'll produce a nice chart and describe such a scenario (which is actually kind of common) tomorrow or Sunday.
 
Feb 14, 2016
93
0
61
I asked this before and I ask it again: Did anyone setup their own Windows power profiles?

And if so, do "Core Parking" settings have an effect on Ryzen scheduling? In my experience it's not a CPU feature but an OS feature with a fancy name, where the OS scheduler tries to keep threads from jumping cores so that other cores can enter deep(er) sleep states without the extra penalty of being woken up too often. This might help keep threads from jumping to another CCX?!

Asus CH6 board only started shipping from most German retailers yesterday (just got it), so I still could not build my own rig. And even then I may need to order another CPU cooler, because the Arctic Liquid Freezer gets its AM4 retention module as late as April.
 

HurleyBird

Golden Member
Apr 22, 2003
1,797
113
126
Say whatever you want about their conclusion, but these core communication latency figures they recorded are fascinating!





Haswell-E actually has moderately lower latency when moving between logical SMT "cores" on the same physical core, but much higher latency than Ryzen when moving from physical core to physical core, at least until you try to ping a core on another CCX after which you see a massive latency spike obviously.

It's also interesting that on the ~42-42ns line for core-core communication on Ryzen there's quite a bit more variance compared to the relatively straight line on Haswell-E at around 78ns. Doesn't Haswell-E use a ring bus? I'm not a processor architect, but I'd expect that there would be less latency for cores that are closer along the ring, which is odd. Maybe someone with more knowledge can explain that to me. Could it be that the ring bus is just really, really fast? Or that there's another means of communication besides the ring bus? Or perhaps the faster paths have been artificially slowed down to meet the lowest common denominator for the sake of consistency?
 

malventano

Junior Member
May 27, 2009
18
0
76
PCPer.com
Haha OMG not this joker again. He always seems to pop up at AMD launches with special tools in hand to do some special testing lol.
You must be referring to the power testing hardware we used to elaborate on the RX480 power draw issues. Not only did AMD acknowledge the information we provided, they (mostly) fixed the issue. I wasn't joking, and neither were they.
I also busted out an o-scope to show differences between G-Sync and FreeSync, which not only educated folks, it likely pushed AMD to implement Low Framerate Compensation.
Yes, I'm an electronics geek. I try to use my skill set to help the community by pushing manufacturers to improve their products.

Say whatever you want about their conclusion, but these core communication latency figures they recorded are fascinating!
Thanks. We still need to test on Intel quad core CPUs - they should have lower core-to-core latency as they aren't dealing with the larger ring bus. I used 5960X as the comparison point so we had matching core counts.

Allyn
 
Feb 6, 2011
1,792
115
136
I'm pretty sure the ring bus is dual counter-rotating rings, so average latency (assuming 50% go left, 50% go right) would be the same.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
0
136
citavia.blog.de
I'd like to see the code of their tool, whether the thread pings involve modified cache lines. A synthetic scenario like modifying the same cache line with alternating threads might not match reality, where shared data is being modified less often and instead some locality is maintained.
 
Sep 6, 2007
64
0
81
So have we pinpointed the bottleneck of Ryzen processors in loads other than synthetic benchmarks (e.g. games)? Can the Windows scheduler be programmed not to move threads of the same program between CCXs while still letting one program use both CCXs? Or does that need modifications to the software itself?
 
Nov 27, 2016
1,395
0
96
So, if the Windows scheduler isn't trying to have the two CCXs share information, then what is?
 

cytg111

Diamond Member
Mar 17, 2008
8,743
706
136
So have we pinpointed the bottleneck of Ryzen processors in loads other than synthetic benchmarks (e.g. games)? Can the Windows scheduler be programmed not to move threads of the same program between CCXs while still letting one program use both CCXs? Or does that need modifications to the software itself?
No, that needs to be the scheduler... and it's interesting that there isn't support for it yet. They must have been working with Microsoft on it for... years?

Maybe it's a sheer resource issue. AMD is not an infinite money tank, and this is an issue that has been prioritized way down. I am sure it's coming, at LEAST before Naples.
 

lolfail9001

Golden Member
Sep 9, 2016
1,056
0
96
So, if the Windows scheduler isn't trying to have the two CCXs share information, then what is?
Games that use more threads than the Windows scheduler, which prioritizes physical cores over SMT pseudo-cores, can place without spilling onto a different CCX (so, 3 loaded threads).
Anyway, Allyn provided us with this (4 threads running here, just in case):

So it looks like the Windows scheduler does throw threads around between different CCXs (I personally take this as an indication that looncraz's guess is wholeheartedly correct). Now the curious part is how that affects MT scaling (it has to, for looncraz's guess to be entirely correct). Because if we are honest, outside of games, almost every single damn test involved workloads that scale beyond 16 threads with ease, or only single-threaded workloads. Time to see what happens in between.
 
Oct 13, 2016
68
0
36
Why doesn't someone just compare how the Linux kernel utilizes that CCX thing (I'm not an eeg), or the CPU as a whole, with how the Windows kernel does, and spot the root cause?
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
0
136
citavia.blog.de
Why doesn't someone just compare how the Linux kernel utilizes that CCX thing (I'm not an eeg), or the CPU as a whole, with how the Windows kernel does, and spot the root cause?
Since Linux has dedicated code to detect CCXs (or last level caches here), Windows might have to do the same.
Code:
+ core_complex_id = (apicid & ((1 << c->x86_coreid_bits) - 1)) >> 3;
+ per_cpu(cpu_llc_id, cpu) = (socket_id << 3) | core_complex_id;
as posted on my blog a year ago: http://dresdenboy.blogspot.com/2016/02/amd-zeppelin-cpu-codename-confirmed-by.html
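To make the two patch lines above a bit more concrete, here is the same bit arithmetic as a Python sketch. The value used for `x86_coreid_bits` (4) and the APIC ID range (0 to 15, single socket) are assumptions picked for illustration of an 8-core/16-thread part; the function names are hypothetical and the real kernel reads these values from the CPU:

```python
def core_complex_id(apicid, coreid_bits):
    """Mirror of: (apicid & ((1 << coreid_bits) - 1)) >> 3"""
    core_mask = (1 << coreid_bits) - 1   # keep only the per-socket core bits
    return (apicid & core_mask) >> 3     # drop the low 3 bits (8 IDs per group)

def llc_id(apicid, socket_id, coreid_bits):
    """Mirror of: (socket_id << 3) | core_complex_id"""
    return (socket_id << 3) | core_complex_id(apicid, coreid_bits)

COREID_BITS = 4  # assumed value, not taken from real hardware

for apicid in range(16):
    print(f"apicid {apicid:2d} -> llc_id {llc_id(apicid, 0, COREID_BITS)}")
```

Under these assumptions the 16 IDs split into two groups of eight, each with its own last-level-cache ID, which is the kind of information a scheduler can use to keep related threads on the same CCX.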
 

