Ryzen: Strictly technical


Borf

Junior Member
Mar 8, 2017
9
1
41
Of course not. There's no iGPU, so there's no display signal to send.

Of course, I'm referring to the APUs.
I had incorrectly assumed that HDMI on AM4 boards was limited to 1.4. For an HTPC that would not be an option, but I now see some boards support HDMI 2.0.
 

Kromaatikse

Member
Mar 4, 2017
83
169
56
Given that this is a desktop platform, I would expect only dedicated display output ports to be used.

I honestly don't know how "alternate mode" outputs work in laptops. In all probability, they just multiplex an actual display output onto the same physical wires.
 

gtbtk

Junior Member
Mar 11, 2017
6
6
51
That would mean the DF running at double the memory clock, would it not?
3200MT/s memory, AKA 1600MHz, would be 51.2GB/s right now. I don't see 6400MT/s memory coming any time soon, so we have to be talking about higher data fabric speeds.
The problem is that the frequency is 1/2 of the actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD-produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles/s = 25.6GB/s, which is the maximum throughput this design has to the memory, the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be fed 22.5GB/s, but if you are using 2400MHz RAM there is only about 19.5GB/s available between the CPU and the IO hub, and that is assuming the CPU/GPU is not also trying to access memory or storage, or swap threads, at the same time it is trying to send data to the GPU.

It also suggests that the AIDA64 memory benchmark is including L1, L2 and L3 cache performance in what it claims are memory benchmark results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32-bytes-per-cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of the RAM frequency.
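To put rough numbers on the premise above (DF at half the memory clock - later posts dispute this and argue it runs at the full memory clock), a quick back-of-the-envelope sketch:

Code:
# Rough DF bandwidth figures under the "half of memory clock" premise used above
# (other posts argue the fabric runs at the full memory clock instead).
BYTES_PER_CYCLE = 32                        # claimed fabric width

def df_bandwidth_gbs(ddr_mts, clk_ratio=0.5):
    memclk_mhz = ddr_mts / 2                # DDR4-3200 -> 1600 MHz memory clock
    return BYTES_PER_CYCLE * memclk_mhz * clk_ratio * 1e6 / 1e9

print(df_bandwidth_gbs(3200))               # 25.6 GB/s at half memclk
print(df_bandwidth_gbs(2400))               # 19.2 GB/s at half memclk
print(df_bandwidth_gbs(3200, clk_ratio=1))  # 51.2 GB/s if the DF runs at full memclk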
IMG0053311_1.jpg
 
  • Like
Reactions: w3rd

JimmiG

Platinum Member
Feb 24, 2005
2,024
112
106
Have you thought about trying a few consecutive cold reboots in a row?

Yeah, sometimes it will even cold boot on the first try, but I don't want my computer to behave like an old lawnmower, sometimes requiring multiple attempts to start.
Maybe the new AGESA code will solve these kinds of problems, though it might just be an inherent problem with Ryzen's memory controller and higher-speed RAM, requiring it to "warm up" before it accepts the highest RAM speeds/lowest latencies.
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
The problem is that the frequency is 1/2 of the actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD-produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles/s = 25.6GB/s, which is the maximum throughput this design has to the memory, the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be fed 22.5GB/s, but if you are using 2400MHz RAM there is only about 19.5GB/s available between the CPU and the IO hub, and that is assuming the CPU/GPU is not also trying to access memory or storage, or swap threads, at the same time it is trying to send data to the GPU.

It also suggests that the AIDA64 memory benchmark is including L1, L2 and L3 cache performance in what it claims are memory benchmark results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32-bytes-per-cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of the RAM frequency.
IMG0053311_1.jpg

Lclk has nothing to do with the data fabric clocks, as the slide shows you (lower right corner).
 

Borf

Junior Member
Mar 8, 2017
9
1
41
Given that this is a desktop platform, I would expect only dedicated display output ports to be used.

I honestly don't know how "alternate mode" outputs work in laptops. In all probability, they just multiplex an actual display output onto the same physical wires.

Yeah, that would make sense. I thought AMD might have done it from the SoC, but multiplexing off the chip would allow motherboard makers to add it as an optional feature.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
The problem is that the frequency is 1/2 of the actual memory frequency, in spite of what the OP stated in this post - please see LClk in the AMD-produced slide below.

Using 3200MHz RAM would give you a DF bandwidth of 32 bytes * 800 million cycles/s = 25.6GB/s, which is the maximum throughput this design has to the memory, the PCIe bus, or between the CCX modules.

Those bottlenecks are the reason that gaming performance hits a ceiling with fast GPUs, particularly when you only have 2133 or 2400MHz RAM installed. The IO hub expects to be fed 22.5GB/s, but if you are using 2400MHz RAM there is only about 19.5GB/s available between the CPU and the IO hub, and that is assuming the CPU/GPU is not also trying to access memory or storage, or swap threads, at the same time it is trying to send data to the GPU.

It also suggests that the AIDA64 memory benchmark is including L1, L2 and L3 cache performance in what it claims are memory benchmark results, inflating the scores over what can actually be written to the RAM sticks. I can only assume that it is also doing that with Intel chips.

If you increase the design to use 8 memory controllers (4x the 2 that exist now), you get the 100GB/s, but each module is still going to be connected with 32-bytes-per-cycle interconnects unless there is a way to overclock the Data Fabric to run at a higher ratio of the RAM frequency.
IMG0053311_1.jpg
Lclk is just the IO Hub Controller clock. The slide clearly shows that the data fabric does in fact run at memory clock.

51GB/s for 3200MT/s RAM.
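For reference, that figure is simply 32B per cycle at the 1600MHz memory clock of DDR4-3200 (my arithmetic, not an AMD number):

Code:
# 32B/cycle fabric at the full memory clock (DDR4-3200 -> 1600 MHz memclk)
print(32 * 1600e6 / 1e9)  # 51.2 GB/s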
 

TerionX6

Junior Member
Jun 29, 2015
14
20
46
Let's review some basics; there's no need to waste pages and posts on definitions in a thread meant for technical discussion.

I think when people discuss the data fabric they forget, or forget to mention, that it is simultaneously bi-directional. Going by AMD's SoC bandwidth statements we must assume it is 32B/cycle in both directions. We can deduce this by seeing that the DF supports 32B/cycle, while the memory controller supports 16B/cycle for each memory channel, two per Zeppelin die. By necessity then the Zeppelin data fabric must be able to send and receive 16B/cycle*2 between the controller and CCXs.
Therefore:
DDR4-2400 = memClk@1200MHz = DF@38.4GB/s*2 = 76.8GB/s total SoC bandwidth
DDR4-3600 = memClk@1800MHz = DF@57.6GB/s*2 = 115.2GB/s total SoC bandwidth
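If it helps, here is the same arithmetic spelled out (assuming, as stated above, 32B/cycle in each direction at memclk):

Code:
# Total SoC fabric bandwidth assuming 32B/cycle in each direction at memclk
def soc_bandwidth_gbs(ddr_mts):
    memclk_hz = ddr_mts / 2 * 1e6     # DDR4-2400 -> 1200 MHz
    one_way = 32 * memclk_hz / 1e9    # GB/s per direction
    return one_way * 2                # bi-directional total

print(soc_bandwidth_gbs(2400))        # 76.8 GB/s
print(soc_bandwidth_gbs(3600))        # 115.2 GB/s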

I believe, with what we've seen of Ryzen, that bandwidth is much less important than how latency can be reduced by increasing memclk. Successfully running 4000MT/s RAM would give the chip a 40% reduction in latency, nearly closing the tested core-to-core latency gaps we've seen between Ryzen and Broadwell-E.

What also gets either ignored or misrepresented is that the data fabric clock IS the memory controller clock, nothing more and nothing less. There is no "half of RAM speed" for the DF. It simply is running at memclk, as we can see in AMD's clock domain slide.

Ciao,
Terion
 

KTE

Senior member
May 26, 2016
478
130
76
Many discussions have tried to make sense of Ryzen's Tctl temperature readings. Here is my interpretation of how the CPU Tctl temperature reading and its offsets (plural) work on a *stock* Ryzen 1800X, and how this affects stability.

- There are three (3) different *dynamic* offsets to Tctl, +0°C (aka base), +10°C and +20°C.

Not so. AMD posted the details a while back.

The 1700X/1800X have a +20°C offset (Tctl) from the real junction temp; Tctl minus 20°C is the real temp.

You seem to be describing the chip's heating under load instead.

Sent from my HTC 10 using Tapatalk
 

Timur Born

Senior member
Feb 14, 2016
277
139
116
Not so. AMD posted the details a while back.

The 1700X/1800X have a +20°C offset (Tctl) from the real junction temp; Tctl minus 20°C is the real temp.
No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now, so this might fix the far-too-low readings.

You seem to be describing the chip's heating under load instead.
No, I reproduced that this is not heating under load. It's also hard to believe that heating would always happen in exactly +10°C and +20°C steps with every piece of software I tested. It even happens when the CPU is pre-heated.

There seems to be a 95°C ceiling, but it can be broken by an offset jump (e.g. 88°C -> 98°C, which then settles back down to 95°C).
 

KTE

Senior member
May 26, 2016
478
130
76
No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now, so this might fix the far-too-low readings.


No, I reproduced that this is not heating under load. It's also hard to believe that heating would always happen in exactly +10°C and +20°C steps with every piece of software I tested. It even happens when the CPU is pre-heated.

There seems to be a 95°C ceiling, but it can be broken by an offset jump (e.g. 88°C -> 98°C, which then settles back down to 95°C).
I find what you're relating very strange... broken sensors, if that's true.

What you describe is not an engineering practice.

What do you idle at, and what are your ambients?

Sent from my HTC 10 using Tapatalk
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
No, my CPU is not running at 10°C idle on an AIO cooler. But I am also playing with Sense Skew now.

No, I reproduced that this is not heating under load, unless you believe that heating always happens in exactly +10°C and +20°C steps, even when the CPU is already pre-heated by said load.

I observe none of these temperature jumps with my 1700X. I can change that behavior, though, using the temperature compensation value in the BIOS on the C6H. If I use 63, the default, then I get actual temperatures. If I set 62, then I see temperatures 20°C higher (the default AMD behavior); if I set 64, it seems to read 10°C too cold... and it does demonstrate some strange jumpiness in the values.

This is maintained pretty well:

Ryzen_Monitor.png


The "CPU Power" also seems to be fairly accurate, though is the full SoC power... and, if anything, overstates the power being used slightly.

Running Intel Burn Test "Very High" with 16 threads, 10 minutes, 1700X stock:

Ryzen_Monitor_Load.png


My "Intake Fans" and pump are driven my the "Radiator Intake" temperature, the "Exhaust Fans" and Rear Intake" are driven by CPU temperature (I would just drive it by Water Out temperature if I could...
 
  • Like
Reactions: Drazick

Timur Born

Senior member
Feb 14, 2016
277
139
116
I am using stock (Optimized Defaults) settings, so any Sense Skew settings are whatever Asus set them to. I have now disabled this completely and the idle temps make more sense; I have not checked load yet as I am too busy with too many things at the same time.

Furthermore, I stumbled over another serious temp anomaly that I am not yet able to reproduce (i.e. switching between two states of how temps are handled, one of which can crash the CPU to Code 8 before any throttling happens).
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
Let's review some basics; there's no need to waste pages and posts on definitions in a thread meant for technical discussion.

I think when people discuss the data fabric they forget, or forget to mention, that it is simultaneously bi-directional. Going by AMD's SoC bandwidth statements we must assume it is 32B/cycle in both directions. We can deduce this by seeing that the DF supports 32B/cycle, while the memory controller supports 16B/cycle for each memory channel, two per Zeppelin die. By necessity then the Zeppelin data fabric must be able to send and receive 16B/cycle*2 between the controller and CCXs.
Therefore:
DDR4-2400 = memClk@1200MHz = DF@38.4GB/s*2 = 76.8GB/s total SoC bandwidth
DDR4-3600 = memClk@1800MHz = DF@57.6GB/s*2 = 115.2GB/s total SoC bandwidth

I believe, with what we've seen of Ryzen, that bandwidth is much less important than how latency can be reduced by increasing memclk. Successfully running 4000MT/s RAM would give the chip a 40% reduction in latency, nearly closing the tested core-to-core latency gaps we've seen between Ryzen and Broadwell-E.

What also gets either ignored or misrepresented is that the data fabric clock IS the memory controller clock, nothing more and nothing less. There is no "half of RAM speed" for the DF. It simply is running at memclk, as we can see in AMD's clock domain slide.

Ciao,
Terion
I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/
 
  • Like
Reactions: TerionX6

TerionX6

Junior Member
Jun 29, 2015
14
20
46
I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/
I was thinking last week that if the controller or fabric has to wait before sending data in a different direction, that might explain the crazy latency we see when one CCX talks to another CCX's L3.
 

iBoMbY

Member
Nov 23, 2016
175
103
86
I actually sent a message along to AMD to confirm if it's 32B/cycle in both directions when I wrote about the Ryzen clock domains.

They haven't responded :/

It's pretty obvious, because if it weren't 32B in each direction, Ryzen would be limited to single-channel DDR4 speed...
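Back-of-the-envelope on that (my numbers): a 16B/cycle link at memclk works out to exactly single-channel DDR4-3200 bandwidth, while 32B/cycle matches the dual-channel peak:

Code:
# Hypothetical 16B/cycle vs. 32B/cycle fabric link at memclk, against DDR4-3200 DRAM peaks
memclk_hz = 1600e6                            # DDR4-3200 memory clock
single_channel = 8 * 3200e6 / 1e9             # 25.6 GB/s, one 64-bit channel
dual_channel = 2 * single_channel             # 51.2 GB/s
print(16 * memclk_hz / 1e9, single_channel)   # 25.6 GB/s each - single-channel territory
print(32 * memclk_hz / 1e9, dual_channel)     # 51.2 GB/s each - matches dual channel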
 

Timur Born

Senior member
Feb 14, 2016
277
139
116
Some more information on Tctl offset and throttling (based on CH6 + 1800X):

- Offsets are not applied over 95C Tctl.
- If an offset shoots Tctl over 95C then the offset is dialed back quickly to match Tctl = 95C.
- As real CPU temperature keeps rising the offset decreases accordingly to match 95C.
- When real CPU temperature increases over 95C Tctl increases accordingly without offset.

- "Soft" throttling (down to around x30) is applied when Tctl + offset = 95C. The higher the real CPU temp increases towards 95C the more soft throttling is applied.
- "Hard" throttling (down to x0.5) is applied when real CPU temp hits 95C.

- Emergency temperature shutdown on my CH6 happens when the SIO CPU temp hits 110C; Tctl can climb slightly higher than the SIO CPU temp once over 95C. I saw 113C Tctl before the shutdown hit.
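To make that concrete, a small sketch of how I read the Tctl offset handling (my interpretation of the observations above, not anything from AMD documentation):

Code:
# Sketch of the observed Tctl offset handling (my interpretation, not AMD documentation)
TCTL_CEILING = 95.0

def tctl(real_temp_c, offset_c):
    # Offsets are applied in full only while they keep Tctl at or below 95C...
    if real_temp_c + offset_c <= TCTL_CEILING:
        return real_temp_c + offset_c
    # ...otherwise the offset is dialed back to hold Tctl at 95C, and once the real
    # temperature itself passes 95C, Tctl simply tracks it with no offset.
    return max(real_temp_c, TCTL_CEILING)

print(tctl(70.0, 20.0))  # 90.0 - full offset applied
print(tctl(80.0, 20.0))  # 95.0 - offset dialed back to hold the 95C ceiling
print(tctl(98.0, 20.0))  # 98.0 - real temp above 95C, no offset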
 

KeMuケミュー

Junior Member
Apr 8, 2017
4
0
36
●Ryzen "God mode" (sleep-resume boost bug)

How to: HPET disabled + sleep and resume ⇒ Ryzen boost bug mode ("God mode")
Trade-off: error rate?? (steady state??)
Performance merit: Skylake (Intel) level, or above Skylake
BIOS: before April 2017 (before 04/2017)
Power plan: Performance
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
I finally have tested the latest AGESA (1.0.0.4) thoroughly... I am very glad I waited for this BIOS before posting my Zen architectural review.

SMT penalties are now nearly completely gone. I had previously seen a few areas with penalties as high as 15%... now there are just a few small single digit penalties.

Frequency scaling is now also as it should be - with only bandwidth-constrained benchmarks falling behind... and quite a few showing slightly better scaling with frequency, as memory bandwidth goes up along with it and latency drops.

I've tested the Ryzen 5 1400 at 3GHz, 3.4GHz, and 3.8GHz with and without SMT enabled using DDR4-2400 CL14, and I've tested the 1700X at those same frequencies with various CCX configurations. The latest microcode is a BIG help.

I can finally boot my system with DDR4-3200 speeds, though I have been unable to get it into Windows. I've still found that DDR4-2667 CL14 brings the best overall stability for me... but 2933+ will be quite nice to get working fully.

I've also noticed that L1 and L2 bandwidth results are somehow impacted by memory clocks... it's very confusing that DDR4-2933 CL16 results in > 1100GB/s at 3.9GHz when just dropping to DDR4-2667 CL14 results in 966GB/s of L1 bandwidth... according to AIDA.

You can browse my various raw results here.

I should have my review online within 48 hours. Almost thought I was going to manage it tonight, but I have to create new charts and some of the conclusions have changed, so I have to ponder the implications before a rewrite.
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
I've also noticed that L1 and L2 bandwidth results are somehow impacted by memory clocks... it's very confusing that DDR4-2933 CL16 results in > 1100GB/s at 3.9GHz when just dropping to DDR4-2667 CL14 results in 966GB/s of L1 bandwidth... according to AIDA.

At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?
 

Timur Born

Senior member
Feb 14, 2016
277
139
116
According to Elmor, temperature shutdown is not handled by the SIO CPU temp but by the CPU itself, based on Tctl. If that is the case, then the shutdown temperature seems to be 115C Tctl.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?
Seems like it's just getting 100% of theoretical rather than more than theoretical?

I say "just", but that's already impressive as all hell.
 
  • Like
Reactions: looncraz

mtcn77

Member
Feb 25, 2017
105
22
91
I haven't noticed whether it has been mentioned - what do you make of the quad single-rank Hardware.fr benchmarks? This is their summary of the original 4x4GB benchmarks (blue) in comparison to the faster 2x8GB results... pretty fast for 2400 CL15 DIMMs:
We then noticed several things going from 4 x 4 GB to 2 x 8 GB:

- The command rate changes from 2T to 1T (it is not adjustable at the motherboard level)
- Read bandwidth under AIDA64 increases by about one GB/s
- Performance in applications where the memory subsystem is the limit decreases by about 10%. This is the case, for example, under 7-Zip and WinRAR.
http://www.hardware.fr/articles/956-4/ryzen-7-1800x-gamme-ddr4.html
http://www.hardware.fr/articles/958-8/choix-complexifie-par-support-complique.html
[Hardware.fr memory configuration comparison graph]

What am I missing, Captain?
 

imported_jjj

Senior member
Feb 14, 2009
660
430
136
Seems like it's just getting 100% of theoretical rather than more than theoretical?

I say "just", but that's already impressive as all hell.

Theoretical is just under 47GB/s and his result with the cores at 3.9GHz gets to almost 49GB/s.
With Ryzen he should be getting 44-45GB/s with the DRAM at 2933.
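For reference, the dual-channel DDR4-2933 theoretical peak (my arithmetic):

Code:
# Dual-channel DDR4-2933 theoretical peak bandwidth
print(2 * 8 * 2933e6 / 1e9)  # ~46.9 GB/s, so a measured ~49 GB/s exceeds theoretical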
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136
At 2933 you are getting higher memory BW than theoretical so you can't count on those results.
Is HPET disabled?

HPET is enabled, absolutely. I will have to retest, though, as I did not realize it at the time - so thanks for catching that :p

I think the memory could have been running at 3200, despite what the screenshot says. Others on the forums have stated seeing similar behavior with the 0083 BIOS. I ran a couple other simple benchmarks and they were entirely consistent (2300 in CPU-z ST, for example), but I pushed overclocking too far with Ryzen Master and had to clear the CMOS.
 
  • Like
Reactions: Drazick
Status: Not open for further replies.