Ryzen: Strictly technical

Timur Born · Apr 1, 2017

The Stilt said:
(on another forum) Most likely you are hitting the TDC limit. That's the most common reason why Ryzen throttles below the MACXFRC.

So I increased VCore, set CPU Current limit to 140% (doesn't seem to matter) and did several tests using HWinfo at 100 ms refresh interval. Currents listed are read from Asus EC sensor block, not HWinfo's CPU Core Current (SVI2 TFN), which reads 4-5 A higher values. Tctl is well below 70°C with fans running slow (pump is unnerving) in an open chassis.

Results (likely affected by some background tasks):

- Prime95 small FFT does 107 A max and throttles the CPU down to x36 = CPU spec without XFR.
- Prime95 beta small FFT does 104 A max and does not throttle the CPU.

- X264 does 101 A max and does not throttle.
- Y-Cruncher (HNT) does 107 A max and does not throttle.
- Realbench does 98 A max and does not throttle.
- Statuscore does 97 A max and does not throttle.

- IDT/Linpack AVX does 117 A max / average 113 A and does not throttle!

- IDT/Linpack non AVX...

14384 (max) does 110 A max / average 108 A and throttles down to x35.
4096 does 110 A max / average 104 A and throttles down to x35.0.
1024 does 110 A max / average 99 A and throttles down to 34.8.

20 does 109 A max / average ~87.5 A and throttles down to x35.0.
19 does 109 A max / average ~87 A and throttles down to x36.3.
18 does 105 A max / average ~86 A and very occasionally throttles down single cores to x35.8.

1-17 does up to 105 A max / average ~86 A and does not throttle down. This also is the range where memory (controller) related errors don't seem to appear, so it may be a cache size thing.

- Heavyload does 108 A max / average 106 A and throttles down to x33.5! It's private memory footprint is 18 mb.

Any idea?

The Stilt · Apr 1, 2017

Timur Born said:
So I increased VCore, set CPU Current limit to 140% (doesn't seem to matter) and did several tests using HWinfo at 100 ms refresh interval. Currents listed are read from Asus EC sensor block, not HWinfo's CPU Core Current (SVI2 TFN), which reads 4-5 A higher values. Tctl is well below 70°C with fans running slow (pump is unnerving) in an open chassis.

Results (likely affected by some background tasks):

- Prime95 small FFT does 107 A max and throttles the CPU down to x36 = CPU spec without XFR.
- Prime95 beta small FFT does 104 A max and does not throttle the CPU.

- X264 does 101 A max and does not throttle.
- Y-Cruncher (HNT) does 107 A max and does not throttle.
- Realbench does 98 A max and does not throttle.
- Statuscore does 97 A max and does not throttle.

- IDT/Linpack AVX does 117 A max / average 113 A and does not throttle!

- IDT/Linpack non AVX...

14384 (max) does 110 A max / average 108 A and throttles down to x35.
4096 does 110 A max / average 104 A and throttles down to x35.0.
1024 does 110 A max / average 99 A and throttles down to 34.8.

20 does 109 A max / average ~87.5 A and throttles down to x35.0.
19 does 109 A max / average ~87 A and throttles down to x36.3.
18 does 105 A max / average ~86 A and very occasionally throttles down single cores to x35.8.

1-17 does up to 105 A max / average ~86 A and does not throttle down. This also is the range where memory (controller) related errors don't seem to appear, so it may be a cache size thing.

- Heavyload does 108 A max / average 106 A and throttles down to x33.5! It's private memory footprint is 18 mb.

Any idea?

"CPU Current Limit" option controls the per phase OCP limit of the VRM.
This has nothing to do with the limits the power management (SMU) has.
Besides, the default (100%) limit is 26A per phase (i.e. 208A on 8-phase C6H) which is "quite" sufficient for any Ryzen CPU.

Timur Born · Apr 1, 2017

Thanks for the information. I wasn't sure, because the name-scheme is somewhat ambiguous, I am coming from Intel and usually don't care much for the intricacies of these things. I will play with the various VRM related settings later, because the buzzing noise at low load is really very bad (especially when Vcore drops well below 1 v). I haven't even checked load related noise on ground lines yet, but expect it to be rather ugly.

My main focus is on compatibility and stability testing, but dissecting a new platform on the very frontier is fun, too. And since I am often CPU limited it's not a bad thing to aim for maximum possible performance.

Unfortunately PCIe compatibility leaves a lot to be desired and I suspect that getting AMD/mainboard manufacturers to improve on the situation will be somewhat of a fight against windmills. So I concentrate on contributing to what the scene is currently most interested in in order to hopefully get some help with the PCIe issues.

What do you make of the current vs. throttling numbers coming out of different software/load profiles?

keymaster151 · Apr 1, 2017

Does anyone know if the AGESA in the new Asrock 2.00 BIOS is the one that's supposed to reduce your memory latency by 6ns? It only mentions enhanced Ryzen 5 support, but those patch notes are hardly comprehensive. Has anyone run any tests with it?

http://www.asrock.com/mb/AMD/X370 Killer SLI/index.us.asp#BIOS

IEC · Apr 1, 2017

keymaster151 said:
Does anyone know if the AGESA in the new Asrock 2.00 BIOS is the one that's supposed to reduce your memory latency by 6ns? It only mentions enhanced Ryzen 5 support, but those patch notes are hardly comprehensive. Has anyone run any tests with it?

http://www.asrock.com/mb/AMD/X370 Killer SLI/index.us.asp#BIOS

Unlikely to be the new 1.0.0.4 AGESA, because it was "nearing release" yesterday. I'd expect mid-April for the first non-beta BIOSes using the new AGESA.

innociv · Apr 1, 2017

Lets get back to strictly technical:
Has anyone tried running a common actual workstation application that is open source through another compiler?

As far as I'm aware, ICC will never compile to take advantage of how AMD CPUs have, since forever, handled 4 complex instructions per cycle compared to the 1 complex + 3 simple of Intel CPUs.
The one example I found, which was software emulated openGL, which did compile to take advantage of that, shows a 45% IPC increase with Ryzen over Kabylake.

Since compiled binaries are generally feeding Ryzen 1 complex + 3 simple even when they could do 4 complex, it is being heavily underutilized.

Kromaatikse · Apr 1, 2017

innociv said:
AMD CPUs have, since forever, handled 4 complex instructions per cycle

This is not quite true, though it might depend on what you mean by "complex".

K7, K8, K10 could decode 3 instructions per cycle into macro-ops which could be up to two micro-ops (generally AGU + ALU) each. Each of the three integer pipelines could handle a whole macro-op in one cycle. Some instructions required two macro-ops and were processed by two decoders, with a throughput of three such instructions per two cycles, or one "double" instruction plus one "single" instruction per cycle. This led to a pretty consistent throughput of six micro-ops per cycle, if most instructions had memory operands.

Bulldozer family could decode up to four instructions per cycle into up to four macro-ops per cycle, similarly to the above except that a "double" instruction could no longer be divided between two cycles of decoding. Lack of execution units (among other things) meant that the theoretical eight micro-ops per cycle throughput implied by this could not be realised in practice.

Ryzen is apparently capable of decoding four instructions into up to *eight* macro-ops per cycle, though it can only dispatch six. This gives it a burst capability to fill the front-end queues after a fetch bubble, I suppose. But this is clearly a greater capability than the previous CPUs, however you look at it.

All the above ignores the rare instructions that require vector-coding, using the microcode ROM and the entire decoder array in parallel.

innociv said:
compared to the 1 complex + 3 simple of Intel CPUs

Meanwhile, Intel has apparently taken notice of AMD's general use of macro-ops, and has been increasing the use of "fused micro-ops" which are broadly similar in character. Can you provide a concrete example of an ICC code sequence that shows this pattern?

looncraz · Apr 1, 2017

Timur Born said:
Unfortunately PCIe compatibility leaves a lot to be desired and I suspect that getting AMD/mainboard manufacturers to improve on the situation will be somewhat of a fight against windmills.

In what way? AMD directly licenses PCI-e from Intel, IIRC.

Timur Born · Apr 2, 2017

Running the 1800X at 4.0 GHz using "Auto" voltage = 1.45-1.46 Vcore idle:

Heavyload does 124 A max and does not throttle down.
IDT/Linepack non AVX does 120 A max and does not throttle down.

So why does the CPU throttle down at stock frequency under certain loads?

Timur Born · Apr 2, 2017

looncraz said:
In what way? AMD directly licenses PCI-e from Intel, IIRC.

The chipset connected PCIe slots (4_3, 1_1, 1_2, 1_3 on the CH6) don't detect all PCIe card, while the CPU connected ones seem to work. I can reproduce this on my own CH6 and know from a customer that the Asus X370 Pro and Asrock Killer show the same problem.

From experience with similar problems with Intel based boards I know that this hopefully should be fixable via BIOS update. I once had a HP Q170 based PC here where HP fixed the BIOS so that at least the CPU connected PCIe slots worked, they didn't fix the chipset ones, though.

One of the reasons for me to buy the Ryzen rig was to test compatibility with RME Audio professional audio equipment (PCIe, FW and USB based). These interfaces/cards use a Xilinx Spartan and implement their own protocols for connections, they worked in thousands of PCs world-wide for years.

Out of three different cards (Madi FX, AIO and Madi) one works in all slots, one works in all slots except slots 1_3 and one only works in the CPU connected slots (only tested 16/8_2, because the other slot is housing the GPU). So if I wanted to run 2x RME HDSPe Madi cards I currently cannot do it, because there is only one working free PCIe slot available.

I also tested a Sonnet SATA (Marvell) and two Pericom based PCIe-PCI bridges which all worked. So to be fair these interface can drive PCIe to its limits and there always is a chance of an error on the interface's side. But a lot points towards the X370 (or BIOS) here.

Timur Born · Apr 2, 2017

I did another run at 4.0 Ghz using Auto Vcore (1.45-1.46 V), no throttling happened at this OC settings:

ITB AVX does 127 A max, package power max 170 W, CPU + SOC max 198 W.
ITB *non* AVX does 122 A max, package power max 166 W, CPU + SOC max 190 W.
Heavyload does 125 A max, package power max 185 W, CPU + SOC max 190 W.

Anyone seeing a reasonable pattern here, why the latter two would cause throttling at stock frequency?

looncraz · Apr 2, 2017

Timur Born said:
I did another run at 4.0 Ghz using Auto Vcore (1.45-1.46 V):

ITB AVX does 127 A max, package power max 170 W, CPU + SOC 198 W.
ITB *non* AVX does 122 A max, package power max 166 W, CPU + SOC 190 W.
Heavyload does 125 A max, package power max 185 W, CPU + SOC 190 W.

Anyone seeing a reasonable pattern here, why the latter two would cause throttling at stock frequency?

If throttling is per-core (likely) then a single core may be exceeding its max TDP.

At stock, though, I imagine there's simply a time variable involved to keep the 95W TDP rating.. once you overclock TDP based throttling is nullified.

looncraz · Apr 2, 2017

Timur Born said:
The chipset connected PCIe slots (4_3, 1_1, 1_2, 1_3 on the CH6) don't detect all PCIe card, while the CPU connected ones seem to work. I can reproduce this on my own CH6 and know from a customer that the Asus X370 Pro and Asrock Killer show the same problem.

From experience with similar problems with Intel based boards I know that this hopefully should be fixable via BIOS update. I once had a HP Q170 based PC here where HP fixed the BIOS so that at least the CPU connected PCIe slots worked, they didn't fix the chipset ones, though.

One of the reasons for me to buy the Ryzen rig was to test compatibility with RME Audio professional audio equipment (PCIe, FW and USB based). These interfaces/cards use a Xilinx Spartan and implement their own protocols for connections, they worked in thousands of PCs world-wide for years.

Out of three different cards (Madi FX, AIO and Madi) one works in all slots, one works in all slots except slots 1_3 and one only works in the CPU connected slots (only tested 16/8_2, because the other slot is housing the GPU). So if I wanted to run 2x RME HDSPe Madi cards I currently cannot do it, because there is only one working free PCIe slot available.

I also tested a Sonnet SATA (Marvell) and two Pericom based PCIe-PCI bridges which all worked. So to be fair these interface can drive PCIe to its limits and there always is a chance of an error on the interface's side. But a lot points towards the X370 (or BIOS) here.

Very interesting, thanks for following up on that.

I wonder why AMD didn't just use the same PCI-e 3.0 IP for the chipset that they did on the CPU. Would have looked better to have more PCI-e 3.0 lanes as well, even if they shared bandwidth of just four lanes (which is basically what Intel does).

looncraz · Apr 2, 2017

Okay, I've been confused by an issue I've seen with my Windows installs and Ryzen for some time, but never really sought to resolve it until tonight.

It seems that FRAPS brutally harms the CPU-z multi-threaded score. As in I see 1.7~4x scaling instead of >8X.

I use FRAPS a great deal with some of my games, so I just have it launch with Windows. I will need to explore if this has impacted any other results at all, but I don't think it extends much beyond CPU-z. Closing FRAPS, for example, doesn't cure the affinity mask bug.

tamz_msc · Apr 2, 2017

looncraz said:
Okay, I've been confused by an issue I've seen with my Windows installs and Ryzen for some time, but never really sought to resolve it until tonight.

It seems that FRAPS brutally harms the CPU-z multi-threaded score. As in I see 1.7~4x scaling instead of >8X.

I use FRAPS a great deal with some of my games, so I just have it launch with Windows. I will need to explore if this has impacted any other results at all, but I don't think it extends much beyond CPU-z. Closing FRAPS, for example, doesn't cure the affinity mask bug.

I can confirm this on my end: on a Core 2 Duo E6300 it went from 1.86x to 1.57x on the MT ratio.

Timur Born · Apr 2, 2017

looncraz said:
If throttling is per-core (likely) then a single core may be exceeding its max TDP.

At stock, though, I imagine there's simply a time variable involved to keep the 95W TDP rating.. once you overclock TDP based throttling is nullified.

What is the TDP per single core on a 1800X? Given the rather sane power/wattage levels per core and overall cool temperatures of the CPU is this really the most plausible explanation?

formulav8 · Apr 2, 2017

tamz_msc said:
I can confirm this on my end: on a Core 2 Duo E6300 it went from 1.86x to 1.57x on the MT ratio.

I can confirm without Fraps running it being 3.75x-3.77x and with Fraps dropping to 1.42x-1.44x on an 860k. Ouch

Pantsoftime · Apr 2, 2017

Timur Born said:
The chipset connected PCIe slots (4_3, 1_1, 1_2, 1_3 on the CH6) don't detect all PCIe card, while the CPU connected ones seem to work. I can reproduce this on my own CH6 and know from a customer that the Asus X370 Pro and Asrock Killer show the same problem.

One of the reasons for me to buy the Ryzen rig was to test compatibility with RME Audio professional audio equipment (PCIe, FW and USB based). These interfaces/cards use a Xilinx Spartan and implement their own protocols for connections, they worked in thousands of PCs world-wide for years.

Out of three different cards (Madi FX, AIO and Madi) one works in all slots, one works in all slots except slots 1_3 and one only works in the CPU connected slots (only tested 16/8_2, because the other slot is housing the GPU). So if I wanted to run 2x RME HDSPe Madi cards I currently cannot do it, because there is only one working free PCIe slot available.

This is interesting data. Sometimes the issues with FPGA-based cards are signal integrity, link training tolerance, or even the startup time (FPGAs often take more time to load than the PCIe spec allows for). I wonder if any current BIOS adjustments could affect the signaling from the SB to the cards. Have you tried lowering the PCIe ref clock or boosting the SB voltage? Have you tried forcing PCIe Gen1 for these cards? Do they work if you power up the board and then hit the reset button after POST?

KTE · Apr 2, 2017

looncraz said:
If throttling is per-core (likely) then a single core may be exceeding its max TDP.

At stock, though, I imagine there's simply a time variable involved to keep the 95W TDP rating.. once you overclock TDP based throttling is nullified.

Throttling won't be TDP based, they've done away with that now. It'll be based on the hotspots and Idd.

Sent from HTC 10
(Opinions are own)

Timur Born · Apr 2, 2017

Pantsoftime said:
This is interesting data. Sometimes the issues with FPGA-based cards are signal integrity, link training tolerance, or even the startup time (FPGAs often take more time to load than the PCIe spec allows for).

While I can follow this I have to underline again that the CPU connected PCIe slot works, as do thousands of other PCs. It's also odd that the RME HDSPe AIO works in all slots but 1_3. Last but not least, the RME Madi FX that works in all slots also uses a FPGA (it's a more recent implementation of hardware and firmware, though).

I wonder if any current BIOS adjustments could affect the signaling from the SB to the cards. Have you tried lowering the PCIe ref clock or boosting the SB voltage? Have you tried forcing PCIe Gen1 for these cards? Do they work if you power up the board and then hit the reset button after POST?

I only tried forcing PCIe Gen1 and forcing x1 mode on 4_3, which shares bandwidth with all the other PCIe x1 slots (and various onboard devices according to HWinfo). PCIe ref clock at Auto setting runs at 99.8 - 102 MHz, I will try to lower it and also try to increase SB voltage. I tried various combinations of power-up, including Clear CMOS, but will try again.

Thanks for those suggestions!

Timur Born · Apr 2, 2017

KTE said:
Throttling won't be TDP based, they've done away with that now. It'll be based on the hotspots and Idd.

You mean as in temperature? Is it even likely to have throttle inducing hotspots while Tctl readings stay below 60°C? And wouldn't IBT/Linpack AVX produce such hotspots, too?

KTE · Apr 2, 2017

Timur Born said:
You mean as in temperature? Is it even likely to have throttle inducing hotspots while Tctl readings stay below 60°C? And wouldn't IBT/Linpack AVX produce such hotspots, too?

Idd is current and hotspots, yep.

Some of the sensors are placed around what AMD would seem critical junctions. If a hotspot occurs there, the chip will throttle for sure. 30C delta between a hotspot and another chip section is not uncommon.

Hotspots are primarily workload and structure dependent.

I'm not alluding to this being definitive, but it's a potential possibility.

Sent from HTC 10
(Opinions are own)

looncraz · Apr 2, 2017

KTE said:
Throttling won't be TDP based, they've done away with that now. It'll be based on the hotspots and Idd.

cTDP is still supported, just not actively exposed.

My understanding is that power, local temperatures, circuit instability, Vdroop, and more is considered when controlling the core frequencies. I don't see any of that being an issue on stock - except power.

A good way to test would be using negative offset voltage and seeing if throttling stops (though this could also be the local temperatures dropping...).

kostarum · Apr 2, 2017

Any news for DDR4 3666 or faster, on Ryzen?

Gesendet von meinem GT-I9295 mit Tapatalk

Timur Born · Apr 2, 2017

looncraz said:
cTDP is still supported, just not actively exposed.

My understanding is that power, local temperatures, circuit instability, Vdroop, and more is considered when controlling the core frequencies. I don't see any of that being an issue on stock - except power.

A good way to test would be using negative offset voltage and seeing if throttling stops (though this could also be the local temperatures dropping...).

I updated BIOS from 1001 to 1002 (not convinced yet), did another clear CMOS, deactivated XFR (tried that before), cranked all fans and pump to 100% and used a Vcore offset of -0.1 V (-0.15 V crashed to Code 8 quickly).

Same results, both Heavyload and ITB/Linpack *non* AVX throttle down. And even though throttling happens on cores individually the end result still is that all cores are throttled concurrently, with Heavyload they drop to x34 mostly. I did not see my former x33 minimum yet, but x33.8. So the effort seems to help some, although only very little.

Tctl max doesn't hit 50°C with fans/pump at 100%, more like 41°C. CH6 CPU temp reading (some kind of average of sensors + special formula) hits 50°C max.

Ryzen: Strictly technical

Senior member

Golden Member

Senior member

Junior Member

Elite Member

Member

Member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Senior member

Diamond Member

Senior member

Diamond Member

Junior Member

Senior member

Senior member

Senior member

Senior member

Senior member

Junior Member

Senior member