Ryzen: Strictly technical

The Stilt · Apr 3, 2017

looncraz said:
A good way to test would be using negative offset voltage and seeing if throttling stops (though this could also be the local temperatures dropping...).

The SMU calculates the power consumption based on the commanded voltage and external voltage offsets cannot be seen by the SMU.
Obviously the actual power draw and temperatures will be lower, however the SMU won't be noticing any difference.

The voltage reduction must be commanded by the SMU, however at least for the time being such adjustments are not possible.
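
To illustrate why an external offset is invisible to an estimator of this kind, here is a minimal sketch. It assumes a simple model where power scales with the square of voltage at fixed clock and activity. The function names and the numbers are hypothetical and are not taken from the SMU firmware.

```python
def scaled_power(ref_power_w: float, ref_voltage: float, voltage: float) -> float:
    """Scale a reference power figure by (V / V_ref)^2 (assumed model)."""
    return ref_power_w * (voltage / ref_voltage) ** 2


def smu_estimate_vs_actual(ref_power_w: float,
                           commanded_v: float,
                           external_offset_v: float):
    """Return (estimated, actual) power in watts.

    The estimate uses only the commanded voltage, so an external offset
    never enters it. The actual figure uses commanded + offset.
    """
    estimated = scaled_power(ref_power_w, commanded_v, commanded_v)
    actual = scaled_power(ref_power_w, commanded_v,
                          commanded_v + external_offset_v)
    return estimated, actual


# Hypothetical numbers: 95 W at a commanded 1.35 V, with a -50 mV offset
# applied externally by the VRM controller.
est, act = smu_estimate_vs_actual(95.0, 1.35, -0.05)
print(f"estimated {est:.1f} W, actual {act:.1f} W")
```

In this toy model the estimate stays at the reference figure regardless of the offset, while the actual draw drops, which matches the behavior described above.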

Atari2600 · Apr 3, 2017

innociv said:
Let's get back to strictly technical:
Has anyone tried running a common actual workstation application that is open source through another compiler?

As far as I'm aware, ICC will never compile to take advantage of how AMD CPUs have, since forever, handled 4 complex instructions per cycle compared to the 1 complex + 3 simple of Intel CPUs.
The one example I found, which was software emulated openGL, which did compile to take advantage of that, shows a 45% IPC increase with Ryzen over Kabylake.

Since compiled binaries are generally feeding Ryzen 1 complex + 3 simple even when they could do 4 complex, it is being heavily underutilized.

Really?

Might be worth doing a compile of openfoam down the line to see how it runs.

keymaster151 · Apr 3, 2017

New beta BIOSes from Gigabyte: http://forum.gigabyte.us/thread/886/am4-beta-bios-thread

It seems that at least some of them use the new 1.0.0.4 AGESA.

looncraz · Apr 3, 2017

The Stilt said:
The SMU calculates the power consumption based on the commanded voltage and external voltage offsets cannot be seen by the SMU.
Obviously the actual power draw and temperatures will be lower, however the SMU won't be noticing any difference.

The voltage reduction must be commanded by the SMU, however at least for the time being such adjustments are not possible.

If it's calculating it with the command voltage, then there's no hope. It seems like a pretty dumb way to measure power as well, considering the SMU must also be aware of the real voltage supplied to each core to best adapt to real voltage. Otherwise AMD may as well have just stuck with the old tables.

looncraz · Apr 3, 2017

So, I got my wife's Ryzen 5 1400 in today. It *IS* a 2+2 configuration, with 4MB L3 enabled per CCX.

Interestingly, the cores are interleaved by Windows.

Core 0 & Core 2 are on CCX 0
Core 1 & Core 3 are on CCX 1

CCX latency penalty is still showing as 20ns, have yet to profile cache bandwidth.

The ASRock Fatalwhocares? AB350 gives me 3+3 and 4+0 down-core control options with BIOS 2.20. I'm a bit too nervous to tempt fate and see if any cores can be unlocked, though it could only go so far as a six core unlock.

EDIT:

I tempted fate by trying to use the 3+0 option - still four cores. Tried 4 + 0, still 2+2, tried 3+3, still 2+2... I guess that option is disabled. Haven't tried to make it a dual core, but it appears that core unlocking is not an option.
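
The interleaved numbering reported above (Core 0 & 2 on CCX 0, Core 1 & 3 on CCX 1) can be written as a simple modulo mapping. This is only a sketch of that observation on a 2+2 part. The helper names are hypothetical, and the mapping may differ on other SKUs, with SMT threads counted, or under other operating systems.

```python
def ccx_of_core(core_index: int, ccx_count: int = 2) -> int:
    """Map a core index to its CCX, assuming the interleaved numbering observed."""
    return core_index % ccx_count


def group_by_ccx(core_count: int = 4, ccx_count: int = 2) -> dict:
    """Group core indices by CCX, e.g. to pick cores that share an L3."""
    groups = {ccx: [] for ccx in range(ccx_count)}
    for core in range(core_count):
        groups[ccx_of_core(core, ccx_count)].append(core)
    return groups


print(group_by_ccx())  # {0: [0, 2], 1: [1, 3]}
```

A grouping like this is what one would feed into an affinity mask when trying to keep communicating threads on the same CCX.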

PhonakV30 · Apr 4, 2017

looncraz said:
So, I got my wife's Ryzen 5 1400 in today. It *IS* a 2+2 configuration, with 4MB L3 enabled per CCX.

Interestingly, the cores are interleaved by Windows.

Core 0 & Core 2 are on CCX 0
Core 1 & Core 3 are on CCX 1

CCX latency penalty is still showing as 20ns, have yet to profile cache bandwidth.

The ASRock Fatalwhocares? AB350 gives me 3+3 and 4+0 down-core control options with BIOS 2.20. I'm a bit too nervous to tempt fate and see if any cores can be unlocked, though it could only go so far as a six core unlock.

How is Power consumption ?

Timur Born · Apr 4, 2017

I am currently testing SOC voltage. For that I set everything to defaults, except for running the RAM at 3200-14-14-14-34-1T. Then I lower Vsoc to 0.8 V, check booting and stability with all three of my RAM kits (Flare X, Ripjaws V, TridentZ). When a test fails I increase Vsoc step by step.

Low SOC voltage leads to:

- Code 8.
- F9 Post loop.
- 0D Post freeze.
- Reboot without BSOD or Post code.

Even a *single* SOC voltage step makes or breaks it. I can get all of the above at 0.91 V and then get (seemingly) completely stable at 0.92 V.

The TridentZ boots at a slightly lower Vsoc (0.91 V) than the other two RAM kits (0.92 V). This is a bit surprising, because when I tried 3600-25-18-18-18 earlier the Ripjaws booted at settings that the other two did not boot with. None were stable even at Vsoc 1.25 V, though.

I will try to test if temperature/cooling has an effect on SOC stability. Earlier I tried 4.0 GHz + 3200-14-14-14-39 with full fans/pump and lowest fans/pump and got Code 8 at low fan settings, but not at high fan settings. This happened while reported Tctl was around 70-75°C, so not exactly at CPU melting temps.
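
The step-by-step Vsoc procedure described above amounts to a linear search upward from a low starting voltage. The sketch below assumes a 10 mV step (consistent with the 0.91 V vs. 0.92 V observation) and an upper bound of 1.25 V. `is_stable` is a hypothetical callback standing in for the manual boot and stress test, since no software interface for this is implied.

```python
from typing import Callable, Optional


def find_min_stable_vsoc(is_stable: Callable[[float], bool],
                         start_v: float = 0.80,
                         step_v: float = 0.01,
                         max_v: float = 1.25) -> Optional[float]:
    """Raise Vsoc one step at a time and return the first stable value.

    Returns None if nothing up to max_v passes the test.
    """
    steps = int(round((max_v - start_v) / step_v))
    for i in range(steps + 1):
        # Round to avoid floating-point drift accumulating across steps.
        vsoc = round(start_v + i * step_v, 4)
        if is_stable(vsoc):
            return vsoc
    return None


# Stand-in for a kit that needs at least 0.92 V to pass.
print(find_min_stable_vsoc(lambda v: v >= 0.92))  # 0.92
```

Because a single step can make or break stability, a linear walk is used rather than a bisection, which would assume that a pass at one voltage guarantees a pass at every higher one.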

The Stilt · Apr 4, 2017

looncraz said:
If it's calculating it with the command voltage, then there's no hope. It seems like a pretty dumb way to measure power as well, considering the SMU must also be aware of the real voltage supplied to each core to best adapt to real voltage. Otherwise AMD may as well have just stuck with the old tables.

There is no technical reason why the voltages could not be reprogrammed.
The SMU is aware of the real voltage levels, however they don't affect its decisions. The part is originally designed to use dLDOs, remember.

Timur Born · Apr 4, 2017

Pantsoftime said:
This is interesting data. Sometimes the issues with FPGA-based cards are signal integrity, link training tolerance, or even the startup time (FPGAs often take more time to load than the PCIe spec allows for). I wonder if any current BIOS adjustments could affect the signaling from the SB to the cards. Have you tried lowering the PCIe ref clock or boosting the SB voltage? Have you tried forcing PCIe Gen1 for these cards? Do they work if you power up the board and then hit the reset button after POST?

I tried everything in the list and some of the Advanced options, too. Made no difference.

At first I thought SB voltage helped, but it turned out that I did not test both Madi cards properly and one of the two works in all PCIe slots, while the other only works in the CPU slots. I know that at least one of the two is not fully functional, so that can be the reason. Still the CPU slots seem to work better and there also is the odd case of only PCIe 1_3 not working with the AIO card.

looncraz · Apr 4, 2017

The Stilt said:
There is no technical reason why the voltages could not be reprogrammed.
The SMU is aware of the real voltage levels, however they don't affect its decisions. The part is originally designed to use dLDOs, remember.

I've seen that the dLDOs are, in fact, active... and I've seen the opposite. Both claimed by people who should know.

I think they're active, personally, but I have no idea how to test that.

CatMerc · Apr 4, 2017

looncraz said:
I've seen that the dLDOs are, in fact, active... and I've seen the opposite. Both claimed by people who should know.

I think they're active, personally, but I have no idea how to test that.

Sigh

I'm such a child.

The Stilt · Apr 5, 2017

looncraz said:
I've seen that the dLDOs are, in fact, active... and I've seen the opposite. Both claimed by people who should know.

I think they're active, personally, but I have no idea how to test that.

They're only active for the main planes (VDDCR_CPU & VDDCR_SoC) if you plug in an A0c silicon and a compatible BIOS version.
For B1 silicon and AGESA versions targeting them, they are only active for the minor planes while the main planes are in a permanent by-pass mode.

If you've ever tested an A0c Zeppelin, you'll know that you can easily see if the main plane dLDOs are active or not even from the power drawn from the wall figure.

KTE · Apr 5, 2017

I don't think anyone expected a very good power sensory feedback implementation. It's AMD's first/second proper attempt while Intel has perfected these since Nehalem. I did keep warning the unreasonable wishers that it's most likely a dumb implementation as the chip is being rushed to market with a lot disabled/fused off/buggy. But that gives hope for the next 'fixing'.

There is no low level power detection described anywhere. Increasing thermal sense distribution is via engineering only for hot-spot detection and mitigation. There's no description or accuracy of how power/temps are collected, aggregated or averaged. No technical papers released. In even a half-baked implementation, you can measure the power/current figures off certain pins and in specific registers.

Once again, cTDP. What is the formula used?

It's a random thermal cooling figure, highly ambiguous for Ryzen. As Ryzen/Broadwell TDP is completely different to Power. Dumb configs just take BIOS set Core voltage and multiply by Idd pin current which is retrieved from a register. This figure will be nothing close to power.

Is Ryzen able to throttle certain units?

Does ThrottleStop work with Ryzen?


bjt2 · Apr 5, 2017

The Stilt said:
They're only active for the main planes (VDDCR_CPU & VDDCR_SoC) if you plug in an A0c silicon and a compatible BIOS version.
For B1 silicon and AGESA versions targeting them, they are only active for the minor planes while the main planes are in a permanent by-pass mode.

If you've ever tested an A0c Zeppelin, you'll know that you can easily see if the main plane dLDOs are active or not even from the power drawn from the wall figure.

So there is a bug in the retail silicon that forced AMD to disable the dLDOs on Ryzen?

JimmiG · Apr 5, 2017

Timur Born said:
I am currently testing SOC voltage. For that I set everything to defaults, except for running the RAM at 3200-14-14-14-34-1T. Then I lower Vsoc to 0.8 V, check booting and stability with all three of my RAM kits (Flare X, Ripjaws V, TridentZ). When a test fails I increase Vsoc step by step.

Low SOC voltage leads to:

- Code 8.
- F9 Post loop.
- 0D Post freeze.
- Reboot without BSOD or Post code.

Even a *single* SOC voltage step makes or breaks it. I can get all of the above at 0.91 V and then get (seemingly) completely stable at 0.92 V.

The TridentZ boots at a slightly lower Vsoc (0.91 V) than the other two RAM kits (0.92 V). This is a bit surprising, because when I tried 3600-25-18-18-18 earlier the Ripjaws booted at settings that the other two did not boot with. None were stable even at Vsoc 1.25 V, though.

I will try to test if temperature/cooling has an effect on SOC stability. Earlier I tried 4.0 GHz + 3200-14-14-14-39 with full fans/pump and lowest fans/pump and got Code 8 at low fan settings, but not at high fan settings. This happened while reported Tctl was around 70-75°C, so not exactly at CPU melting temps.

Same here, it's extremely sensitive to SOC voltage. With stock voltage, it's unstable with the RAM at 3200 CL16. At +0.0125V, it's nearly stable, and at +0.025V it's stable.

Sadly, I can't for the life of me get 3200 at CL14 to work. No amount of tweaking the voltages allows the system to cold boot. Warm boot is fine, but cold boots always lock up the first time. After it has POSTed at 3200 CL14 once, it's stable until I unplug and reconnect the system, then it fails to POST again.
I wonder what's causing these cold boot problems. They seem to be pretty common with higher RAM speeds.

Borf · Apr 5, 2017

Looking at a couple of manuals, I'm going to take it as a no, but if someone could correct me it would be great. None of the USB ports coming off Ryzen or the chipsets support USB alternate mode display port?

Timur Born · Apr 5, 2017

Many discussions tried to make sense of Ryzen's Tctl temperature readings. Here is my interpretation of how CPU Tctl temperature reading and offsets (plural) work on a *stock* Ryzen 1800X and how this affects stability.

- There are three (3) different *dynamic* offsets to Tctl, +0°C (aka base), +10°C and +20°C.

- Offsets likely get chosen on the basis of CPU instruction usage! They do not seem to be based on power draw/current/voltage, CPU load/core percentage or temperature.

This means that a program like Heavyload will only induce the +10°C offset, even when it draws the very same power and loads all cores at 100%, while programs like P95, ITB/Linpack (AVX and non-AVX) or Realbench induce the +20°C offset.

The software Statuscore demonstrates this easily, as it seems to use a different load/instruction set for stress testing odd cores vs. even cores. When it stresses any number of even cores it induces the +10°C offset; when it stresses any number of odd cores it induces the +20°C offset.

- Offsets usually increase in immediate jumps, but decrease gradually. Sometimes it may not seem that way, because Tctl is right in between two offsets (0/10/20). This behavior is what leads to fans spinning up and down, especially when the "High Performance" Windows profile is used while some load is present. That the power profile has some minor impact suggests that there is some additional mechanism at work.

- Low CPU temperature with certain CPU instruction sets seems to be *vital* for stability even far below any thermal throttling/shutdown point!

I can repeatedly crash my CH6 into Code 8 (CPU) by increasing temperature towards 70°C using common stress tests. Last time I even allowed the socket temperature to increase over 70°C; as a result I got a Code 0D (memory) right after soft-off -> soft-on. This happens at "Optimized Defaults" BIOS settings, aka stock everything!

The only time that I ever saw a CPU temperature shutdown was right from BIOS setup to a failed boot when no cooling was applied. Furthermore others and myself had their CPU running a lot hotter while higher voltages than stock values were applied.

I also noticed that Code 8 errors are far more likely to happen at higher temps even when the very same settings and stress tests are used. It's also noteworthy that a Code 8 crash does *not* turn off the mainboard. As a consequence around 1.0 V Vcore is still measurable at the CPU socket even when all means of cooling have failed (pulled pump and fan headers).

- The Asus CH6 CPU temperature sensor mirrored Tctl until BIOS 0902 and was then increased to Tctl+5 (at stock settings). Fan/pump header control is usually based on that temperature reading. I assume that this is a deliberate attempt to keep Ryzen more stable after the many reports of Code 8 (0D/50) errors.

As a consequence of my findings I call BS on AMD's claim that the temperature offset in 1700X and 1800X are only meant to maintain consistent fan profiles compared to 1700.
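
For anyone who wants to look for these offset changes in their own monitoring logs, here is a rough sketch. It assumes a list of Tctl samples taken at a short, fixed interval and flags sample-to-sample rises close to +10°C or +20°C. The function name, the 2°C tolerance and the sample values are hypothetical; only upward jumps are flagged because the offsets are described as decaying gradually.

```python
def find_offset_jumps(samples, jump_c=10.0, tolerance_c=2.0):
    """Return (sample_index, jump_size) for rises that look like offset steps.

    A rise counts as a jump when it lies within tolerance_c of one or two
    multiples of jump_c (i.e. roughly +10 or +20 degrees C).
    """
    events = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        for multiple in (1, 2):
            if abs(delta - multiple * jump_c) <= tolerance_c:
                events.append((i, multiple * jump_c))
    return events


# Hypothetical Tctl log in degrees C.
log = [45.0, 45.5, 55.6, 55.0, 54.2, 74.5, 73.0]
print(find_offset_jumps(log))  # [(2, 10.0), (5, 20.0)]
```

Genuine heating under load is much slower than a step of this size between two consecutive samples, so the filter is crude but should separate the two cases as long as the sampling interval is short.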

looncraz · Apr 5, 2017

Timur Born said:
I can repeatedly crash my CH6 into Code 8 (CPU) by increasing temperature towards 70°C using common stress tests.

Loosen your heatsink/waterblock mounting. Solved that problem for me.

Timur Born · Apr 5, 2017

I meant "repeatedly" as in "reproducibly". I deliberately increased temperatures to test both how offsets (plural!) work and how temperature affects stability of even a stock 1800X. I don't get Code 8 crashes when I keep the temps low, using the very same CPU block mounting. I reproduce the crashes using certain loads and voltages, combined with temps around 70°C. It's complex, that's why my last post was so long.

But I will give your suggestion a go anyway. Not today, though, because it's time for bed.

nismotigerwvu · Apr 5, 2017

Timur Born said:
- There are three (3) different *dynamic* offsets to Tctl, +0°C (aka base), +10°C and +20°C.

- Offsets likely get chosen on the basis of CPU instruction usage! They do not seem to be based on power draw/current/voltage, CPU load/core percentage or temperature.

Well that certainly would explain some odd behaviors I've noticed on my 1800X. I'm not certain which background task might be doing it, but even with the most current BIOS for my mobo (Asus Prime X370-Pro) I'll see seemingly random jumps of approximately 10C instantaneously. I had just chalked it up to bugs that needed bashed, but perhaps some task is sending just the right mix of instructions to trip the offset.

w3rd · Apr 6, 2017

JimmiG said:
Same here, it's extremely sensitive to SOC voltage. With stock voltage, it's unstable with the RAM at 3200 CL16. At +0.0125V, it's nearly stable, and at +0.025V it's stable.

Sadly, I can't for the life of me get 3200 at CL14 to work. No amount of tweaking the voltages allows the system to cold boot. Warm boot is fine, but cold boots always lock up the first time. After it has POSTed at 3200 CL14 once, it's stable until I unplug and reconnect the system, then it fails to POST again.
I wonder what's causing these cold boot problems. They seem to be pretty common with higher RAM speeds.

How, specifically, are you recovering from the locked cold boot?

zir_blazer · Apr 6, 2017

I totally forgot to post here about what happened with Ryzen PCIe ACS (Access Control Services) support, for better IOMMU Groups granularity for PCI Passthrough.
According to this, AMD told a VFIO developer that Ryzen does support ACS and that it should be a firmware option, supposedly found in a submenu called "NBIO Debug Options". However, that was nearly a month ago, and Google results for that menu name pretty much only include that link or other places where it was directly quoted. Does such a menu exist in any reference firmware, or anywhere else? What other options are missing?

Also worth mentioning: ASRock has been implementing SR-IOV support in some of their motherboards. I don't think I have ever heard of it being supported on a consumer motherboard before. So far it is rather useless on its own, as SR-IOV is only used by some NICs and the very latest AMD FirePros for GPU virtualization, and those have limited hypervisor support (I think only VMware products; the amdgpu-pro Linux driver still can't create Virtual Functions).

I don't know if it was mentioned elsewhere, but chances are it was, since it is rather major: a reviewer tested Ryzen ECC support. It does work but is "incomplete". I don't know how "incomplete" it is, because I don't see people doing this type of test, nor do I know how a proper server system with ECC would behave. Still, it has some potential...

JimmiG · Apr 6, 2017

w3rd said:
How, specifically, are you recovering from the locked cold boot?

I just hit the reset switch at the black screen (no signal), and after about 20s, it POSTs with the RAM at 2133 with "Overclock failed! Press F1 to enter setup". I then just save and exit (the setup screen already includes the correct settings). It does a soft reboot and then it works fine until I completely shut down and unplug the system (or turn off the power at the power strip).

This happened with my LPX 3000's at 2933 CL16 and my LPX 3600 at 3200 CL14, but not at 3200 CL16 (which is what I use now since I don't want my computer to be difficult to start like some cranky old car).

Kromaatikse · Apr 6, 2017

Borf said:
Looking at a couple of manuals, I'm going to take it as a no, but if someone could correct me it would be great. None of the USB ports coming off Ryzen or the chipsets support USB alternate mode display port?

Of course not. There's no iGPU, so there's no display signal to send.

w3rd · Apr 6, 2017

JimmiG said:
I just hit the reset switch at the black screen (no signal), and after about 20s, it POSTs with the RAM at 2133 with "Overclock failed! Press F1 to enter setup". I then just save and exit (the setup screen already includes the correct settings). It does a soft reboot and then it works fine until I completely shut down and unplug the system (or turn off the power at the power strip).

This happened with my LPX 3000's at 2933 CL16 and my LPX 3600 at 3200 CL14, but not at 3200 CL16 (which is what I use now since I don't want my computer to be difficult to start like some cranky old car).

Have you thought about trying a few consecutive cold boots in a row?
