Ryzen: Strictly technical

dfk7677 · Mar 4, 2017

@The Stilt What is your explanation of Ryzen not being fully utilized in games, even when there is no GPU bottleneck?

CatMerc · Mar 4, 2017

The Stilt said:
Raven isn't necessarily made on 14nm LPP

Is it something you've heard or just saying that for posterity?

Insert_Nickname · Mar 4, 2017

This is a most informative thread, and an impressive piece of work by The Stilt. Thank you...!

Just got back from building and field testing my Ryzen build, so this write-up is much appreciated. I think you did a lot better than most reviews.

That said, the only issue I've had so far is that the Crosshair has a serious dislike of the Crucial memory I got. I simply cannot get it above 2400MHz (rated 2666MHz) no matter what timings and voltage I use, but it chugs along just fine at 2400/15-15-15-36, completely stable. Well done for a brand new platform. I did have a single crash, but that was due to user error on my part, not the system itself.

TheELF · Mar 4, 2017

dfk7677 said:
@The Stilt What is your explanation of Ryzen not being fully utilized in games, even when there is no GPU bottleneck?

Games have to respond to your input. This means that no matter how multithreaded the game is, there is one thread that will always run at 100% of one core (single-threaded), slowing down the multithreaded part of the game. If you want to see close to full utilization, look at pure GPU benchmarks like The Division or Tomb Raider with the fastest card possible and the smallest resolution possible.

dfk7677 · Mar 4, 2017

TheELF said:
Games have to respond to your input. This means that no matter how multithreaded the game is, there is one thread that will always run at 100% of one core (single-threaded), slowing down the multithreaded part of the game. If you want to see close to full utilization, look at pure GPU benchmarks like The Division or Tomb Raider with the fastest card possible and the smallest resolution possible.

That wasn't the case though. No thread went up to 100% in most cases.

Why would the 1700 have less load (on average, and in most cases on the most loaded thread), with no GPU bottleneck, yet lower output (FPS)?

imported_jjj · Mar 4, 2017

Do you have the means to test the PCIe slot and verify that it isn't the cause of the gaming complications?

TheELF · Mar 4, 2017

dfk7677 said:
That wasn't the case though. No thread went up to 100% in most cases.

Why would the 1700 have less load (on average, and in most cases on the most loaded thread), with no GPU bottleneck, yet lower output (FPS)?

Because there are enough cores to juggle the threads around, none of them reaches 100% usage in Task Manager/MSI Afterburner, since those tools measure usage over time.
Run Process Hacker and look at what the actual threads are doing in real time.
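A toy Python sketch of the averaging effect described above. It assumes, hypothetically, one game thread that is always busy but lands on a different logical core every scheduler tick, across 16 logical cores. A monitor that reports per-core usage averaged over a window then sees every core at a small fraction, even though the thread itself never stops. The strict round-robin migration is an exaggeration chosen for illustration, not how Windows actually schedules threads.

```python
def per_core_utilization(busy_core_per_tick, num_cores):
    """Fraction of sampled ticks each core spent busy over the whole window."""
    busy_ticks = [0] * num_cores
    for core in busy_core_per_tick:
        busy_ticks[core] += 1
    total = len(busy_core_per_tick)
    return [count / total for count in busy_ticks]


NUM_CORES = 16
TICKS = 1600

# One thread that is always runnable (100% of one core), moved to the next core every tick.
migrating = [tick % NUM_CORES for tick in range(TICKS)]
# The same thread pinned to core 0 for the whole window.
pinned = [0] * TICKS

print(max(per_core_utilization(migrating, NUM_CORES)))  # 0.0625 -> looks like ~6% load
print(max(per_core_utilization(pinned, NUM_CORES)))     # 1.0 -> the real per-thread load
```

In both cases the total work is exactly one core's worth; only the per-core view changes, which is why a per-thread view is the more telling one.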

PotatoWithEarsOnSide · Mar 4, 2017

dfk7677 said:
That wasn't the case though. No thread went up to 100% in most cases.

Why would the 1700 have less load (on average, and in most cases on the most loaded thread), with no GPU bottleneck, yet lower output (FPS)?

The combination of being:
a) nowhere near full load on CPU,
b) nowhere near full load on GPU,
c) using 3x more memory at that particular point, and
d) having circa 10-15 fps lower than the 7700k at the same time...

...is what I found strange.

What would cause the memory usage to spike like that?
L1>L2>L3>DRAM, right...?
They both have similar L1 cache, the R7 1700 has double the L2 cache, and the R7 1700 has access to more L3 cache, though split across 2 CCXs.
The problem has to be with the L3, right?

dfk7677 · Mar 4, 2017

PotatoWithEarsOnSide said:
c) using 3x more memory at that particular point, and

I think the memory refers to VRAM and not system memory.

Rngwn · Mar 4, 2017

PotatoWithEarsOnSide said:
The combination of being:
a) nowhere near full load on CPU,
b) nowhere near full load on GPU,
c) using 3x more memory at that particular point, and
d) having circa 10-15 fps lower than the 7700k at the same time...

...is what I found strange.

What would cause the memory usage to spike like that?
L1>L2>L3>DRAM, right...?
They both have similar L1 cache, the R7 1700 has double the L2 cache, and the R7 1700 has access to more L3 cache, though split across 2 CCXs.
The problem has to be with the L3, right?

My $0.02; this is from a layman, so take it with a huge pile of salt:

If the wonky Coreinfo dump on Ryzen really is how Windows sees the logical cores mapped to the cache, and how it handles them (as opposed to just being a "wrong presentation"), then Ryzen right now may suffer from serious cache thrashing. The L3 is supposed to be seen as 2x 8 MB, each shared by the cores of one CCX, instead of each logical processor having its own 8 MB of L3 as shown in the dump. This bug likely makes Windows see 16 x 8 MB = 128 MB of L3. I could imagine the "excess cache" actually spilling into RAM, or thrashing like crazy within each 8 MB cache.
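To make the 2x 8 MB vs. 128 MB difference concrete, here is a minimal Python sketch assuming an 8-core/16-thread part with 2 CCXs (4 cores x 2 SMT threads each). The `CacheDescriptor` type is hypothetical and only loosely mirrors the per-cache processor masks Coreinfo prints; it is not a Windows API.

```python
from dataclasses import dataclass

L3_SIZE_MB = 8
LOGICAL_PER_CCX = 8  # assumed: 4 cores x 2 SMT threads per CCX
NUM_CCX = 2


@dataclass(frozen=True)
class CacheDescriptor:  # hypothetical, for illustration only
    size_mb: int
    sharing_mask: int  # bit i set => logical processor i shares this cache


def expected_l3_descriptors():
    """One 8 MB L3 per CCX, shared by that CCX's 8 logical processors."""
    descriptors = []
    for ccx in range(NUM_CCX):
        mask = ((1 << LOGICAL_PER_CCX) - 1) << (ccx * LOGICAL_PER_CCX)
        descriptors.append(CacheDescriptor(L3_SIZE_MB, mask))
    return descriptors


def dump_as_reported():
    """Reading of the wonky dump: every logical processor appears to own a private 8 MB L3."""
    return [CacheDescriptor(L3_SIZE_MB, 1 << cpu) for cpu in range(NUM_CCX * LOGICAL_PER_CCX)]


def total_l3_mb(descriptors):
    """Sum of distinct L3 caches, as a topology-aware consumer would count them."""
    return sum(d.size_mb for d in descriptors)


print([hex(d.sharing_mask) for d in expected_l3_descriptors()])  # ['0xff', '0xff00']
print(total_l3_mb(expected_l3_descriptors()))  # 16
print(total_l3_mb(dump_as_reported()))         # 128
```

Whether Windows actually acts on the reported mapping, rather than just displaying it oddly, is exactly the open question in the post; the sketch only shows the arithmetic.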

The Stilt · Mar 4, 2017

majord said:
I have a query or 2!

1. Re SMU data: I notice HWiNFO is reporting core power and package power. Is this accurate in 'OC' mode, with a custom Vcore? I ask because it seems to read lower than expected at higher Vcores.

2. CPU-NB voltage: does this have a new name? Some are still listing it as CPU-NB. And does this plane actually supply the DF?

The power consumption reported by HWiNFO for Zeppelin should be pretty accurate (I haven't fully validated it vs. DCR), however there are a few conditions. The data displayed by HWiNFO is based on SVI2 telemetry. The data originates from the VRM controller; the issue is that it passes through the SMU, where it technically could be altered / skewed in either direction. This is the exact reason why I never use the SVI2 telemetry for AMD, or the SVID data for Intel, for power consumption. Additionally, if you use these figures you must not change the Rll (load-line) resistance.
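The post does not explain the mechanism behind the Rll caveat, so the following is only a toy Python model under an explicit assumption: the power estimate derives voltage from the requested VID minus the droop predicted by a fixed, stock load-line (the standard V = VID - I x R_LL relation), while the real droop follows whatever Rll the user has set. All numbers are made up for illustration.

```python
def die_voltage(vid_v, current_a, loadline_ohm):
    """Standard load-line droop: the output voltage sags by I * R_LL under load."""
    return vid_v - current_a * loadline_ohm


def reported_power(vid_v, current_a, assumed_loadline_ohm):
    """Hypothetical estimator that assumes a fixed (stock) load-line when deriving voltage."""
    return die_voltage(vid_v, current_a, assumed_loadline_ohm) * current_a


def true_power(vid_v, current_a, actual_loadline_ohm):
    """Power with the droop the VRM actually applies."""
    return die_voltage(vid_v, current_a, actual_loadline_ohm) * current_a


VID = 1.40          # volts requested (made up)
CURRENT = 80.0      # amps (made up)
STOCK_RLL = 0.0010  # ohms, hypothetical stock load-line
USER_RLL = 0.0005   # ohms, hypothetical user-flattened load-line

est = reported_power(VID, CURRENT, STOCK_RLL)
real = true_power(VID, CURRENT, USER_RLL)
print(f"estimated {est:.1f} W, actual {real:.1f} W")  # estimated 105.6 W, actual 108.8 W
```

Under this assumption, any change to Rll silently opens a gap between the estimated and real figures, which is one way the "do not touch Rll" rule could arise.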

Valantar said:
So you're saying that cTDP limits total power draw over time, but only marginally limits peak turbo speeds. That's very interesting. Might we see mobile SKUs with specs along the lines of the Ivy Bridge/Haswell 17W CPUs, with sub-2GHz base clocks and ~+50% boost clocks (just with 2-4x the cores)? That would be very interesting. I'd gladly see 4c8t chips moving into the 15-25W mobile space, although with an iGPU thrown into the mix you'd probably need another 10+W of thermal headroom.

cTDP caps the total power consumption to a certain value.
The power limit will never be exceeded, no matter the workload or number of utilized cores. It works exactly like a rev limiter in engines.
The performance impact of limiting the power consumption will naturally depend on the number of utilized cores and the workload. Obviously at e.g. 30W you will be able to run a single core close to its maximum XFR ceiling, while the "n" core stress frequency will be more limited.
On Zeppelin the capped figure is the "Package Power" (PP), unlike with all of the previous designs (excl. Carrizo / Bristol Ridge). This means that all of the different domains (e.g. PCIe PHYs, peripherals, etc.) are included in this power limit, not just the CPU cores & northbridge like with designs such as Orochi (PD), Kaveri (SR), etc. It is truly a total package power limit.
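A minimal Python sketch of the "rev limiter" behaviour under a package-power cap. The per-core power table and the constant uncore figure are hypothetical, illustrative numbers, and the real SMU regulates in a closed loop rather than doing a static table lookup. The point it shows is the one in the post: the same 30W cap allows a very high clock on one core but a much lower one when all cores are loaded, because everything (cores plus uncore) counts against the package limit.

```python
# Hypothetical per-core power (W) at each frequency (GHz); illustrative numbers only.
CORE_POWER_W = {
    2.0: 1.5, 2.5: 2.2, 3.0: 3.2, 3.5: 4.8, 4.0: 7.5, 4.1: 8.5,
}
# Assumed constant uncore share (PCIe PHYs, fabric, memory controller, etc.),
# counted against the cap because the limit is total package power.
UNCORE_POWER_W = 8.0


def max_frequency_under_cap(active_cores, package_cap_w):
    """Highest table frequency whose total package power stays within the cap (None if none fits)."""
    best = None
    for freq in sorted(CORE_POWER_W):
        package_power = UNCORE_POWER_W + active_cores * CORE_POWER_W[freq]
        if package_power <= package_cap_w:
            best = freq
    return best


for cores in (1, 4, 8):
    print(cores, max_frequency_under_cap(cores, 30.0))
# 1 4.1
# 4 3.5
# 8 2.5
```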

PPB · Mar 4, 2017

The Stilt said:
The power consumption reported by HWiNFO for Zeppelin should be pretty accurate (I haven't fully validated it vs. DCR), however there are a few conditions. The data displayed by HWiNFO is based on SVI2 telemetry. The data originates from the VRM controller; the issue is that it passes through the SMU, where it technically could be altered / skewed in either direction. This is the exact reason why I never use the SVI2 telemetry for AMD, or the SVID data for Intel, for power consumption. Additionally, if you use these figures you must not change the Rll (load-line) resistance.

cTDP caps the total power consumption to a certain value.
The power limit will never be exceeded, no matter the workload or number of utilized cores. It works exactly like a rev limiter in engines.
The performance impact of limiting the power consumption will naturally depend on the number of utilized cores and the workload. Obviously at e.g. 30W you will be able to run a single core close to its maximum XFR ceiling, while the "n" core stress frequency will be more limited.
On Zeppelin the capped figure is the "Package Power" (PP), unlike with all of the previous designs (excl. Carrizo / Bristol Ridge). This means that all of the different domains (e.g. PCIe PHYs, peripherals, etc.) are included in this power limit, not just the CPU cores & northbridge like with designs such as Orochi (PD), Kaveri (SR), etc. It is truly a total package power limit.

Stilt, regarding base clock overclocking: I usually avoid it myself on Intel platforms because it's not worth it, as you can't really go over 110 BCLK in most cases, and SATA integrity begins to be a concern for me. But on AM4, does the base clock also alter SATA, or just PCI-E? Some people are suggesting using the base clock to overclock, say, a 1700 from 100 to 120 and making it work in PCI-E 2.0 mode. Won't the SATA clock generator also be at 120, rapidly increasing the chances of data corruption? Base clock seems a good way to avoid the OC mode when overclocking, nonetheless.

Second and last, does Zen play by the AMD book regarding tCTL? Are we still at "these are not real temps, it's just an internal scale made by AMD and you can't go above X arbitrary number (I think it was 72 on Vishera/FX, e.g.)"? I take it tCASE became irrelevant this round, as the socket infrastructure isn't as compromised as it was with FX chips on AM3+.

The Stilt · Mar 4, 2017

dfk7677 said:
@The Stilt What is your explanation of Ryzen not being fully utilized in games, even when there is no GPU bottleneck?

I'm confident that most of the issues seen are just initial issues, which occur on each and every platform. There are plenty of potential issues; there is no way to deny that. However, considering that the whole software and firmware stack was basically rewritten in less than four months, my personal opinion is that AMD did extremely well, regardless of all the minor issues. If it had been up to me, I would have postponed the launch by 1-2 months. This would have given both the ODMs and AMD time to refine their software and firmware to a point where most of these issues would no longer have existed.

Regardless, I am certain that all of the minor issues will be ironed out within the next month or two. I don't believe there are any actual hardware issues in Zeppelin.
Aside from just fixing the actual issues, I'm confident that the performance will somewhat improve as well.

This is my personal point of view on the subject.

DisEnchantment · Mar 4, 2017

I hope so, because I plan to get myself a Ryzen system, since I cannot afford a 6900K.

The Stilt · Mar 4, 2017

PPB said:
Stilt, regarding base clock overclocking: I usually avoid it myself on Intel platforms because it's not worth it, as you can't really go over 110 BCLK in most cases, and SATA integrity begins to be a concern for me. But on AM4, does the base clock also alter SATA, or just PCI-E? Some people are suggesting using the base clock to overclock, say, a 1700 from 100 to 120 and making it work in PCI-E 2.0 mode. Won't the SATA clock generator also be at 120, rapidly increasing the chances of data corruption? Base clock seems a good way to avoid the OC mode when overclocking, nonetheless.

Second and last, does Zen play by the AMD book regarding tCTL? Are we still at "these are not real temps, it's just an internal scale made by AMD and you can't go above X arbitrary number (I think it was 72 on Vishera/FX, e.g.)"? I take it tCASE became irrelevant this round, as the socket infrastructure isn't as compromised as it was with FX chips on AM3+.

SATA is not affected, at least the ones which are located in Promontory (external FCH).
I'm not certain if Taishan's (internal FCH) SATAs are affected by the BCLK, but there is a chance they are.
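A quick arithmetic sketch in Python of PPB's 100 to 120 MHz BCLK example. It assumes a hypothetical 30x core multiplier and only shows how far any clock derived directly from BCLK would run over its 100 MHz nominal. Per the reply above, the SATA ports in Promontory are not affected, while the Taishan (internal FCH) SATA ports might be; which other clocks track BCLK is not modelled here.

```python
NOMINAL_REFCLK_MHZ = 100.0


def bclk_overclock(bclk_mhz, core_multiplier):
    """Return the core clock and how far a BCLK-derived reference clock is over nominal (%)."""
    core_mhz = bclk_mhz * core_multiplier
    deviation_pct = (bclk_mhz / NOMINAL_REFCLK_MHZ - 1.0) * 100.0
    return core_mhz, deviation_pct


core, dev = bclk_overclock(120.0, 30)  # 30x is a hypothetical multiplier
print(f"core {core:.0f} MHz, BCLK-derived clocks {dev:+.0f}% over nominal")
# core 3600 MHz, BCLK-derived clocks +20% over nominal
```

A +20% reference clock is far outside what PCIe or SATA links normally tolerate, which is why dropping the link to a slower mode comes up in this context.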

tCTL should no longer be on a linearized scale on Zeppelin. AFAIK it is the ROS (alternating, highest sensor reading within the CCXs) on a °C scale, similar to AMD K10 cores or GPUs.
In my experience the temperatures reported are quite realistic.

I'm not fully certain about the retail parts, however the tCTLMax should be 95°C on all SKUs.
Zeppelin supports cHTC so the ODMs might be configuring the limit to a lower figure if they want, however the default is 95°C.

tCaseMax is the actual maximum package temperature, measured at the surface center of the IHS.
This figure is significantly lower (IIRC 56-71°C), as it is an external temperature (a major delta is to be expected).
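A simplified Python sketch of the tCTL behaviour described above: tCTL taken as the highest sensor reading across the CCXs, checked against the 95°C default tCTLMax, or against a lower limit an ODM might configure via cHTC. The sensor values are made up, the "alternating" aspect of the ROS reading is not modelled, and the throttle check is a hypothetical illustration rather than the actual SMU logic.

```python
# Default tCTLMax per the post; ODMs may configure a lower limit via cHTC.
TCTL_MAX_C = 95.0


def tctl(ccx_sensor_readings_c):
    """Simplified tCTL: highest reading among all sensors in all CCXs, in degrees C."""
    return max(max(readings) for readings in ccx_sensor_readings_c)


def should_throttle(ccx_sensor_readings_c, limit_c=TCTL_MAX_C):
    """Hypothetical check: True once tCTL reaches the configured limit."""
    return tctl(ccx_sensor_readings_c) >= limit_c


# Made-up readings for two CCXs (degrees C).
SAMPLE = [[61.5, 64.0, 63.2], [66.8, 65.1, 62.0]]

print(tctl(SAMPLE))                           # 66.8
print(should_throttle(SAMPLE))                # False with the 95 degree default
print(should_throttle(SAMPLE, limit_c=65.0))  # True with a lower, cHTC-style limit
```

Because tCTL is an internal die reading and tCaseMax is measured on the IHS surface, the two limits are not comparable one-to-one.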

KTE · Mar 4, 2017

CatMerc said:
Is it something you've heard or just saying that for posterity?

I highly doubt 14nm LPP was what AMD wanted, and it was definitely not suitable for HEDT. It looks like it was just what was available for HVM, and it is much better suited to mobile.


DisEnchantment · Mar 4, 2017

@TheStilt, what kind of software fixes would be needed on the OS side? Scheduler/kernel patches? Or rather driver support (USB etc.) for the integrated chipset?

Also, at the BIOS level, which kinds of patches are applied: timings and circuitry logic, or microcode as well?

The Stilt · Mar 4, 2017

DisEnchantment said:
@TheStilt, what kind of software fixes would be needed on the OS side? Scheduler/kernel patches? Or rather driver support (USB etc.) for the integrated chipset?

Also, at the BIOS level, which kinds of patches are applied: timings and circuitry logic, or microcode as well?

I'm not familiar with the potential issues at OS side.

I'd say SMU, PMU (DRAM) firmwares and the microcode are the ones which will have most of the potential improvements & refinements.
Unless there are clear configuration errors made by the ODMs themselves, there aren't many ways they (the ODMs) can affect performance.

cytg111 · Mar 4, 2017

This is golden, thank you!

thigobr · Mar 4, 2017

Thanks @The Stilt for this amazing work you have done!

Would it be possible to test PCIe bandwidth to the graphics card?

cytg111 · Mar 4, 2017

The Stilt said:
850 points in Cinebench 15 at 30W is quite telling. Or not telling, but absolutely massive. Zeppelin can reach absolutely monstrous and unseen levels of efficiency, as long as it operates within its ideal frequency range.

This is the dealmaker for a server chip, right?
(Also, why is this review in the forums? It should have its own space, IMO.)

PPB · Mar 4, 2017

cytg111 said:
This is the dealmaker for a server chip, right?
(Also, why is this review in the forums? It should have its own space, IMO.)

Yes, it is 100x more informative and thorough than the garbage that some sites spew in the form of "reviews"

dnavas · Mar 4, 2017

The Stilt said:
I'm not familiar with the potential issues at OS side.

The large difference you're seeing in draw-call performance between Win7 and Win10 does make me wonder what is going on in Win10. I assume you've tried SMT on/off and HPET on/off.

The Stilt said:

I'd say SMU, PMU (DRAM) firmwares and the microcode are the ones which will have most of the potential improvements & refinements.

Given the way memory clocking ties into the rest of Ryzen, I'm thinking that the "best" RAM is going to be the fastest, and if G.Skill's Flare is tested on the ASUS board, maybe the 3200 kit is going to let the system shine just that little bit extra and is worth paying for? Is there info covering differences in performance due to memory speeds? [Also, maybe G.Skill's announcement portends future firmware updates coming from ASUS?]

DisEnchantment · Mar 4, 2017

@The Stilt, looking forward to these improvements in the platform.

So is the L3 a victim cache because the Infinity Fabric cannot be used to snoop the L2 of the other CCX?
Is the Infinity Fabric also used intra-CCX, or only from the CCX to the outside, i.e. the other CCX / chipset / GPU etc.?
I am also curious whether the data fabric is also scalable in bandwidth, like PCIe lanes for example.

Sorry, I ask too many questions...

I am starting to understand how scalable this architecture is.

PotatoWithEarsOnSide · Mar 4, 2017

dfk7677 said:
I think the memory refers to VRAM and not system memory.

Good spot. Very sloppy of me.
Even so, that is an awful lot of VRAM in comparison to the 7700k under less load.
His earlier videos, at 4k, saw VRAM pretty consistent across both systems...as you'd expect.
