Ryzen: Strictly technical


tamz_msc

Diamond Member
Jan 5, 2017
3,770
3,590
136
The test looks like this when visualized: https://i.imgur.com/38S0XEq.gif

As expected, Intel's compiler provided superior performance across the platforms.
Nearly 40% faster than GCC 7.2 on Ryzen and ~60% faster than MSVC 2017.
That's extremely interesting, basically a simulation with real-time density changes. This is extremely useful and relevant. Thanks again for including this.
 
  • Like
Reactions: Drazick

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Zeppelin die shot:

XmO6nry.jpg



ClLNviN.png


kMErYXg.png

zeppelin_face_down5annotated.png


Images from here
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Nearly 40% faster than GCC 7.2 on Ryzen and ~60% faster than MSVC 2017.
Not surprising; anyone cross-compiling GCC for their arch would instantly see what a mess it is, supporting absurd prehistoric architectures, obscure architectures and what not.
However, I did not expect the MSVC compiler to be so slow.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Not surprising; anyone cross-compiling GCC for their arch would instantly see what a mess it is, supporting absurd prehistoric architectures, obscure architectures and what not.
However, I did not expect the MSVC compiler to be so slow.
Yeah, MSVC is worse even without the "absurd" support GCC offers.
 
  • Like
Reactions: gupsterg

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
My question is, has anyone tried AMD's compiler (AOCC) to see if there are any performance gains to be had? I would be willing to give it a shot if I have time, but I'd first need an open-source benchmark that can be run under Windows (as I only run Linux in a VM... blame Adobe and my gaming hobby).

EDIT: Oh, and Threadripper appears to scale obscenely well with clock speed. I do wish AMD could get Ryzen to 4.7-4.9 GHz; you guys would probably be blown away by the results. I've gotten 8 cores of my Threadripper to 4.45 GHz and the benchmarks were incredible. The results definitely didn't seem linear.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Yeah, MSVC is worse even without the "absurd" support GCC offers.
With the Green Hills compiler I was able to get faster code than GCC for the target I am working on. I don't know how it compares with the Intel compiler; never tried it.
My question is, has anyone tried AMD's compiler (AOCC) to see if there are any performance gains to be had? I would be willing to give it a shot if I have time, but I'd first need an open-source benchmark that can be run under Windows (as I only run Linux in a VM... blame Adobe and my gaming hobby).

EDIT: Oh, and Threadripper appears to scale obscenely well with clock speed. I do wish AMD could get Ryzen to 4.7-4.9 GHz; you guys would probably be blown away by the results. I've gotten 8 cores of my Threadripper to 4.45 GHz and the benchmarks were incredible. The results definitely didn't seem linear.
For my workloads I need cores, so I am looking at Threadripper.
I use Yocto mainly and the productivity boost is very good, which makes me quite excited about what 16C/32T has to offer.
 
Last edited:
  • Like
Reactions: Drazick

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
With the Green Hills compiler I was able to get faster code than GCC for the target I am working on. I don't know how it compares with the Intel compiler; never tried it.

For my workloads I need cores, so I am looking at Threadripper.
I use Yocto mainly and the productivity boost is very good, which makes me quite excited about what 16C/32T has to offer.

Oh, I keep mine on 16 cores as well. That's why I bought it, after all. I upgraded from an i7 2600K and wanted something different. It's a workhorse that does everything well. It games, it encodes videos, it renders 3D models, etc.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,346
1,525
136
However, I did not expect the MSVC compiler to be so slow.

MSVC is still good at compiling for 32-bit x86 (iirc they do well in register allocation with heavy register pressure), but they really have fallen behind the curve in optimizing for 64-bit code, producing the kind of output GCC did a decade ago. Not sure if they still consider 32-bit the more important workload or if the compiler team is just really hurting for resources.

What ICC is really, really good at is autovectorization -- it does that much better than anything else on any platform. If what you have is a vectorizable workload, but the programmer hasn't done any vectorization by hand, then ICC will likely beat everything else by a ridiculous margin. But if you are doing heavy numerical computation, you are probably using some existing library that is already vectorized, and ICC won't be that much better than LLVM or GCC.
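
To make that concrete, here's a minimal sketch (purely illustrative, not from the benchmark suite) of the kind of loop an autovectorizer handles well:

/* saxpy.c - a textbook vectorizable loop. ICC, GCC and Clang will all
   auto-vectorize this at -O3; how well the generated SIMD code handles
   unrolling, alignment and the loop remainder is where they differ. */
void saxpy(float *restrict y, const float *restrict x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
/* Vectorization reports:
   gcc -O3 -march=native -fopt-info-vec saxpy.c
   icc -O3 -xHost -qopt-report=2 saxpy.c */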
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
Great stuff, The Stilt. Looking forward in particular to the CFD benchmark. Also interesting are the compiler differences; an eye-opener to my ignorance of compiler performance sensitivity.

How are you intending to release this? Forum posts on here?
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
EEOFvFV.jpg

There's a new setting to tweak called GPP. As I understood it from Google Translate, it looks like a way to modify the core base clock without affecting the base clock of the DRAM and FCH. It's like having two base clocks.

@The Stilt know anything about it?
 
Last edited:
  • Like
Reactions: Gideon

iBoMbY

Member
Nov 23, 2016
175
103
86
I guess this would only make sense if the Infinity Fabric frequency is also independent of the CPU base frequency? But from that point, would it make any real difference if you clock the CPU at 25x160 instead of 40x100?
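(For reference, both configurations work out to the same 4.0GHz core clock: 25 × 160MHz = 40 × 100MHz = 4000MHz.)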
 

lamedude

Golden Member
Jan 14, 2011
1,206
10
81
My question is, has anyone tried AMD's compiler (AOCC) to see if there are any performance gains to be had? I would be willing to give it a shot if I have time, but I'd first need an open-source benchmark that can be run under Windows (as I only run Linux in a VM... blame Adobe and my gaming hobby).
If you're expecting ICC but with if(vendorID=="AuthenticAMD") fast code; I've got some bad news.
The much anticipated Zen scheduler for Clang didn't live up to the hype either.
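
For context, the vendor check that kind of dispatching keys on is just the CPUID leaf 0 vendor string; a minimal illustrative sketch (GCC/Clang on x86, not ICC's actual dispatch code):

/* vendor.c - read the CPUID vendor string. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13] = {0};

    /* CPUID leaf 0: the 12-byte vendor string is returned in EBX, EDX, ECX. */
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
    }

    /* "GenuineIntel" on Intel, "AuthenticAMD" on AMD - the string that
       vendor-dependent dispatching checks before picking a code path. */
    printf("CPU vendor: %s\n", vendor);
    return 0;
}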
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Chapter: Pinnacle Ridge

Consider this as a test report, rather than an actual review.

The Zen-cores inside Pinnacle Ridge have undergone the same changes that were originally introduced in the Raven Ridge APUs earlier this year.

The biggest Zen-core related change compared to Summit Ridge is the L2 cache latency, which has been reduced from 17 CLKs to 12 CLKs. The rest of the changes featured in Pinnacle Ridge are related to the manufacturing process, the scalable data fabric (SDF, "IF"), the memory controller and the firmware / software configuration of various domains and components.

On average Pinnacle Ridge provides ~1.5% higher IPC than its predecessor. In certain latency-sensitive workloads (such as WinRAR) the difference can be as high as 8% in favor of Pinnacle Ridge, confirming said changes to the caches and the slightly improved memory latency. The performance of Pinnacle Ridge was compared against its predecessor and competing Intel designs, using a freshly updated and slightly revised test suite. The testing was conducted using the latest versions of the applications (at the time of testing, Feb-Apr 2018). All public and self-compiled workload binaries built with the Intel compiler have been patched in order to disable CPU vendor dependent dispatching.

Test setups (IPC)

  • R7 1700 3.8GHz
  • ASUS Crosshair VI Hero (bios 6001, PinnaclePI 1.0.0.0a)
  • G.Skill FlareX 3200C14 2x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.248 / 16299.334
  • R7 2700X 3.8GHz
  • ASUS Crosshair VII Hero (bios 0097 – 0505, PinnaclePI 1.0.0.0a – 1.0.0.2)
  • G.Skill FlareX 3200C14 2x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.248 / 16299.334
  • Core i7-8700K 3.8GHz (Ring / LLC 3.5GHz)
  • ASUS PRIME Z370-A (bios 0607 / 0613 w/ final Spectre µCode 0x84 patched in)
  • G.Skill FlareX 3200C14 2x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.251 / 16299.334
  • Core i9-7960X 3.8GHz (Mesh / LLC 2.4GHz)
  • ASUS Rampage VI Apex (bios 1102 w/ final Spectre µCode 0x2000043 patched in)
  • Corsair LPX 3600C16 4x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.251 / 16299.334
Information about the charts

“ER” stands for “extremities removed”. In this category the single smallest and the single largest difference have been excluded from the total value. The extremities are: 3DPM (Low) & WinRAR (High) for Pinnacle Ridge, GCC (Low) & NBody (High) for Coffee Lake, GCC (Low) & Linpack (High) for Skylake-X.

“Excl. >=256b” means that workloads, which gain a significant advantage from wider than 128-bit instructions have been excluded from the total value. The excluded workloads are: Embree, Variable Density Fluid Simulation, Linpack, NBody & X265 for Coffee Lake and Skylake-X.

Some of you might recall that not more than a couple of weeks ago, I quoted drastically different IPC differences for the different architectures (excluding Pinnacle of course). The differences between the quoted figures and the ones now displayed here were caused by a single benchmark: libBullet. For whatever reason, until very recently libBullet has had abysmal performance on all Ryzen CPUs. Recently, something has changed and the performance is now comparable to Intel CPUs (or around 3x better than it originally was). Since the binaries, the test method and the system settings have not changed, the difference has to boil down to either AGESA or OS updates which addressed some sort of a problem.

IPC results

YuvqEF9.png


Check the two following posts for the individual results.

SKU vs. SKU results

When it comes down to SKU versus SKU comparisons, I am not providing individual results but only the summary. All of the tested CPUs operated at their stock configuration and special attention was given to this matter.

  • R7 1800X
  • ASUS Crosshair VI Hero (bios 6001, PinnaclePI 1.0.0.0a)
  • G.Skill FlareX 3200C14 2x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.248 / 16299.334
  • R7 2700X
  • ASUS Crosshair VII Hero (bios 0505, PinnaclePI 1.0.0.2)
  • G.Skill FlareX 3200C14 2x8GB, 2933MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.248 / 16299.334
  • PR Notes: "ASUS Performance Enhancement" == "Default", "Precision Boost Override" & "Precision Boost Override Scalar" == "Auto" (Enabled).
  • Core i7-8700K
  • ASUS PRIME Z370-A (0613 w/ µCode 0x84)
  • G.Skill FlareX 3200C14 2x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.251 / 16299.334
  • CFL Notes: Power limits were manually enforced and verified (95W PL1, 119W PL2, 1 second Tau). 1C = 4.7GHz, 2C = 4.6GHz, 3C = 4.5GHz, 4-5C = 4.4GHz, 6C = 4.3GHz.
  • Core i7-7820X
  • ASUS Rampage VI Apex (bios 1102 w/ final Spectre µCode 0x2000043 patched in)
  • Corsair LPX 3600C16 4x8GB, 2666MHz CL14-14-14-32
  • Windows 10 Enterprise x64 16299.251 / 16299.334
  • SKL-X Notes: Ring/LLC was limited to 2.4GHz, power limits were manually enforced and verified (140W PL1, 175W PL2, 1 second Tau). Windows Update automatically installs TBM3 application, and therefore it was utilized during the tests. AVX2 / AVX512 offsets were manually enforced and verified. Single threaded workloads: non-AVX2/AVX512 = 4.5GHz (TB3), AVX2 = 4.0GHz, AVX512 = 3.8GHz. nT workloads: non-AVX2/AVX512 = 4.0GHz, AVX2 = 3.6GHz, AVX512 = 3.3GHz.
7ffHivf.png


The changes


Pinnacle Ridge is manufactured using the new GlobalFoundries 12nm LP ("Leading Performance") manufacturing process, which is an enhanced version of the 14nm LPP (Low Power Plus) process used on Summit Ridge. According to GlobalFoundries, the 12nm LP process provides > 10% higher performance at ISO(metric) power and up to 15% higher density than the original 14nm LPP process.

One of the biggest performance-affecting changes in Pinnacle Ridge CPUs is related to the performance boost algorithm called XFR 2. The original XFR implementation introduced in Summit Ridge received criticism, as it was rather restricted in terms of functionality and really didn't live up to the marketing hype. Unlike the original XFR implementation, where the available boost level was determined based on the number of cores used, with XFR 2 the CPU is allowed to boost until any of the limiters (thermal, power, current or reliability) is reached.

In short: XFR on Summit Ridge operated on a core-utilization basis (similar to Intel Turbo), whereas XFR 2 in Pinnacle Ridge operates solely on a physical-limitation basis.

The "Precision Boost Override" feature available on 400-series motherboards allows increasing the physical limiters mentioned earlier. On SKUs belonging to the 105W TDP infrastructure group, the default limiters are following: PPT 141.75W, TDC 95A, EDC 140A and tJMax of 85°C (absolute, excl. offset).

When "Precision Boost Override" mode is enabled (AGESA default), PPT becomes essentially unrestricted (1000W), TDC is set to 114A and EDC to 168A. These limits can be customized by the ODM so that the new limits will comply with the electrical characteristics of the motherboard design in question.

Essentially this means that entry-level or tiny ITX boards with a more limited VRM should use much more conservative limits than the high-end enthusiast-oriented motherboards. If (or rather how exactly) AMD will enforce these good configuration practices remains to be seen though.

It is a common misconception (admittedly due to poor marketing) that XFR performs automatic overclocking of some sort; neither XFR nor XFR 2 does that. XFR / XFR 2 only attempts to maximize the performance within the SKU-specific limits. For example, at stock the 2700X SKU will never boost higher than 4.35GHz, which is its advertised maximum frequency.

Since all of the cores within a CPU are not created equal, not all of them are able to reach the advertised maximum frequency (usually because the required voltage would be too high).

AMD has included a similar "core binning" feature in Pinnacle Ridge as Intel introduced in Skylake-X (Turbo Boost Max 3.0). The two cores which have the best characteristics are allowed to boost to the advertised maximum, while the rest are limited to slightly lower speeds. The best core within a CPU is marked with a golden star and the second best one with a silver star in Ryzen Master. Most often the best performing core appears to be core 0 and the worst performing core, core 7. Meanwhile the order of the other cores varied greatly and none of the tested CPUs had a completely identical binning order.

At the time of the review, there was neither actual software nor any information on if, or rather when, an affinity management software solution (similar to Intel TBM3) for Pinnacle Ridge CPUs is expected to become available. Due to the fact that there is a slight performance difference (i.e. maximum frequency) between the different cores of the CPU, the single threaded performance varies slightly and essentially depends on luck (the amount of time the workload gets scheduled on the highest boosting cores).

Typically, the performance difference between the best and the worst core within a CPU is around 1.8%, while the absolute worst-case scenario (expecting 100% residency either on the worst or the best core) is around 3.6% (4.2GHz – 4.35GHz). Aside from the pure “luck” (scheduling) factor, the tests on the various different samples hint that there is also some degree of performance variance which originates from the silicon itself. So, don’t be alarmed if you notice a slight variance in the results from different reviewers or users.
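
For reference, the ~3.6% worst case follows directly from the frequency delta: 4.35GHz / 4.2GHz ≈ 1.036.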

The memory controller

The memory controller in Pinnacle Ridge is identical to the one found in Raven Ridge. There are some differences in the software configuration, however the PHY IP itself is no doubt identical, as the two share the same controller firmware.

Compared to Summit Ridge, the revised controller in Pinnacle Ridge provides < 8.7% lower access latency on average (2133-3466MHz). The latency difference is largest at =< 2666MHz frequencies and starts to tail off at higher speeds.

Likewise, the SDF latency has slightly benefited from the changes. The average latency improvement (CCX2CCX latency) is < 2.2%, but just like with the memory latency the difference is tailing off as the memory speed increases. At 3200MHz MEMCLK the SDF latency difference almost falls within the margin of error.

GF3tru1.png


Despite the extremely welcome latency improvement in Pinnacle Ridge, the memory latency is unfortunately still < 38% higher on average (2133-3466MHz) than on its closest rival from Intel (Coffee Lake).

While the changes to the memory controller in Pinnacle Ridge do provide lower latency, unfortunately the highest achievable memory frequency seems to be exactly the same as on Summit and Raven Ridge parts. A realistic expectation would be 3400 - 3533MHz depending on the silicon quality, the motherboard and the DRAM modules used. Some CPU specimens featuring an exceptional memory controller might be able to reach 3600MHz while maintaining true stability, however all of the tested 2700X samples were limited to 3400 - 3533MHz on both Crosshair VII Hero and MSI B350I PRO AC motherboards, regardless of the settings or the memory modules used. The stability was determined using the “Ram Test” utility, which obviously sets the bar for stability a lot higher than the test methods other reviewers typically use to deem the memory as “stable” at a certain frequency.

On the tested samples, the distribution of the maximum achievable memory frequency was the following:

  • 3400MHz – 12.5% of the samples
  • 3466MHz – 25.0% of the samples
  • 3533MHz – 62.5% of the samples

There are clear differences in how the memory controller behaves on the different CPU specimens. The majority of the CPUs will do 3466MHz or higher at 1.050V SoC voltage, however the difference lies in how the different specimens react to the voltage. Some of the specimens seem to scale with increased SoC voltage, while others simply refuse to scale at all or in some cases even illustrate negative scaling. All of the tested samples illustrated negative scaling (i.e. more errors or failures to train) when higher than 1.150V SoC was used. In all cases the maximum memory frequency was achieved at =< 1.100V SoC voltage.

AMD has revised the memory layout design guidance for motherboards targeting Pinnacle Ridge (i.e. the 400-series) in an effort to make higher memory frequencies possible.

While this might theoretically improve the frequencies on some motherboards, generally the frequency limiting factor is the memory controller itself and not the topology the motherboard uses for memory signaling. Because of that, the newer 400-series motherboards alone should not be expected to provide improved memory frequencies, at least not by a significant margin.

The autopsy

Similar to Summit and Raven Ridge, the overclocking headroom on high-end Pinnacle Ridge SKUs is extremely slim. Personally, I would go as far as advising against directly overclocking the 2700X SKU at all, as overclocking will in most cases result in a reduction of single-threaded performance.

YRWrsuc.png


As indicated by the Fmax-Vmin curve, the general behavior is almost identical to Summit and Raven Ridge. The scaling and power efficiency are excellent until 3.5GHz and reasonable scaling takes place until 4.0GHz. Beyond this point the voltage scaling is anything but linear, and the CPU requires higher voltage in increasingly larger steps to scale further. For example, 4.1GHz requires > 112mV higher voltage than 4.0GHz and the difference in voltage results in a 41.5W (or 34.3%) increase in power consumption.

At these power levels the cooling also becomes an issue and even the higher-end heatsinks and AIOs will be operating close to their saturation point at full load. In an open-air test bench (22°C ambient), using a relatively high-end Thermalright Ultra 120 Extreme (w/ 68CFM fan) after-market heatsink the temperatures reached 93°C (including 10°C offset) when the CPU was fully stressed at 4.1GHz / 1.337V.

Note that the power curve below only includes the power of the main voltage plane, used by the CPU cores themselves (VDDCR_CPU). The second main plane, which is used by the SoC elements (VDDCR_SOC), adds 15 - 30W on top of that, depending on the SoC voltage.

CGSdAWF.png


The high temperatures and the fact that the nominal voltage of Pinnacle Ridge has been reduced compared to Summit Ridge also indicate that the 12nm LP process features a significantly higher amount of leakage than the older process variant, as the difference in power consumption is not enough to make up for the difference in voltage.

4.2GHz and even slightly higher frequencies will be possible, however achieving those frequencies outside brief benchmarking runs will most likely require a non-AIO watercooling setup, a top-tier motherboard and potentially increasing the voltage to levels which are unsafe for the silicon.

Because of that I would personally advise against overclocking these CPUs and suggest exploring the possibilities XFR 2 has made available instead. Some ASUS motherboards will also feature additional (and exclusive) XFR 2 related tweaks, which essentially allow you to maintain 4.1GHz+ on all cores and 4.3GHz+ on two cores without entering the actual overclocking mode (OC-Mode).

The loss in all-core performance will usually be smaller than the increase in single-threaded performance, as most of the CPUs won't be reaching much over 4.1GHz anyway. Personally, I expect 4.1 - 4.15GHz to be the new average maximum frequency for 8-core SKUs (compared to the average OC of 3.85GHz on Summit Ridge), given that high-end cooling and a higher-end motherboard are used.

Where is the limit?

The maximum safe voltages for CPUs are an eternal riddle, as neither of the two manufacturers releases this information for public consumption. Public or even NDA documents generally specify a vague limit, which most of the time relates to a point where catastrophic failures become more common, instead of specifying the voltage that is safe to sustain without causing any damage to the silicon. Such a limit is admittedly rather hard to specify, as it will vary between the different CPU specimens (silicon variance, SIDD) and operating scenarios (peak current in different utilization scenarios, temperature, etc.).

In order to get the most accurate answer to this question I ended up “asking” the CPU itself. As stated previously, the CPU features various different limiters / safeguards (Package Power Tracking: PPT, Thermal Design Current: TDC, Electrical Design Current: EDC, thermal protection and FIT).

“FIT”, as the name suggests, is a feature to monitor / track the fitness of the silicon and adjust the operating parameters to maintain the specified and expected reliability. Many semiconductor manufacturers utilize such a feature to eke out every last bit of performance, in an era where most semiconductors are process bound in terms of performance. In short: the FIT feature allows the manufacturers to push their designs to the very limit out of the box, without jeopardizing the reliability of the silicon. A practical example would be the knock sensors on an engine. The control unit of the engine always tries to advance the ignition timing as much as possible, to produce the best possible power / torque figures. The purpose of the knock sensors is to listen for knocking and tell the ECU to reduce the timing advance when it occurs, in order to protect the engine.

To see what the actual maximum voltage FIT allows the CPU to run at in various scenarios is, I disabled all of the other limiters and safeguards. With every other limiter / safeguard disabled, reliability (FIT) becomes the only restraint. The voltage command which the CPU sends to the VRM regulator via the SVI2 interface and the actual effective voltage were then recorded in various scenarios. In the stock configuration the sustained maximum effective voltage during all-core stress allowed by FIT was =< 1.330V. Meanwhile, in single-core workloads the sustained maximum was =< 1.425V. When the “FIT” parameters were adjusted by increasing the scalar value from the default 1x to the maximum allowed value of 10x, the maximum all-core voltage became 1.380V, while the maximum single-core voltage increased to 1.480V. The recorded figures appear to fall very well in line with the seen and known behavior, frequency, power and thermal scaling wise.

The observed behavior suggests that full silicon reliability can be maintained up to around 1.330V in all-core workloads (i.e. high current) and up to 1.425V in single-core workloads (i.e. low current). Use of higher voltages is definitely possible (as FIT will allow up to 1.380V / 1.480V when the scalar is increased to 10x), but it more than likely results in reduced silicon lifetime / reliability. By how much? Only the good folks at AMD who have access to the simulation data will know for sure.

These figures will almost certainly vary between the different CPU specimens (due to SIDD and other silicon specific factors), however the recorded values were almost identical on all of the tested samples (within 20mV, lowest-highest leaking specimen).

Also note that the figures stated here relate to the actual effective voltage, and not to the voltage requested by the CPU. The CPU is aware of the actual effective voltage, so things like load-line adjustments and voltage offsets will modify the CPU's voltage request from the VRM controller accordingly. The most accurate method to measure the effective voltage on the AM4 platform is to monitor the “VDDCR_CPU SVI2 TFN” voltage, which is available in HWiNFO. This reading is sourced directly from the VRM controller (through the SVI2 interface) and generally it is by far the most accurate reading available to end-users. As a side note, while the TFN (“telemetry function”) voltage readings are always generic (and accurate), never blindly trust the reported current and power readings (as every motherboard model needs separate calibration).

The process improvements

As advertised by GlobalFoundries, Pinnacle Ridge made on the 12nm LP node-let does indeed provide slightly higher performance (i.e. frequency) at the same power, compared to its predecessor made on the older 14nm LPP process. But not by > 10%, as stated in the marketing materials. In the 2.0 - 4.0GHz frequency range the difference in performance at ISO(metric) power peaks at 4.4% (~3.75GHz), while the average improvement through the entire range is 3.65%.

jbaXO5r.png


The frequency relations

Pinnacle Ridge CPUs also support multiple reference clock inputs. Motherboards which support the feature will allow "Synchronous" (default) and "Asynchronous" operation. In synchronous mode the CPU has a single reference clock input, just like Summit Ridge did. In this configuration increasing the BCLK frequency will increase the CPU, MEMCLK and PCI-E frequencies.

In asynchronous mode the CPU cores have their own reference clock input. The MEMCLK, FCLK and PCI-E inputs will always remain at 100.0MHz, while the CPU input becomes separately adjustable. This allows even finer-grained CPU frequency control than the already very fine-grained "Fine Grain PStates" (with 25MHz intervals) do.

Despite some wild speculation, the asynchronous clocking capability makes no difference to the memory & data fabric (“IF”) frequency relations. These “two” frequencies are permanently tied together in every currently existing Zen design and changing the current topology would require a major overhaul to the foundations of the die.

The power consumption

When comparing the new flagship 2700X SKU against its predecessor the 1800X, the 2700X provides on average 6.1% higher single-threaded performance and 10.2% higher multithreaded performance when using a 400-series motherboard. The improvement however doesn’t come without a cost: despite the advertised power rating of the 2700X having increased by only 10W (or 10.5%), the actual power consumption has increased by significantly more: over 24%. At stock, the CPU is allowed to consume >= 141.75W of power and more importantly, that is a sustainable limit and not a short-term burst-like limit as on Intel CPUs (PL1 vs. PL2).

The chart below illustrates what this means in practice.

DwPWcLa.png


Personally, I think that AMD should have rated these CPUs for 140W TDP instead of the 105W rating they ended up with. The R7 2700X is the first AMD CPU I’ve ever witnessed to consume more power than its advertised power rating. And honestly, I don’t like that fact one bit. Similar practices are being exercised on the Ryzen Mobile line-up, however with one major difference: the higher-than-advertised power limit (e.g. a 25W boost on 15W rated SKUs) is not sustainable, but instead a short-term limit like on Intel CPUs. The way I see it, either these CPUs should have been rated for 140W from the get-go, or alternatively the 141.75W power limit should have been a short-term one and the advertised 105W figure a sustained one.

According to AMD, the TDP is determined as follows: TDP = (tCaseMax – tAmbientMax) / ϴca (the minimum thermal resistance of the cooling element, °C/W). For the 105W rated SKUs these limits are the following: tCaseMax = 61.8°C, tAmbientMax = 42.0°C and ϴca = 0.189°C/W. Regardless of how the advertised power rating has been established, it doesn’t change the fact that the actual power consumption of these parts is higher than advertised, and more importantly, how consumers generally perceive and compare the advertised power ratings of different CPUs.
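
Plugging in those values: (61.8 – 42.0)°C / 0.189°C/W ≈ 104.8W, which rounds to the advertised 105W rating.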

It is not uncommon for a modern CPU to temporarily exceed its rated TDP, as most of the infrastructure definitions from both AMD and Intel include such functionality by design. For example, recent Intel CPUs, such as Skylake and newer, have the boost (PL2) power limit set 25% higher than their rated TDP (PL1). However, the raised boost limit is only available for a thermally insignificant amount of time (1 second on Intel by specification).

Since 105W TDP rated Pinnacle Ridge CPUs are allowed to sustain >= 141.75W of power draw, and more importantly because at stock they do consume significantly more than the rated 105W even in real-world multithreaded workloads, their advertised power rating is in my opinion not entirely fair and might end up misleading consumers. The measured sustained power consumption for a stock 2700X was 127.64W (132W peak) during X264 encoding and 142.52W (146.5W peak) during Prime95 28.10. In comparison, a stock i9-7960X CPU with its power limits reduced from the default 165 / 206W to 140 / 175W (PL1, PL2) sustained 139.82W power draw and had a peak draw of 168W in the very same X264 workload. All of the stated power figures are based on DCR (current over inductor) measurements and therefore external conversion (VRM, PSU) losses are not included in them.

Despite the rant about the power consumption, at least to me the issue is more of an ideological one than a practical one. The 2700X SKU ships with the biggest factory heatsink solution the industry has ever seen (0.170°C/W ϴca according to AMD), so cooling-wise it is pretty much irrelevant whether the power dissipation of the CPU has been slightly understated or not. Regardless, in my opinion both AMD and Intel should clearly state both the sustainable and the boost / peak power figures for their products, preferably right on the product page of their website. Currently the information is either completely unavailable to consumers (i.e. NDA required) or alternatively rather hard to come by.

Due to the 2700X being the current flagship SKU, it wouldn’t surprise me one bit if the little brother R7 2700 SKU turned out to be the more power-efficient and cooler-running part when both of them are overclocked to the same frequency. Even at stock, the 2700X practically operates at the very limit of what is technically possible to achieve on the 12nm LP node-let.

Because of that, the 2700X SKU most likely uses silicon which features higher SIDD (static leakage) than the average silicon used in other, lower clocked SKUs. Silicon which features higher SIDD will most of the time require less voltage to reach a certain frequency, compared to silicon with lower leakage characteristics. However, despite the lower voltage on the higher SIDD silicon, the power consumption is usually slightly higher due to the higher current draw and temperatures (and therefore lower overall efficiency). Using higher SIDD silicon is particularly useful when operating in a region where the voltage cannot be raised any further, e.g. due to a design limit (or dependencies).

Similar binning practices for the flagship SKUs have been exercised with the previous Ryzen “X” series SKUs, as well as on previous CPU line-ups (FX-9000 series). Unfortunately, I didn’t have the chance to verify the theory in practice due to the lack of an R7 2700 SKU, however I’m quite certain that it is the case with the 2700X as well.

Pinnacle Ridge is no doubt a worthy replacement for the current line-up of Summit Ridge CPUs, released a year ago. However, considering the limitations which will be present on existing 300-series AM4 motherboards and the fact that the overall improvements in performance and overclocking capabilities are rather modest over the previous generation products, I would expect that AMD will face the usual difficulties in luring the existing Ryzen users into upgrading to Pinnacle Ridge.
 
Last edited:

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Core vs. Core at 3.8GHz (IPC) - Continued

SRns1SF.png


U4MdPXL.png


gZvOjoZ.png


BJ4lTct.png


kb0ls5D.png


BqdOvkh.png


1zyruJ2.png


04Wx5MO.png


BU8y5Ch.png


SDKHoSO.png


vif4Ff3.png


NsQgPry.png


ZRA2JTX.png


bxFQUlF.png


0JqZ6Uf.png


Regarding X265: Last week, X265 received its first set of AVX512 code. The newer code path improves the performance on Skylake-X by 5.1 - 6.5%. The encoder used in this test did not include the new code paths.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
What the heck is going on with GCC? Spectre/Meltdown? Also, is Eigen memory bound? I would expect a dense solver to give more performance on SKL-X.

The only way I can explain the GCC results on Intel is the GPZ fixes.
The performance didn't change no matter what I tried and even disabling Spectre / Meltdown fixes by using the tool made no difference.
Either the tool (InSpectre) cannot really disable those fixes, or the performance penalty is caused by some other related change in the microcodes.

The penalty is indeed massive as previously Kaby Lake was > 21% faster in GCC at the same clock, compared to Summit Ridge.

Eigen indeed appears to be totally memory latency bound, as there is basically no scaling from the CPU frequency either.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,770
3,590
136
The only way I can explain the GCC results on Intel is the GPZ fixes.
The performance didn't change no matter what I tried and even disabling Spectre / Meltdown fixes by using the tool made no difference.
Either the tool (InSpectre) cannot really disable those fixes, or the performance penalty is caused by some other related change in the microcodes.

The penalty is indeed massive as previously Kaby Lake was > 21% faster in GCC at the same clock, compared to Summit Ridge.

Eigen indeed appears to be totally memory latency bound, as there is basically no scaling from the CPU frequency either.
Thanks for the reply. This issue with Intel and the GPZ fixes definitely warrants more investigation. It seems Ian also got reduced performance in his review with the Chromium compile test.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
Nicely done @The Stilt !

It appears that GloFo's claims for 12nm were a bit exaggerated. Either that or AMD didn't fully take advantage of the node and made minimal effort. According to AnandTech they kept the same die size and use 9T cells just like before, so I'm guessing AMD just didn't put much effort into this one.
 

epsilon84

Golden Member
Aug 29, 2010
1,142
927
136
As usual a fantastic review by The Stilt, thanks for your hard work; it's much appreciated!

I'm a bit disappointed by the lack of significant improvement in frequency; a ~4% uplift at the same voltages isn't much after hearing AMD's claims of a 10% improvement.
 

epsilon84

Golden Member
Aug 29, 2010
1,142
927
136
They claimed 10% performance improvement. That means clockspeed + IPC.

Which they pretty much got, to be fair, and that is due to a combination of clockspeed, IPC and also faster memory support.

I guess I'm just a bit disappointed with the 4.2GHz wall most reviewers seem to be hitting, with some only getting 4.1GHz. I had (clearly misguided, in hindsight) hopes of ~4.5GHz, and even 4.3GHz would have been good, but it appears that only 'golden samples' can hit 4.3GHz, and 4.5GHz is simply not possible without exotic cooling or unsafe voltages.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I'm a bit disappointed by the lack of significant improvement in frequency; a ~4% uplift at the same voltages isn't much after hearing AMD's claims of a 10% improvement.

It's GlobalFoundries that claimed the 10% improvement.

These CPUs (Ryzen / Coffee Lake / SKL-X) are all running at their limits. Process changes won't result in much gain, if any. The 10% claim may be realistic for lower-frequency parts such as server or mobile.

The CPU is a solid improvement. Nice job by AMD.

And thanks @ Stilt for the benchmarks. I think you are doing a better job than most review sites.
 