Ryzen: Strictly technical

arandomguy · Mar 2, 2017

I'm disappointed in what XFR turned out to be versus what I initially thought it could be, if I'm understanding it correctly. It still basically behaves like existing CPU turbos in that there are essentially only two steps, ST and all-core, with no intermediates, correct? As opposed to GPU turbos, which have much more fine-grained and dynamic clock speeds actually based on workload?

bjt2 · Mar 2, 2017

The dLDO is also present in Bristol Ridge, according to an ISSCC paper. Moreover, Hiroshige Goto posted, weeks ago, some slides from an AMD presentation where it was stated that the dLDO was active. Whether the dLDO is active in Ryzen can be easily tested: measure low-load or idle consumption with and without OC mode enabled.

tamz_msc · Mar 2, 2017

First of all big thanks for the detailed results.

Question: Do you think the performance in NBody, Linpack etc. that use FMA can be hindered due to cache? You mentioned that the L3 frequency cannot be set independently within the CCX. Maybe because changing frequencies as data is moved around can affect coherency? NBody is a coupled system of differential equations, and Gauss-Jordan elimination used in Linpack does move a lot of data around.

Follow-up question: how can the actual CPU core voltage, rather than the VID, be measured with software?

PPB · Mar 2, 2017

Stilt, to make my motherboard choice easier: if I'm targeting a 3.5 GHz all-core clock on a 1700 (the second critical point, beyond which voltage scales exponentially relative to clock), what maximum sustained amperage should I look for in VRM designs? Also, do you have any insight into the VRM designs on B350 motherboards, considering the most powerful VRM designs tend to go into the highest-end X370 boards? Thanks in advance.


iBoMbY · Mar 2, 2017

The Stilt said:

Code:

AMD Ryzen: ZD3601BAM88F4_40/36_Y            
AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
HTT           *   Multicore
HYPERVISOR   -   Hypervisor is present
VMX           -   Supports Intel hardware-assisted virtualization
SVM           *   Supports AMD hardware-assisted virtualization
X64           *   Supports 64-bit mode

SMX           -   Supports Intel trusted execution
SKINIT       *   Supports AMD SKINIT

NX           *   Supports no-execute page protection
SMEP         *   Supports Supervisor Mode Execution Prevention
SMAP         *   Supports Supervisor Mode Access Prevention
PAGE1GB       *   Supports 1 GB large pages
PAE           *   Supports > 32-bit physical addresses
PAT           *   Supports Page Attribute Table
PSE           *   Supports 4 MB pages
PSE36         *   Supports > 32-bit address 4 MB pages
PGE           *   Supports global bit in page tables
SS           -   Supports bus snooping for cache operations
VME           *   Supports Virtual-8086 mode
RDWRFSGSBASE   *   Supports direct GS/FS base access

FPU           *   Implements i387 floating point instructions
MMX           *   Supports MMX instruction set
MMXEXT       *   Implements AMD MMX extensions
3DNOW         -   Supports 3DNow! instructions
3DNOWEXT      -   Supports 3DNow! extension instructions
SSE           *   Supports Streaming SIMD Extensions
SSE2         *   Supports Streaming SIMD Extensions 2
SSE3         *   Supports Streaming SIMD Extensions 3
SSSE3         *   Supports Supplemental SIMD Extensions 3
SSE4a         *   Supports Streaming SIMDR Extensions 4a
SSE4.1       *   Supports Streaming SIMD Extensions 4.1
SSE4.2       *   Supports Streaming SIMD Extensions 4.2

AES           *   Supports AES extensions
AVX           *   Supports AVX intruction extensions
FMA           *   Supports FMA extensions using YMM state
MSR           *   Implements RDMSR/WRMSR instructions
MTRR         *   Supports Memory Type Range Registers
XSAVE         *   Supports XSAVE/XRSTOR instructions
OSXSAVE       *   Supports XSETBV/XGETBV instructions
RDRAND       *   Supports RDRAND instruction
RDSEED       *   Supports RDSEED instruction

CMOV         *   Supports CMOVcc instruction
CLFSH         *   Supports CLFLUSH instruction
CX8           *   Supports compare and exchange 8-byte instructions
CX16         *   Supports CMPXCHG16B instruction
BMI1         *   Supports bit manipulation extensions 1
BMI2         *   Supports bit manipulation extensions 2
ADX           *   Supports ADCX/ADOX instructions
DCA           -   Supports prefetch from memory-mapped device
F16C         *   Supports half-precision instruction
FXSR         *   Supports FXSAVE/FXSTOR instructions
FFXSR         *   Supports optimized FXSAVE/FSRSTOR instruction
MONITOR       *   Supports MONITOR and MWAIT instructions
MOVBE         *   Supports MOVBE instruction
ERMSB         -   Supports Enhanced REP MOVSB/STOSB
PCLMULDQ      *   Supports PCLMULDQ instruction
POPCNT       *   Supports POPCNT instruction
LZCNT         *   Supports LZCNT instruction
SEP           *   Supports fast system call instructions
LAHF-SAHF    *   Supports LAHF/SAHF instructions in 64-bit mode
HLE           -   Supports Hardware Lock Elision instructions
RTM           -   Supports Restricted Transactional Memory instructions

DE           *   Supports I/O breakpoints including CR4.DE
DTES64       -   Can write history of 64-bit branch addresses
DS           -   Implements memory-resident debug buffer
DS-CPL       -   Supports Debug Store feature with CPL
PCID         -   Supports PCIDs and settable CR4.PCIDE
INVPCID       -   Supports INVPCID instruction
PDCM         -   Supports Performance Capabilities MSR
RDTSCP       *   Supports RDTSCP instruction
TSC           *   Supports RDTSC instruction
TSC-DEADLINE   -   Local APIC supports one-shot deadline timer
TSC-INVARIANT   *   TSC runs at constant rate
xTPR         -   Supports disabling task priority messages

EIST         -   Supports Enhanced Intel Speedstep
ACPI         -   Implements MSR for power management
TM           -   Implements thermal monitor circuitry
TM2           -   Implements Thermal Monitor 2 control
APIC         *   Implements software-accessible local APIC
x2APIC       -   Supports x2APIC

CNXT-ID       -   L1 data cache mode adaptive or BIOS

MCE           *   Supports Machine Check, INT18 and CR4.MCE
MCA           *   Implements Machine Check Architecture
PBE           -   Supports use of FERR#/PBE# pin

PSN           -   Implements 96-bit processor serial number

PREFETCHW    *   Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 0000000D (Basic), 8000001F (Extended).

Logical to Physical Processor Map:
**--------------  Physical Processor 0 (Hyperthreaded)
--**------------  Physical Processor 1 (Hyperthreaded)
----**----------  Physical Processor 2 (Hyperthreaded)
------**--------  Physical Processor 3 (Hyperthreaded)
--------**------  Physical Processor 4 (Hyperthreaded)
----------**----  Physical Processor 5 (Hyperthreaded)
------------**--  Physical Processor 6 (Hyperthreaded)
--------------**  Physical Processor 7 (Hyperthreaded)

Logical Processor to Socket Map:
****************  Socket 0

Logical Processor to NUMA Node Map:
****************  NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
*---------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
*---------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
*---------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
*---------------  Unified Cache       1, Level 3,   16 MB, Assoc  16, LineSize  64
-*--------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
-*--------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
-*--------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
-*--------------  Unified Cache       3, Level 3,   16 MB, Assoc  16, LineSize  64
--*-------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
--*-------------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
--*-------------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--*-------------  Unified Cache       5, Level 3,   16 MB, Assoc  16, LineSize  64
---*------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
---*------------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
---*------------  Unified Cache       6, Level 2,  512 KB, Assoc   8, LineSize  64
---*------------  Unified Cache       7, Level 3,   16 MB, Assoc  16, LineSize  64
----*-----------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
----*-----------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
----*-----------  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
----*-----------  Unified Cache       9, Level 3,   16 MB, Assoc  16, LineSize  64
-----*----------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
-----*----------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
-----*----------  Unified Cache      10, Level 2,  512 KB, Assoc   8, LineSize  64
-----*----------  Unified Cache      11, Level 3,   16 MB, Assoc  16, LineSize  64
------*---------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------*---------  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------*---------  Unified Cache      12, Level 2,  512 KB, Assoc   8, LineSize  64
------*---------  Unified Cache      13, Level 3,   16 MB, Assoc  16, LineSize  64
-------*--------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
-------*--------  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
-------*--------  Unified Cache      14, Level 2,  512 KB, Assoc   8, LineSize  64
-------*--------  Unified Cache      15, Level 3,   16 MB, Assoc  16, LineSize  64
--------*-------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------*-------  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
--------*-------  Unified Cache      16, Level 2,  512 KB, Assoc   8, LineSize  64
--------*-------  Unified Cache      17, Level 3,   16 MB, Assoc  16, LineSize  64
---------*------  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
---------*------  Instruction Cache   9, Level 1,   64 KB, Assoc   4, LineSize  64
---------*------  Unified Cache      18, Level 2,  512 KB, Assoc   8, LineSize  64
---------*------  Unified Cache      19, Level 3,   16 MB, Assoc  16, LineSize  64
----------*-----  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
----------*-----  Instruction Cache  10, Level 1,   64 KB, Assoc   4, LineSize  64
----------*-----  Unified Cache      20, Level 2,  512 KB, Assoc   8, LineSize  64
----------*-----  Unified Cache      21, Level 3,   16 MB, Assoc  16, LineSize  64
-----------*----  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
-----------*----  Instruction Cache  11, Level 1,   64 KB, Assoc   4, LineSize  64
-----------*----  Unified Cache      22, Level 2,  512 KB, Assoc   8, LineSize  64
-----------*----  Unified Cache      23, Level 3,   16 MB, Assoc  16, LineSize  64
------------*---  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------*---  Instruction Cache  12, Level 1,   64 KB, Assoc   4, LineSize  64
------------*---  Unified Cache      24, Level 2,  512 KB, Assoc   8, LineSize  64
------------*---  Unified Cache      25, Level 3,   16 MB, Assoc  16, LineSize  64
-------------*--  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
-------------*--  Instruction Cache  13, Level 1,   64 KB, Assoc   4, LineSize  64
-------------*--  Unified Cache      26, Level 2,  512 KB, Assoc   8, LineSize  64
-------------*--  Unified Cache      27, Level 3,   16 MB, Assoc  16, LineSize  64
--------------*-  Data Cache         14, Level 1,   32 KB, Assoc   8, LineSize  64
--------------*-  Instruction Cache  14, Level 1,   64 KB, Assoc   4, LineSize  64
--------------*-  Unified Cache      28, Level 2,  512 KB, Assoc   8, LineSize  64
--------------*-  Unified Cache      29, Level 3,   16 MB, Assoc  16, LineSize  64
---------------*  Data Cache         15, Level 1,   32 KB, Assoc   8, LineSize  64
---------------*  Instruction Cache  15, Level 1,   64 KB, Assoc   4, LineSize  64
---------------*  Unified Cache      30, Level 2,  512 KB, Assoc   8, LineSize  64
---------------*  Unified Cache      31, Level 3,   16 MB, Assoc  16, LineSize  64

Logical Processor to Group Map:
****************  Group 0

Thanks. That looks interesting, insofar as the Cache to Logical Core grouping seems to be wrong ... This is how it looks on my i7 3770k:

Code:

Logical Processor to Cache Map:
**------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
********  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**----  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----  Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
----**--  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--  Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
------**  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
------**  Unified Cache       4, Level 2,  256 KB, Assoc   8, LineSize  64

So code which takes this into account would assume each logical core has its own Cache, with a totally wrong Cache size. Also a NUMA group per CCX would make sense. This could explain some of the bad results.

Edit: This is what I think Ryzen results should look like:

Code:

Logical Processor to Cache Map:
**--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**--------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
**--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
--**------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
----**----------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**----------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
----**----------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
------**--------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**--------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
------**--------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--------**------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
--------**------  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
--------**------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
----------**----  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
----------**----  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
----------**----  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
------------**--  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
------------**--  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
------------**--  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
--------------**  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**  Instruction Cache   8, Level 1,   64 KB, Assoc   4, LineSize  64
--------------**  Unified Cache       9, Level 2,  512 KB, Assoc   8, LineSize  64
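
The difference between the reported map and the expected one can be checked mechanically. Below is a minimal Python sketch, assuming the line layout shown in the dumps above (a mask of `*` and `-` followed by the cache description). The function name `sharing_by_level` and the shortened sample strings are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "**------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64"
LINE_RE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+\d+,"
    r"\s+Level\s+(?P<level>\d+),\s+(?P<size>\d+)\s+(?P<unit>KB|MB)"
)

def sharing_by_level(cache_map: str) -> dict:
    """Return {(kind, level): set of logical-processor counts sharing one cache}."""
    result = defaultdict(set)
    for line in cache_map.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip headers and blank lines
        sharers = m.group("mask").count("*")
        result[(m.group("kind"), int(m.group("level")))].add(sharers)
    return dict(result)

# Two lines taken from the reported Ryzen map and from the expected one.
reported = """\
*---------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
*---------------  Unified Cache       1, Level 3,   16 MB, Assoc  16, LineSize  64
"""
expected = """\
**--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
"""
print(sharing_by_level(reported))
print(sharing_by_level(expected))
```

For the reported lines every cache is attributed to a single logical processor, while in the expected lines the L1 is shared by two logical processors and the L3 by eight.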

.vodka · Mar 2, 2017

Considering it's a fresh out of the oven platform, considering everything... it's quite a strong start, even with software used to Intel's architectures.

Considering lots of BIOS fixes and updates to be issued this year, and as software gets optimized for Zen's particular quirks here and there, I can easily see it improving in performance with time.

Big thanks for the results and all the information, Stilt, much appreciated.

Greyguy1948 · Mar 2, 2017

You have instruction latency and throughput here:
http://users.atw.hu/instlatx64/AuthenticAMD0800F00_K17_Zen_InstLatX64.txt
It looks like an Eng Sample and the clock is strange. I guess they will soon add Memory Latency.

The Stilt · Mar 2, 2017

Greyguy1948 said:
You have instruction latency and throughput here:
http://users.atw.hu/instlatx64/AuthenticAMD0800F00_K17_Zen_InstLatX64.txt
It looks like an Eng Sample and the clock is strange. I guess they will soon add Memory Latency.

That's on ZP-A0 silicon, so it's better to ignore it.
There are some major changes in the production silicon.

Greyguy1948 · Mar 2, 2017

The Stilt said:
I'm done.

Very interesting info about SMT. Would this have any effect in games?
3D Euler is also tested at techreport.com, and the result is not very good for Ryzen. Maybe SMT should be off?

zir_blazer · Mar 2, 2017

Hi The Stilt, I just noticed you are still active, and, as usual, decided to ask you about some obscure features...

Do you know if AVIC (AMD counterpart to Intel APICv for APIC Virtualization, present in the HEDT platform since Ivy Bridge-E, but omitted entirely from consumer LGA 1155/1150/1151) is or will be present in all Ryzen parts? So far, thanks to a lscpu dump from the guy at Phoronix, I noticed that at least on the Ryzen 7 1800X the avic CPU Flag is present: http://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen 7 1800X/lscpu
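
On Linux the same flag can be read without lscpu, since the flags come from `/proc/cpuinfo`. Below is a minimal sketch, assuming the usual `flags : ...` line format; the helper names `cpu_flags` and `has_flag` are illustrative. Note that the flag only tells what the CPU advertises, not whether a hypervisor actually uses the feature.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the flag names from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def has_flag(cpuinfo_text: str, flag: str) -> bool:
    """True if the given flag name appears in the CPU flag list."""
    return flag in cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is unavailable
    print("avic present:", has_flag(text, "avic"))
```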

Another thing I'm interested in is PCIe ACS (Access Control Services) support, both on Ryzen itself and on the chipset. It is a feature useful for PCI/VGA passthrough in virtualized environments because it disables PCIe peer-to-peer data transfers and forces everything to go through the IOMMU, thus providing proper device isolation (otherwise, devices could bypass the IOMMU via PCIe P2P, which is not intended). Intel HEDT also has it, and again, it's omitted from the PCIe controllers of consumer processors, although the consumer chipsets do support it. Sadly, I have no idea how to specifically check support for ACS.
This would give an idea of how good the default IOMMU grouping on the Ryzen AM4 platform should be for potential AM4 passthrough users. More info here, if you're interested: http://vfio.blogspot.com.ar/2014/08/iommu-groups-inside-and-out.html
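
The resulting grouping can at least be inspected directly on a running Linux system, where the kernel exposes it under `/sys/kernel/iommu_groups`. Below is a minimal sketch, assuming the IOMMU is enabled; the function name `iommu_groups` is illustrative. It lists the grouping only and does not probe the ACS capability itself, although `lspci -vvv` should show an "Access Control Services" capability on ports that implement it.

```python
from pathlib import Path

def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the sorted device addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or not a Linux system
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    return groups

if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {' '.join(devices)}")
```

Devices that land in the same group cannot be isolated from each other, so a GPU sharing a group with unrelated devices is the case a passthrough user would want to avoid.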

If you can also get lspci -vvv and lspci -tv output from Linux in Ryzen, it would be even better, to know the platform in detail. There are also some minor features like FLR (Function Level Reset) on Chipset Devices which could also be useful to know if they're present or not. It would add a lot of flexibility if those features are supported.

The Stilt · Mar 2, 2017

MajinCry said:
Sweet. Optimal settings is to set Ships to 1, Rocks to 16000, and disable instancing.

CPU at fixed 3.6GHz speed.
R9 Nano w/ the settings you specified in your thread.

lolfail9001 · Mar 2, 2017

The Stilt said:
R9 Nano w/ the settings you specified in your thread.

Uhhhh

That result is a disaster.

MajinCry · Mar 2, 2017

The Stilt said:
CPU at fixed 3.6GHz speed.
R9 Nano w/ the settings you specified in your thread.

lolfail9001 said:
Uhhhh
That result is a disaster.

Well, that's that. Ryzen, at least from that result, has not rectified AMD's draw call deficit. Damn. I was hoping that they would have finally done it.

Edit: What driver version are you on?

The Stilt · Mar 2, 2017

bjt2 said:
The dLDO is also present in Bristol Ridge, according to an ISSCC paper. Moreover, Hiroshige Goto posted, weeks ago, some slides from an AMD presentation where it was stated that the dLDO was active. Whether the dLDO is active in Ryzen can be easily tested: measure low-load or idle consumption with and without OC mode enabled.

As I said in the OP, dLDOs for the main blocks (cores, caches, data fabric) are NOT used in consumer Zeppelin parts. ZP-A0c was the last version which had them enabled. They are permanently disabled with a fuse config on retail consumer parts (i.e. ZP-B1).
The only reason Zeppelin features dLDOs in the first place is that it is a server design. They boost efficiency, make binning easier and reduce BOM in MCM platforms. Without the integrated regulators you would need four separate dual-plane VRMs (for Naples), and you still wouldn't be able to adjust the voltages for the different blocks individually.

If there are any integrated regulators in Carrizo or Bristol Ridge, they definitely are not for the main planes either. You can tell this by comparing the driven SVI data and the voltage calibration (command) values.

Here's the measured VDDCR_CPU power consumption in Normal & OC-Mode, with fixed voltage.
The VRM output voltage was locked to 1.3250V, while the actual voltage request in normal mode is 1.26250V for 3.6GHz (P0 Pstate).
If dLDOs are used in normal mode like you say, why isn't the power consumption significantly lower than in OC-Mode?

Activating OC-Mode WILL INCREASE the power consumption, however it has nothing to do with the dLDOs. The actual reason can be found in the OP.
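
The size of the gap one would expect can be estimated to first order. The sketch below assumes dynamic power only (core current proportional to core voltage at a fixed clock) and an ideal linear regulator that passes the load current through unchanged; leakage is ignored and the function name is illustrative. Under those assumptions, a dLDO dropping the 1.3250 V rail to the 1.26250 V request would reduce the power drawn from the VRM by the ratio of the two voltages.

```python
def vrm_power_ratio(v_rail: float, v_core: float) -> float:
    """Power drawn from the VRM with an LDO active, relative to bypass mode.

    Bypass: core runs at v_rail, so I ~ v_rail and P ~ v_rail * v_rail.
    LDO:    core runs at v_core, so I ~ v_core, but the VRM still supplies
            that current at v_rail, so P ~ v_rail * v_core.
    """
    if not 0 < v_core <= v_rail:
        raise ValueError("expected 0 < v_core <= v_rail")
    return (v_rail * v_core) / (v_rail * v_rail)

# Voltages quoted in the post: fixed VRM output and the P0 request at 3.6GHz.
ratio = vrm_power_ratio(v_rail=1.3250, v_core=1.26250)
print(f"expected VRM power with dLDO active: {ratio:.1%} of bypass")
```

With these numbers the estimate comes out at roughly 5% less power in normal mode, before counting any reduction in leakage, so active regulators should leave a visible gap between the two measurements.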

dogen1 · Mar 2, 2017

I wonder what the result would be with an nvidia card.

MajinCry · Mar 2, 2017

dogen1 said:
I wonder what the result would be with an nvidia card.

Unfortunately, NVidia's driver is optimized for synthetic draw call benchmarks. If the same draw call is made for the entire scene (no lights, no shadows, no different objects), the driver essentially performs a very weak form of instancing.

Only kept them in the benchmark because why not.

dogen1 · Mar 2, 2017

MajinCry said:
Unfortunately, NVidia's driver is optimized for synthetic draw call benchmarks. If the same draw call is made for the entire scene (no lights, no shadows, no different objects), the driver essentially performs a very weak form of instancing.

Only kept them in the benchmark because why not.

I think it's more like the driver is able to take advantage of the situation. Their optimizations surely help in games where they're applicable.

MajinCry · Mar 2, 2017

dogen1 said:
I think it's more like the driver is able to take advantage of the situation. Their optimizations surely help in games where they're applicable.

The necessary situation for this "optimization" to come into play never occurs in games. As soon as a different draw call gets called, the performance gain quickly dissipates. Any game that has a variety of objects, any lights, any shadows, or any materials, etc, will not be able to take advantage of this.

It's purely an optimization for synthetic benches.

bjt2 · Mar 2, 2017

The Stilt said:
As I said in the OP, dLDOs for the main blocks (cores, caches, data fabric) are NOT used in consumer Zeppelin parts. ZP-A0c was the last version which had them enabled. They are permanently disabled with a fuse config on retail consumer parts (i.e. ZP-B1).
The only reason Zeppelin features dLDOs in the first place is that it is a server design. They boost efficiency, make binning easier and reduce BOM in MCM platforms. Without the integrated regulators you would need four separate dual-plane VRMs (for Naples), and you still wouldn't be able to adjust the voltages for the different blocks individually.

If there are any integrated regulators in Carrizo or Bristol Ridge, they definitely are not for the main planes either. You can tell this by comparing the driven SVI data and the voltage calibration (command) values.

Here's the measured VDDCR_CPU power consumption in Normal & OC-Mode, with fixed voltage.
The VRM output voltage was locked to 1.3250V, while the actual voltage request in normal mode is 1.26250V for 3.6GHz (P0 Pstate).
If dLDOs are used in normal mode like you say, why isn't the power consumption significantly lower than in OC-Mode?

Activating OC-Mode WILL INCREASE the power consumption, however it has nothing to do with the dLDOs. The actual reason can be found in the OP.

Maybe it's a problem in early BIOSes. The Ars Technica article (https://arstechnica.com/gadgets/201...n-finally-an-architecture-that-can-compete/3/), based on the official slides, mentions the dLDO. If it were disabled on retail parts, they would not have mentioned it, don't you agree? Are your observations based on late ES or retail samples? Beta BIOS or final BIOS?

dogen1 · Mar 2, 2017

MajinCry said:
The necessary situation for this "optimization" to come into play never occurs in games. As soon as a different draw call gets called, the performance gain quickly dissipates. Any game that has a variety of objects, any lights, any shadows, or any materials, etc, will not be able to take advantage of this.

It's purely an optimization for synthetic benches.

Maybe not in the vast majority of games, but I'd prefer my driver to "auto-optimize" games that don't do this type of batching by themselves.

Maybe it screws up benches, but I have to wonder why AMD doesn't have this type of functionality in their drivers.

del42sa · Mar 2, 2017

Some interesting findings:

http://www.hardware.fr/articles/956-22/retour-sous-systeme-memoire.html

http://www.hardware.fr/articles/956-7/impact-smt-ht.html

MajinCry · Mar 2, 2017

dogen1 said:
Maybe not in the vast majority of games, but I'd prefer my driver to "auto-optimize" games that don't do this type of batching by themselves.

Maybe it screws up benches, but I have to wonder why AMD doesn't have this type of functionality in their drivers.

Because it's useless. If any other draw calls are made by the game, the optimization fails. I repeat, this will not have any use in games.

http://enbseries.enbdev.com/forum/viewtopic.php?f=17&t=4869#p69741

With dips things are not that simple. Not just draw function call cost lot of performance, but every command to driver between draw calls. You may (for old nvidia drivers at least) call crazy amount of draw calls in cycle without modifying anything, performance will be awesome. Insert any changes to object -> bottleneck.

For this to be used, the game must not have any lighting, any shadows, any shaders, any materials, any decals, and any other meshes.

You won't find this in games. Maybe if you go back to the 80s (e.g., Asteroids), but not even DOS-era games would be applicable.
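
The claim can be illustrated with a toy model. This is not a model of any real driver, whose internals are not public; it only assumes, for the sake of argument, that consecutive draw calls are merged when they are exactly identical. The `DrawCall` record and the function name are made up for illustration.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class DrawCall:
    mesh: str
    material: str

def submissions_after_merging(calls):
    """Count submissions left if runs of identical consecutive calls are merged."""
    return sum(1 for _key, _run in groupby(calls))

# Synthetic benchmark: the same call repeated for every object in the scene.
synthetic = [DrawCall("rock", "grey")] * 16000
# Same number of calls, but alternating between two different objects.
mixed = [DrawCall("rock", "grey"), DrawCall("ship", "steel")] * 8000

print(submissions_after_merging(synthetic))
print(submissions_after_merging(mixed))
```

In this model the synthetic scene collapses to a single submission, while merely alternating between two objects leaves all 16000 calls in place, which is the behavior described above.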

lolfail9001 · Mar 2, 2017

del42sa said:
http://www.hardware.fr/articles/956-22/retour-sous-systeme-memoire.html

Uhm, their inter-CCX bus only has 22GB/sec bandwidth?

Sorry, WHAT!?

The Stilt · Mar 2, 2017

arandomguy said:
I'm disappointed in what XFR turned out to be versus what I initially thought it could be, if I'm understanding it correctly. It still basically behaves like existing CPU turbos in that there are essentially only two steps, ST and all-core, with no intermediates, correct? As opposed to GPU turbos, which have much more fine-grained and dynamic clock speeds actually based on workload?

XFR is indeed basically a new name for the highest all core and single core boost states.
The only difference is that the frequency changes occur faster than ever before, and because of that the granularity is not as large as one could expect.

thepaleobiker · Mar 2, 2017

lolfail9001 said:
Is there a chance rumored latency issues have anything to do with it? Because i see such weird results on few synthetics that i only have memory in mind to explain that.

Potentially an optimization issue, given that older uArches (Intel's Core series and AMD's FX cores) are doing well while the one-day-old Ryzen performs poorly. Some software might have uArch-specific optimizations.

Hopefully, in the next few weeks or months, we will see an improvement or at least an answer to these questions. I've not had a chance to read the REDDIT AMA thread by Lisa
