Ryzen: Strictly technical

DrMrLordX · Mar 5, 2017

The Stilt said:
14nm LPP is perfectly fine (and potentially even the optimal choice) for low clocked, high core count server parts (such as Naples).
The consumer parts definitely need a higher Fmax capability and most likely 16nm FF+ could provide that.

The issue with the newer Samsung process variants is that they are currently not available and are not proven in practice the way 16nm FF+ is.

How about GF 14nm HP? That's what IBM is using for POWER9, and it is estimated to launch in the 4 GHz range. Later chips will probably hit 4.5 GHz if POWER8 is any indicator.

deadhand · Mar 5, 2017

If anyone wants to know what dual E5-2680s (8-core / 16-thread Sandy Bridge-EP chips) look like in Coreinfo (v3.31) on Windows 10 64-bit, in comparison to an R7 1800X, here you go:

EDIT: Put it in code tags since it uses a mono-space font.

Dual e5-2680's: (total 16 cores, 32 threads combined)

Code:

Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
Microcode signature: 00000710
HTT           *    Hyperthreading enabled
HYPERVISOR    -    Hypervisor is present
VMX           *    Supports Intel hardware-assisted virtualization
SVM           -    Supports AMD hardware-assisted virtualization
X64           *    Supports 64-bit mode

SMX           *    Supports Intel trusted execution
SKINIT        -    Supports AMD SKINIT

NX            *    Supports no-execute page protection
SMEP          -    Supports Supervisor Mode Execution Prevention
SMAP          -    Supports Supervisor Mode Access Prevention
PAGE1GB       *    Supports 1 GB large pages
PAE           *    Supports > 32-bit physical addresses
PAT           *    Supports Page Attribute Table
PSE           *    Supports 4 MB pages
PSE36         *    Supports > 32-bit address 4 MB pages
PGE           *    Supports global bit in page tables
SS            *    Supports bus snooping for cache operations
VME           *    Supports Virtual-8086 mode
RDWRFSGSBASE    -    Supports direct GS/FS base access

FPU           *    Implements i387 floating point instructions
MMX           *    Supports MMX instruction set
MMXEXT        -    Implements AMD MMX extensions
3DNOW         -    Supports 3DNow! instructions
3DNOWEXT      -    Supports 3DNow! extension instructions
SSE           *    Supports Streaming SIMD Extensions
SSE2          *    Supports Streaming SIMD Extensions 2
SSE3          *    Supports Streaming SIMD Extensions 3
SSSE3         *    Supports Supplemental SIMD Extensions 3
SSE4a         -    Supports Streaming SIMDR Extensions 4a
SSE4.1        *    Supports Streaming SIMD Extensions 4.1
SSE4.2        *    Supports Streaming SIMD Extensions 4.2

AES           *    Supports AES extensions
AVX           *    Supports AVX intruction extensions
FMA           -    Supports FMA extensions using YMM state
MSR           *    Implements RDMSR/WRMSR instructions
MTRR          *    Supports Memory Type Range Registers
XSAVE         *    Supports XSAVE/XRSTOR instructions
OSXSAVE       *    Supports XSETBV/XGETBV instructions
RDRAND        -    Supports RDRAND instruction
RDSEED        -    Supports RDSEED instruction

CMOV          *    Supports CMOVcc instruction
CLFSH         *    Supports CLFLUSH instruction
CX8           *    Supports compare and exchange 8-byte instructions
CX16          *    Supports CMPXCHG16B instruction
BMI1          -    Supports bit manipulation extensions 1
BMI2          -    Supports bit manipulation extensions 2
ADX           -    Supports ADCX/ADOX instructions
DCA           *    Supports prefetch from memory-mapped device
F16C          -    Supports half-precision instruction
FXSR          *    Supports FXSAVE/FXSTOR instructions
FFXSR         -    Supports optimized FXSAVE/FSRSTOR instruction
MONITOR       *    Supports MONITOR and MWAIT instructions
MOVBE         -    Supports MOVBE instruction
ERMSB         -    Supports Enhanced REP MOVSB/STOSB
PCLMULDQ      *    Supports PCLMULDQ instruction
POPCNT        *    Supports POPCNT instruction
LZCNT         -    Supports LZCNT instruction
SEP           *    Supports fast system call instructions
LAHF-SAHF     *    Supports LAHF/SAHF instructions in 64-bit mode
HLE           -    Supports Hardware Lock Elision instructions
RTM           -    Supports Restricted Transactional Memory instructions

DE            *    Supports I/O breakpoints including CR4.DE
DTES64        *    Can write history of 64-bit branch addresses
DS            *    Implements memory-resident debug buffer
DS-CPL        *    Supports Debug Store feature with CPL
PCID          *    Supports PCIDs and settable CR4.PCIDE
INVPCID       -    Supports INVPCID instruction
PDCM          *    Supports Performance Capabilities MSR
RDTSCP        *    Supports RDTSCP instruction
TSC           *    Supports RDTSC instruction
TSC-DEADLINE    *    Local APIC supports one-shot deadline timer
TSC-INVARIANT    *    TSC runs at constant rate
xTPR          *    Supports disabling task priority messages

EIST          *    Supports Enhanced Intel Speedstep
ACPI          *    Implements MSR for power management
TM            *    Implements thermal monitor circuitry
TM2           *    Implements Thermal Monitor 2 control
APIC          *    Implements software-accessible local APIC
x2APIC        *    Supports x2APIC

CNXT-ID       -    L1 data cache mode adaptive or BIOS

MCE           *    Supports Machine Check, INT18 and CR4.MCE
MCA           *    Implements Machine Check Architecture
PBE           *    Supports use of FERR#/PBE# pin

PSN           -    Implements 96-bit processor serial number

PREFETCHW     *    Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 0000000D (Basic), 80000008 (Extended).

Logical to Physical Processor Map:
**------------------------------  Physical Processor 0 (Hyperthreaded)
--**----------------------------  Physical Processor 1 (Hyperthreaded)
----**--------------------------  Physical Processor 2 (Hyperthreaded)
------**------------------------  Physical Processor 3 (Hyperthreaded)
--------**----------------------  Physical Processor 4 (Hyperthreaded)
----------**--------------------  Physical Processor 5 (Hyperthreaded)
------------**------------------  Physical Processor 6 (Hyperthreaded)
--------------**----------------  Physical Processor 7 (Hyperthreaded)
----------------**--------------  Physical Processor 8 (Hyperthreaded)
------------------**------------  Physical Processor 9 (Hyperthreaded)
--------------------**----------  Physical Processor 10 (Hyperthreaded)
----------------------**--------  Physical Processor 11 (Hyperthreaded)
------------------------**------  Physical Processor 12 (Hyperthreaded)
--------------------------**----  Physical Processor 13 (Hyperthreaded)
----------------------------**--  Physical Processor 14 (Hyperthreaded)
------------------------------**  Physical Processor 15 (Hyperthreaded)

Logical Processor to Socket Map:
****************----------------  Socket 0
----------------****************  Socket 1

Logical Processor to NUMA Node Map:
****************----------------  NUMA Node 0
----------------****************  NUMA Node 1
Calculating Cross-NUMA Node Access Cost...
                             
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.1 1.6
01: 1.4 1.0

Logical Processor to Cache Map:
**------------------------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**------------------------------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**------------------------------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
****************----------------  Unified Cache       1, Level 3,   20 MB, Assoc  20, LineSize  64
--**----------------------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----------------------------  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**----------------------------  Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
----**--------------------------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--------------------------  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----**--------------------------  Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
------**------------------------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**------------------------  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
------**------------------------  Unified Cache       4, Level 2,  256 KB, Assoc   8, LineSize  64
--------**----------------------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
--------**----------------------  Instruction Cache   4, Level 1,   32 KB, Assoc   8, LineSize  64
--------**----------------------  Unified Cache       5, Level 2,  256 KB, Assoc   8, LineSize  64
----------**--------------------  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
----------**--------------------  Instruction Cache   5, Level 1,   32 KB, Assoc   8, LineSize  64
----------**--------------------  Unified Cache       6, Level 2,  256 KB, Assoc   8, LineSize  64
------------**------------------  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------------**------------------  Instruction Cache   6, Level 1,   32 KB, Assoc   8, LineSize  64
------------**------------------  Unified Cache       7, Level 2,  256 KB, Assoc   8, LineSize  64
--------------**----------------  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**----------------  Instruction Cache   7, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**----------------  Unified Cache       8, Level 2,  256 KB, Assoc   8, LineSize  64
----------------**--------------  Data Cache          8, Level 1,   32 KB, Assoc   8, LineSize  64
----------------**--------------  Instruction Cache   8, Level 1,   32 KB, Assoc   8, LineSize  64
----------------**--------------  Unified Cache       9, Level 2,  256 KB, Assoc   8, LineSize  64
----------------****************  Unified Cache      10, Level 3,   20 MB, Assoc  20, LineSize  64
------------------**------------  Data Cache          9, Level 1,   32 KB, Assoc   8, LineSize  64
------------------**------------  Instruction Cache   9, Level 1,   32 KB, Assoc   8, LineSize  64
------------------**------------  Unified Cache      11, Level 2,  256 KB, Assoc   8, LineSize  64
--------------------**----------  Data Cache         10, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------**----------  Instruction Cache  10, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------**----------  Unified Cache      12, Level 2,  256 KB, Assoc   8, LineSize  64
----------------------**--------  Data Cache         11, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------**--------  Instruction Cache  11, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------**--------  Unified Cache      13, Level 2,  256 KB, Assoc   8, LineSize  64
------------------------**------  Data Cache         12, Level 1,   32 KB, Assoc   8, LineSize  64
------------------------**------  Instruction Cache  12, Level 1,   32 KB, Assoc   8, LineSize  64
------------------------**------  Unified Cache      14, Level 2,  256 KB, Assoc   8, LineSize  64
--------------------------**----  Data Cache         13, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------------**----  Instruction Cache  13, Level 1,   32 KB, Assoc   8, LineSize  64
--------------------------**----  Unified Cache      15, Level 2,  256 KB, Assoc   8, LineSize  64
----------------------------**--  Data Cache         14, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------------**--  Instruction Cache  14, Level 1,   32 KB, Assoc   8, LineSize  64
----------------------------**--  Unified Cache      16, Level 2,  256 KB, Assoc   8, LineSize  64
------------------------------**  Data Cache         15, Level 1,   32 KB, Assoc   8, LineSize  64
------------------------------**  Instruction Cache  15, Level 1,   32 KB, Assoc   8, LineSize  64
------------------------------**  Unified Cache      17, Level 2,  256 KB, Assoc   8, LineSize  64

Logical Processor to Group Map:
********************************  Group 0

EDIT:

The 1800x used in my testing on page 9
(found here: https://forums.anandtech.com/threads/ryzen-strictly-technical.2500572/page-9#post-38776310)
has these results:

(Note that the two L3 regions seem to be mapped correctly: on my dual E5-2680s, all threads on a given CPU have equal access to the entire L3 cache on that CPU, whereas on Ryzen the L3 split is within the CPU, as would be expected given the CCX topology. A bit NUMA-like.)

R7-1800x (8 cores / 16 threads)

Code:

AMD Ryzen 7 1800X Eight-Core Processor
AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD
Microcode signature: 08001105
HTT           *    Multicore
HYPERVISOR    -    Hypervisor is present
VMX           -    Supports Intel hardware-assisted virtualization
SVM           *    Supports AMD hardware-assisted virtualization
X64           *    Supports 64-bit mode

SMX           -    Supports Intel trusted execution
SKINIT        *    Supports AMD SKINIT

NX            *    Supports no-execute page protection
SMEP          *    Supports Supervisor Mode Execution Prevention
SMAP          *    Supports Supervisor Mode Access Prevention
PAGE1GB       *    Supports 1 GB large pages
PAE           *    Supports > 32-bit physical addresses
PAT           *    Supports Page Attribute Table
PSE           *    Supports 4 MB pages
PSE36         *    Supports > 32-bit address 4 MB pages
PGE           *    Supports global bit in page tables
SS            -    Supports bus snooping for cache operations
VME           *    Supports Virtual-8086 mode
RDWRFSGSBASE    *    Supports direct GS/FS base access

FPU           *    Implements i387 floating point instructions
MMX           *    Supports MMX instruction set
MMXEXT        *    Implements AMD MMX extensions
3DNOW         -    Supports 3DNow! instructions
3DNOWEXT      -    Supports 3DNow! extension instructions
SSE           *    Supports Streaming SIMD Extensions
SSE2          *    Supports Streaming SIMD Extensions 2
SSE3          *    Supports Streaming SIMD Extensions 3
SSSE3         *    Supports Supplemental SIMD Extensions 3
SSE4a         *    Supports Streaming SIMDR Extensions 4a
SSE4.1        *    Supports Streaming SIMD Extensions 4.1
SSE4.2        *    Supports Streaming SIMD Extensions 4.2

AES           *    Supports AES extensions
AVX           *    Supports AVX intruction extensions
FMA           *    Supports FMA extensions using YMM state
MSR           *    Implements RDMSR/WRMSR instructions
MTRR          *    Supports Memory Type Range Registers
XSAVE         *    Supports XSAVE/XRSTOR instructions
OSXSAVE       *    Supports XSETBV/XGETBV instructions
RDRAND        *    Supports RDRAND instruction
RDSEED        *    Supports RDSEED instruction

CMOV          *    Supports CMOVcc instruction
CLFSH         *    Supports CLFLUSH instruction
CX8           *    Supports compare and exchange 8-byte instructions
CX16          *    Supports CMPXCHG16B instruction
BMI1          *    Supports bit manipulation extensions 1
BMI2          *    Supports bit manipulation extensions 2
ADX           *    Supports ADCX/ADOX instructions
DCA           -    Supports prefetch from memory-mapped device
F16C          *    Supports half-precision instruction
FXSR          *    Supports FXSAVE/FXSTOR instructions
FFXSR         *    Supports optimized FXSAVE/FSRSTOR instruction
MONITOR       *    Supports MONITOR and MWAIT instructions
MOVBE         *    Supports MOVBE instruction
ERMSB         -    Supports Enhanced REP MOVSB/STOSB
PCLMULDQ      *    Supports PCLMULDQ instruction
POPCNT        *    Supports POPCNT instruction
LZCNT         *    Supports LZCNT instruction
SEP           *    Supports fast system call instructions
LAHF-SAHF     *    Supports LAHF/SAHF instructions in 64-bit mode
HLE           -    Supports Hardware Lock Elision instructions
RTM           -    Supports Restricted Transactional Memory instructions

DE            *    Supports I/O breakpoints including CR4.DE
DTES64        -    Can write history of 64-bit branch addresses
DS            -    Implements memory-resident debug buffer
DS-CPL        -    Supports Debug Store feature with CPL
PCID          -    Supports PCIDs and settable CR4.PCIDE
INVPCID       -    Supports INVPCID instruction
PDCM          -    Supports Performance Capabilities MSR
RDTSCP        *    Supports RDTSCP instruction
TSC           *    Supports RDTSC instruction
TSC-DEADLINE    -    Local APIC supports one-shot deadline timer
TSC-INVARIANT    *    TSC runs at constant rate
xTPR          -    Supports disabling task priority messages

EIST          -    Supports Enhanced Intel Speedstep
ACPI          -    Implements MSR for power management
TM            -    Implements thermal monitor circuitry
TM2           -    Implements Thermal Monitor 2 control
APIC          *    Implements software-accessible local APIC
x2APIC        -    Supports x2APIC

CNXT-ID       -    L1 data cache mode adaptive or BIOS

MCE           *    Supports Machine Check, INT18 and CR4.MCE
MCA           *    Implements Machine Check Architecture
PBE           -    Supports use of FERR#/PBE# pin

PSN           -    Implements 96-bit processor serial number

PREFETCHW     *    Supports PREFETCHW instruction

Maximum implemented CPUID leaves: 0000000D (Basic), 8000001F (Extended).

Logical to Physical Processor Map:
**--------------  Physical Processor 0 (Hyperthreaded)
--**------------  Physical Processor 1 (Hyperthreaded)
----**----------  Physical Processor 2 (Hyperthreaded)
------**--------  Physical Processor 3 (Hyperthreaded)
--------**------  Physical Processor 4 (Hyperthreaded)
----------**----  Physical Processor 5 (Hyperthreaded)
------------**--  Physical Processor 6 (Hyperthreaded)
--------------**  Physical Processor 7 (Hyperthreaded)

Logical Processor to Socket Map:
****************  Socket 0

Logical Processor to NUMA Node Map:
****************  NUMA Node 0

No NUMA nodes.

Logical Processor to Cache Map:
**--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**--------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
**--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--**------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
--**------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
----**----------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----**----------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
----**----------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
------**--------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
------**--------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
------**--------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
--------**------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
--------**------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
--------**------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
----------**----  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
----------**----  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
----------**----  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
------------**--  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
------------**--  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
------------**--  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
--------------**  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
--------------**  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
--------------**  Unified Cache       9, Level 2,  512 KB, Assoc   8, LineSize  64

Logical Processor to Group Map:
****************  Group 0
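
The L3 split can also be checked mechanically from the cache map. Here is a minimal Python sketch, assuming the Coreinfo line layout shown in the listings above; the regular expression and the `l3_domains` helper are my own and not part of Coreinfo:

```python
import re

# Matches one "Logical Processor to Cache Map" line as printed by Coreinfo above.
CACHE_LINE = re.compile(
    r"^(?P<mask>[*-]+)\s+(?P<kind>Data|Instruction|Unified) Cache\s+\d+,"
    r"\s+Level\s+(?P<level>\d+),\s+(?P<size>\d+)\s+(?P<unit>KB|MB)"
)


def l3_domains(coreinfo_text):
    """Return [(logical_cpu_indices, size_string), ...] for every L3 line."""
    domains = []
    for line in coreinfo_text.splitlines():
        match = CACHE_LINE.match(line.strip())
        if match and match.group("level") == "3":
            # A '*' at position i means logical processor i shares this cache.
            cpus = [i for i, c in enumerate(match.group("mask")) if c == "*"]
            domains.append((cpus, f"{match.group('size')} {match.group('unit')}"))
    return domains


# Three lines copied from the R7 1800X listing.
SAMPLE = """\
**--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
"""

for cpus, size in l3_domains(SAMPLE):
    print(f"L3 {size}: logical processors {cpus[0]}-{cpus[-1]}")
```

Fed the full 1800X listing, this should report two 8 MB L3 domains of eight logical processors each (one per CCX), while the dual Xeon listing yields one 20 MB domain per socket.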

zir_blazer · Mar 5, 2017

Patrick from ServeTheHome provided IOMMU Grouping in his ASUS PRIME B350-PLUS. I made this new image (A composite of lspci, lspci -t and IOMMU Groups):

http://imgur.com/a/36pAT

I noticed too late for the previous image that I had forgotten that the Motherboard has a PCIe 4x Slot (The second 16x). The 4 PCIe Lanes actually belonged to it. Thus, what wasn't present were the PCI Bridges for the other two PCIe 1x Slots. Since there are 8 PCIe Lanes in total and B350 doesn't have that many (Only 6), I suppose that the other 2 are repurposed from the B350 SATA Express (Which should be 2 PCIe Lanes, thus you get to 8, and X370 should have 10 by the same metric).
Also, notice that the GeForce has now been moved from the main PCIe 16x Slot to the other 16x with the 4 PCIe 2.0 Lanes from the Chipset. There are also two new cards: an Intel XL710 NIC that moved into the main PCIe 16x, replacing the GeForce, and an Intel 82572EI NIC in a PCIe 1x Slot.

The current IOMMU Grouping isn't THAT bad, actually: each Host Bridge/PCI Bridge pairing has its own IOMMU Group; it just doesn't isolate what is below them, which could be a fixable quirk. Assuming that bifurcation to 8x/8x in X370 creates a second PCI Bridge in IOMMU Group 3, as I expect, you can already do Passthrough of two Video Cards to two different VMs without the ACS override patch, and have a third one for the host in the ugly Chipset group. The problem with that arrangement is that, precisely because the IOMMU Groups of everything else are so ugly, you can't pick either the Processor or the Chipset USB or SATA Controllers exclusively for the other VM. I mean, Ryzen 7 1700 is THIS close to being a perfect candidate to build a two person multiseat setup.

Only thing missing now is for someone to actually try if Passthrough works.
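
For anyone who wants to dump the grouping on their own board, here is a minimal Python sketch. It assumes a Linux host with the IOMMU enabled and the usual sysfs layout `/sys/kernel/iommu_groups/<group>/devices/<PCI address>`; the helper names `iommu_groups` and `shared_groups` are mine:

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses in that group."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        # No IOMMU support, IOMMU disabled in firmware/kernel, or not Linux.
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def shared_groups(groups):
    """Groups holding more than one device; these generally travel together."""
    return {g: devs for g, devs in groups.items() if len(devs) > 1}


if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        print(f"IOMMU group {group}: {', '.join(devices)}")
```

The addresses can be matched against `lspci` output to see which controllers ended up sharing a group.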

PPB · Mar 5, 2017

DrMrLordX said:
How about GF 14nm HP? That's what IBM is using for POWER9, and it is estimated to launch in the 4 GHz range. Later chips will probably hit 4.5 GHz if POWER8 is any indicator.

It will require some sort of porting. Remember, LPE/LPP are Samsung copycats with slight adaptations to GF's ecosystem of tools. I hope they stay on LPP for RR. They don't really need high clocks, and that product needs to get out of the fab ASAP if they want to capitalize on the momentum they have with the Ryzen and possibly Vega uarchs.

JimmiG · Mar 5, 2017

The Stilt said:
I suggest you check the frequencies using the newest HWInfo rather than with CPU-Z.
Also Prime95 is currently not working properly on Ryzen, so I suggest you try with another workload.

But yeah, generally during a true single core workload you should be able to sustain 4.0 - 4.1GHz on 1800X.

Figured it out. You have to use "Balanced" power saving mode in Windows. XFR will only activate if enough cores (I'm guessing 6?) are parked, and High Performance disables core parking.
With Balanced, I'm now getting 4.1 GHz fairly reliably in single-threaded workloads (it's not 100% constant due to threads jumping between cores, and also Windows randomly moving background processes to parked cores, waking them up). My single threaded Cinebench and CPU-Z scores also rose by 6-8%, confirming that Boost/XFR works properly now.

.vodka · Mar 5, 2017

JimmiG said:
Figured it out. You have to use "Balanced" power saving mode in Windows. XFR will only activate if enough cores (I'm guessing 6?) are parked, and High Performance disables core parking.
With Balanced, I'm now getting 4.1 GHz fairly reliably in single-threaded workloads (it's not 100% constant due to threads jumping between cores, and also Windows randomly moving background processes to parked cores, waking them up). My single threaded Cinebench and CPU-Z scores also rose by 6-8%, confirming that Boost/XFR works properly now.

Yet "balanced" screws with SenseMI and all the hardware monitoring/adjusting going on. That's why "high performance" power profile is recommended by AMD.

They also recommend disabling HPET, but you need HPET to use Ryzen Master.

W7 performs better than W10 at the moment, too.

It's all a bit of a contradiction at the moment...

looncraz · Mar 5, 2017

.vodka said:
Yet "balanced" screws with SenseMI and all the hardware monitoring/adjusting going on. That's why "high performance" power profile is recommended by AMD.

They also recommend disabling HPET, but you need HPET to use Ryzen Master.

W7 performs better than W10 at the moment, too.

It's all a bit of a contradiction at the moment...

Not sure if you understood what the post said... "Balanced" was required to get single-core turbo to work... "High Performance" was causing the problem.

.vodka · Mar 5, 2017

I do. That's why I'm pointing that out.

Some things perform better with High Performance mode, while others that need ST performance should use Balanced.

There's no "one size fits all" solution at the moment.

PPB · Mar 5, 2017

It's contradictory because High Performance is recommended by AMD too, but at the same time it screws up max turbo. Core parking hurts AMD performance, but at the same time core parking enables max turbo, so it's a lose-lose situation. Hopefully AMD gets it fixed with MS soon.

dfk7677 · Mar 5, 2017

@The Stilt I noticed that the ST results for 7zip are higher than the respective ones for 4C/4T & 4C/8T for all the processors. Is there a typo?

The Stilt · Mar 5, 2017

dfk7677 said:
@The Stilt I noticed that the ST results for 7zip are higher than the respective ones for 4C/4T & 4C/8T for all the processors. Is there a typo?

No typo, just a different setting used (Fast vs. Normal) for MT to make the time spent for the performance evaluation sane. Same for WinRar.

DrMrLordX · Mar 5, 2017

PPB said:
It will require some sort of porting. Remember, LPE/LPP are Samsung copycats with slight adaptations to GF's ecosystem of tools. I hope they stay on LPP for RR. They don't really need high clocks, and that product needs to get out of the fab ASAP if they want to capitalize on the momentum they have with the Ryzen and possibly Vega uarchs.

They might go to HP or something else entirely for Zen+ though. Depends on how things pan out at GF.

JDG1980 · Mar 5, 2017

The Stilt said:
14nm LPP is perfectly fine (and potentially even the optimal choice) for low clocked, high core count server parts (such as Naples).
The consumer parts definitely need a higher Fmax capability and most likely 16nm FF+ could provide that.

Can AMD justify the cost? As you note, 14LPP will be very effective for server parts. And based on the charts and figures you posted, it should also work very well for laptops (the 35W performance of Ryzen is quite impressive). Laptops make up the bulk of the x86 market, and servers have high margins and should offer the best profits. What's left? Low-end OEM desktops don't care, they can just use laptop parts since they aren't offering cutting-edge performance anyway. That pretty much leaves enthusiast gaming and heavy-duty workstations as the only systems where higher frequency is really important. Are enthusiasts enough of a market to justify porting to a whole different process?

MongGrel · Mar 5, 2017

cytg111 said:
I don't know, AnandTech's, Ars, your own site perhaps? I understand that politics would follow and I do enjoy the objectivity of this review, so there's that...
I am just saying I think it's good and I think you could get paid... that's all.

Excellent review and thread OP.

And I have no clue what cytg111 is talking about.

I have been using the fast track on Windows 10 for a while myself now. It would be interesting to see what some of the Redstone changes might do with it; I installed the latest update today.

I'd imagine there will be more WIN10 adjustments for it in the near future.

Ajay · Mar 5, 2017

If Raven Ridge were going to be on TSMC 16FF+, then AMD would already have designed it with that PDK, so there would be no delay in porting it. RR has certainly taped out already, yet I haven't read anything about it and all my searches came up empty. Using a new process is certainly a quick way to get a ~10% boost, but then there is the WSA that AMD has to contend with (long live Hector "Ruins" Ruiz).

DrMrLordX · Mar 5, 2017

Ajay said:
Using a new process is certainly a quick way to get a ~10% boost, but then there is the WSA that AMD has to contend with (long live Hector "Ruins" Ruiz).

AMD is already pushing GF's 14nm capacity to its limits. They are still making Polaris, presumably Vega, Summit Ridge, and Zeppelin with the process. The latest incarnation of the WSA gives AMD more flexibility when it comes to production with other foundries.

Martin Schou · Mar 5, 2017

I'm sorry, but where's the rest of this graph? I picked this one, but I could have picked any of at least a dozen others that suffer from the same problem as this one does.

By cutting out 85% of this graph, you've made Haswell and Kaby Lake look like they have at least twice the performance of Zen. In other graphs it's a different story with Zen apparently being massively better than the others. Stop it. Stop misleading and deceiving people with these misleading graphics.

It's not even consistent. In some graphs you have included the entire thing. For example

But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

Stop it. Stop being deceitful, stop being misleading.

rvborgh · Mar 5, 2017

There is... it's called Process Lasso. You can set the power profile to use with the application/game, tie thread affinities, etc. I hope one of the Ryzen folks gives it a try pronto.

.vodka said:
I do. That's why I'm pointing that out.

Some things perform better with High Performance mode, while others that need ST performance should use Balanced.

There's no "one size fits all" solution at the moment.
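
The per-application power profile idea can be roughed out by hand too. This is only a Python sketch of the concept, not how Process Lasso works internally: it wraps the Windows `powercfg /setactive` command around a launched program. The GUIDs are the stock Balanced and High Performance plans (check yours with `powercfg /list`), and `run_with_plan` and `game.exe` are made-up names for illustration:

```python
import subprocess
import sys

# Stock Windows power plan GUIDs; confirm on your machine with `powercfg /list`.
POWER_PLANS = {
    "balanced": "381b4222-f694-41f0-9685-ff5bb260df2e",
    "high_performance": "8c5e7fda-e8bf-4a96-9a85-a6e23a8c635c",
}


def setactive_command(plan):
    """Build the powercfg command line that activates the named plan."""
    return ["powercfg", "/setactive", POWER_PLANS[plan]]


def run_with_plan(plan, app_argv, restore="high_performance", dry_run=False):
    """Activate `plan`, run the application, then switch back to `restore`.

    Returns the three command lines involved. Nothing is executed when
    dry_run is set or when the script is not running on Windows.
    """
    steps = [setactive_command(plan), list(app_argv), setactive_command(restore)]
    if dry_run or sys.platform != "win32":
        return steps
    subprocess.run(steps[0], check=True)
    try:
        subprocess.run(steps[1], check=False)
    finally:
        # Restore the previous plan even if the application exits abnormally.
        subprocess.run(steps[2], check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical single-threaded program that benefits from core parking.
    for step in run_with_plan("balanced", ["game.exe"], dry_run=True):
        print(" ".join(step))
```

Remove `dry_run=True` to actually switch plans; thread affinity is left out of this sketch.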

HurleyBird · Mar 5, 2017

Martin Schou said:
But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

As much as I appreciate the excellent work Stilt has done, I have to second this. Some of the scales were all over the place and made certain benchmarks difficult to decipher. If I hadn't taken the time to go over things thoroughly I would have been left with the wrong impression on several occasions. All scales should start at 0.

RampantAndroid · Mar 5, 2017

The Stilt said:
I did some 3D testing and even though there is not nearly enough data to confirm it, I'd say the SMT regression is in fact a Windows 10 related issue.
In 3D testing I did recently on Windows 10, the title which illustrated the biggest SMT regression was Total War: Warhammer.

All of these were recorded at 3.5GHz, 2133MHz MEMCLK with R9 Nano:

Windows 10 - 1080 Ultra DX11:

8C/16T - 49.39fps (Min), 72.36fps (Avg)
8C/8T - 57.16fps (Min), 72.46fps (Avg)

Windows 7 - 1080 Ultra DX11:

8C/16T - 62.33fps (Min), 78.18fps (Avg)
8C/8T - 62.00fps (Min), 73.22fps (Avg)

At the moment this is just pure speculation as there were variables, which could not be isolated.
Windows 10 figures were recorded using PresentMon (OCAT), however with Windows 7 it was necessary to use Fraps.

What Win10 build were you on? Can you re-test this with the latest insider build that they've released?
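
To put numbers on the regression, the SMT-on vs. SMT-off delta can be worked out from the figures quoted above. A small Python sketch; the data are copied from the quote and the helper names are mine:

```python
# (min fps, avg fps) from the quoted Total War: Warhammer runs.
RESULTS = {
    "Windows 10": {"8C/16T": (49.39, 72.36), "8C/8T": (57.16, 72.46)},
    "Windows 7": {"8C/16T": (62.33, 78.18), "8C/8T": (62.00, 73.22)},
}


def smt_delta_percent(smt_on, smt_off):
    """Percentage change from enabling SMT; negative means a regression."""
    return (smt_on - smt_off) / smt_off * 100.0


def smt_deltas(results):
    """Return {os_name: (min_fps_delta_percent, avg_fps_delta_percent)}."""
    deltas = {}
    for os_name, runs in results.items():
        on_min, on_avg = runs["8C/16T"]
        off_min, off_avg = runs["8C/8T"]
        deltas[os_name] = (
            smt_delta_percent(on_min, off_min),
            smt_delta_percent(on_avg, off_avg),
        )
    return deltas


for os_name, (d_min, d_avg) in smt_deltas(RESULTS).items():
    print(f"{os_name}: min {d_min:+.1f}%, avg {d_avg:+.1f}%")
```

On these figures SMT costs about 13.6% of the minimum frame rate on Windows 10 with the average essentially flat, whereas on Windows 7 the minimum is unchanged and the average gains about 6.8%. The capture tools differed between the two OSes, so treat the cross-OS comparison with the same caution as the quote does.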

tamz_msc · Mar 5, 2017

Martin Schou said:
I'm sorry, but where's the rest of this graph? I picked this one, but I could have picked any of at least a dozen others that suffer from the same problem as this one does.

By cutting out 85% of this graph, you've made Haswell and Kaby Lake look like they have at least twice the performance of Zen. In other graphs it's a different story with Zen apparently being massively better than the others. Stop it. Stop misleading and deceiving people with these misleading graphics.

It's not even consistent. In some graphs you have included the entire thing. For example

But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

Stop it. Stop being deceitful, stop being misleading.

I agree with what you're saying, but calling it deceitful and misleading is an exaggeration. When you have so many data points to go over, one ought to take time in looking at the data. After all, the title of the thread is "Strictly Technical".

Rngwn · Mar 5, 2017

Martin Schou said:
*snip*

It's not even consistent. In some graphs you have included the entire thing. For example

But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

Stop it. Stop being deceitful, stop being misleading.

A 52% margin? That's Excavator vs. Kaby Lake, not Zen. In this graph Kaby Lake's margin over Zen is only 152.25/132.03 ≈ 1.153, i.e. 15.3%.

Martin Schou · Mar 6, 2017

Rngwn said:
A 52% margin? That's Excavator vs. Kaby Lake, not Zen. In this graph Kaby Lake's margin over Zen is only 152.25/132.03 ≈ 1.153, i.e. 15.3%.

My issue isn't with Zen vs other CPUs, it's what the graphs show, and that graph shows a 52% margin yet has a smaller apparent margin than the other graph I showed.
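
The gap between the real margin and the one a truncated axis suggests is easy to quantify. A Python sketch using the 152.25 and 132.03 scores quoted above; the axis start of 120 is a made-up value for illustration, since the thread does not state where the graphs' axes actually began:

```python
def true_margin_percent(a, b):
    """Real lead of score a over score b, in percent."""
    return (a / b - 1.0) * 100.0


def apparent_margin_percent(a, b, axis_start):
    """Lead suggested by bar lengths when the axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both scores")
    return ((a - axis_start) / (b - axis_start) - 1.0) * 100.0


if __name__ == "__main__":
    kaby, zen = 152.25, 132.03   # scores quoted in the thread
    hypothetical_axis_start = 120.0  # assumption, not taken from the graphs
    print(f"true margin:     {true_margin_percent(kaby, zen):.1f}%")
    print(f"apparent margin: "
          f"{apparent_margin_percent(kaby, zen, hypothetical_axis_start):.1f}%")
```

With the axis at 0 the two functions agree (about 15.3%); with the assumed start of 120 the longer bar is roughly 2.7 times the shorter one, which reads as a lead of roughly 168%.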

Ajay · Mar 6, 2017

DrMrLordX said:
AMD is already pushing GF's 14nm capacity to its limits. They are still making Polaris, presumably Vega, Summit Ridge, and Zeppelin with the process. The latest incarnation of the WSA gives AMD more flexibility when it comes to production with other foundries.

True, and when RR comes online it would become a much more serious problem (larger TAM for RR). So I wonder what might show up at Samsung or TSMC.
Samsung LPU is too early, as is GF 14HP (and I suspect 14HP is ramping for low volume).

Martin Schou · Mar 6, 2017

tamz_msc said:
I agree with what you're saying, but calling it deceitful and misleading is an exaggeration. When you have so many data points to go over, one ought to take time in looking at the data. After all, the title of the thread is "Strictly Technical".

But it is deceitful and misleading. It may not be the intention, but that doesn't make it any less so. And the argument "one ought to take the time" can just as easily be applied to any scam - if you don't take the time, you deserve to be scammed.

I'm not saying that Stilt is trying to slant his article in one direction or another, and I'm not saying that he's trying to deceive and mislead the readers, but that does not make the graphs any less deceitful and misleading, and given the amount of effort Stilt clearly put into this, why go out of your way to undermine your credibility by using graphs that exaggerate data (which is deceitful and misleading)?
