
Ryzen: Strictly technical

Status
Not open for further replies.
From what I have read, motherboard auto settings are not working correctly in this respect.
Well, good news. Of sorts.

Setting VTT_DDR didn't have any effect, BUT raising the DDR startup voltage from 1.37 V to 1.43 V (!) did. It now reboots three times and starts up at 3200C14. I'm assuming it trains with slightly looser subtimings on each F9 reboot?

However, the good news we can take away from this is that it's clearly not a Data Fabric speed thing; it seems that once it passes the initial memory POST, it's fine.

SOC voltage is the same as always, 1.0 V.
 
First!

And... holy [expletive deleted]!!

AIDA_Cachmem_03272017.png
 
How does that compare to the previous results with your RAM?

Memory results didn't change at all.

My former results (at 3.8GHz with DDR4-2133 speeds).

Aiada64_CacheMem_3.8GHz.png


I am re-running 3 GHz and stock with DDR4-2133 and DDR4-2667 speeds.

Despite 2933 now being doable, it's unstable, so I'm not going to mess with it outside of the memory section of the review (which is slowly coming along - I now have time to devote to it fully).

--

I'd also like to say that this seems to support my results that suggest burst/best inter-CCX latency is actually only 20ns, maybe even slightly less. It's just some coordination or microcode issue - the data fabric is insanely good.
 
Wow, check out the L2 and L3 latency, no problem there.
 
I'd also like to say that this seems to support my results that suggest burst/best inter-CCX latency is actually only 20ns, maybe even slightly less. It's just some coordination or microcode issue - the data fabric is insanely good.
Wait, what results, did i miss anything? Anyways, "best" inter-CCX latency may as well be intra-CCX, considering that 2*10.8ns = 21.6ns, in line with your results.
Wow, check out the L2 and L3 latency, no problem there.
Well, didn't we know that anyway? AMD did mention something like 12 cycle latency for l2 and ~40 (i believe) for l3.
 
Wait, what results, did i miss anything? Anyways, "best" inter-CCX latency may as well be intra-CCX, considering that 2*10.8ns = 21.6ns, in line with your results.

I haven't released any specific results... until now, I guess:

Ryzen_Core_Latency_AllCores_a3.jpg


You can see that the average gap between intra-CCX and inter-CCX latency is about 20ns.

There are 192 bits of data sent between cores for the entire process, and I am using lock-free atomics for synchronization. Swarm mode (the only currently working mode) has the downside of high volatility in the results - this is a simple average of three runs.

EDIT:

From those runs, average inter-CCX latency was 25.5 ns; the best (lowest) was 19.2 ns.
 

Why does the last core of one CCX take so much longer to talk to the second-to-last core of the opposite CCX? Is that repeatable?
 


Is that at 3GHz with DRAM at 2133?
 
Saw some impressive scores on motherboards using external clock generators. RAM stable up to 3400, and the results are impressive. I'll post scores when I'm home again; doing it on mobile is a pain.
 
The latest AGESA release candidate appears to unlock BCLK, even on B350 motherboards. YMMV on the limitations of BCLK overclocking, though; I think the external clock generators on certain boards are still there to avoid some of the side effects of a high BCLK and/or to give more granularity/stability.

Should be fine to go up a few MHz on BCLK even without, though.
 

So if the DF latency is about 25 ns max, that still doesn't explain the memory latency. Even if we add L3 latency + 2x DF latency + DRAM latency, we would still have lower values than the actual readings.
 
Can't confirm but

Zen HEDT CPUs are called Threadripper!
Each CPU will include 64 PCIe lanes!
It includes 4 CCXs.
Lower SKU (probably 12/24) 140 W TDP, higher SKU (probably 16/32) 180 W TDP.
Socket will be an SP3 LGA.
Platform's name will probably be X399.
Chips will be B2 revisions.
32 MB L3 cache.
ESs are 3.3 or 3.4 GHz base and 3.7 GHz boost.
Retail SKUs are aimed at 3.6 GHz base / 4 GHz boost.
ESs in the wild score around 2500 in CB R15.
Infinity Fabric can have a bandwidth of up to 100 GB/s.
Announcement: Computex in Taiwan; sales will start 2-3 weeks after Computex.

Wow
 
Will someone buy my R7 1700X?

What does your crystal ball say re the B2 revisions? It seems pretty clairvoyant thus far.
 