Speculation: Ryzen 4000 series/Zen 3


Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
when the fat lady sings.

It's basically a matter of time (Zen 4 surely should have 4-way imho) and a question whether they do SMT4 or general 4-way MT.

SMT4 isn't dead, also there might be a CPU ACE unit to decouple OS threads from HW threads.

New patents coming up show cores executing more than 4 threads (four threads in execution and many more in a micro-context buffer, with the L3 cache holding even more context info).

The SMT4 mode in Milan(256 macro-context unit)/Vermeer(64 macro-context unit) might need a new operating system. However, its SMT2 mode is backwards compatible. It might also have been delayed like NGG, ¯\_(ツ)_/¯.

Not 100% sure, but "full/heavyweight" context switches retain their security but can now occur in nanoseconds rather than in microseconds.

Let me rephrase; SMT4 in Zen 3 is not happening. We may/probably will see it at some point though.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Well, probably lots of wishful thinking on my part, since I want my next notebook to be ULP and have a threadripping mode too. You are almost surely correct on that, and unfortunately it's going to be another year, end of 2020, till we know for sure.

Right now I think there is essentially zero chance of SMT4, and I guesstimate only a very low chance (~3%, or about 1 in 30) of 4-way multithreading.

Hopefully Nosta finds more interesting patents.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Put a fork in it, SMT4 is dead. It was never a thing. There was never any evidence to suggest it.
I don't care about SMT4 itself, I want a wide core with high IPC. As a customer, I demand good products. And I cannot be happy that Apple's mobile CPU A11 from 2017 is much stronger than Zen 3 will be in 2020. That's a shame for x86.

I'm pretty sure AMD wouldn't leak an SMT4 feature in that Zen 2 presentation. Yeah, hope's still alive.
On the other hand, canceling ambitious projects might be the reason why Keller left AMD again in 2015. He left AMD in 1999 when they canceled his high performance Hammer CPU. History repeats? I wouldn't be surprised.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
1) TSMC's high-volume production is ready for 7nm+ !!

2) AMD's design of Zen 3 is ready.

3) We know it's the same AM4 socket (Ryzen 5000, Zen 4 is DDR5)

So what are we waiting for? ;)
 

Gideon

Golden Member
Nov 27, 2007
1,638
3,673
136
How are people still parroting this meme
Why is this a meme? I agree that Apple cores are different, they sacrifice considerable density to achieve what they do, etc ... but they are considerably wider and considerably faster in most general-purpose operations, even when you disregard full stack optimizations, etc.

E.g. if Zen4 ends up being 50% wider, then why is this a "meme"?

EDIT: I get it if people don't take SPECint2006 and other synthetic workloads seriously, but there have been plenty of other cases. One software dev tried some single-threaded self-made workloads (impossible to "optimize" via the vendor or full-stack advantages Apple supposedly has) on both a desktop Mac Pro and an iPhone, and the phone still had the same (or better) ST performance, despite running GHz slower. A12/A13 has the best IPC in the business, and this was also said by AnandTech's Andrei.
 

DrMrLordX

Lifer
Apr 27, 2000
21,631
10,842
136
I don't care about SMT4 itself, I want a wide core with high IPC. As a customer, I demand good products. And I cannot be happy that Apple's mobile CPU A11 from 2017 is much stronger than Zen 3 will be in 2020. That's a shame for x86.

A11 isn't really stronger than Zen2. What makes you think it'll be stronger than Zen3? It might be stronger at some arbitrary low-power point but that's functionally meaningless. Anyway, SMT2 will do quite well on a wider core. Instead of 25-30% performance improvement, we might see more gains. 40% is not too much to hope for in applications that can't saturate the pipeline with just one thread per core.

So what are we waiting for? ;)

For Zen2 to go through its product cycle. July 2020 here we come!
 

DisEnchantment

Golden Member
Mar 3, 2017
1,603
5,791
136
New AMD Patent Application
Prefetch data from RAM into L3 to reduce latency. With those big L3s this could mean something.

20190294546 - PREFETCHER BASED SPECULATIVE DYNAMIC RANDOM-ACCESS MEMORY READ REQUEST TECHNIQUE

A method includes monitoring a request rate of speculative memory read requests from a penultimate-level cache to a main memory. The speculative memory read requests correspond to data read requests that missed in the penultimate-level cache. A hit rate of searches of a last-level cache for data requested by the data read requests is monitored. Core demand speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding core demand data read request based on the request rate and the hit rate. Prefetch speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding prefetch data read request based on the request rate and the hit rate.
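
For anyone who prefers code to patent language, here is a rough Python sketch of how I read the abstract. It is a toy model, not AMD's implementation: the class name, the counters and every threshold value are my own assumptions, since the abstract gives no numbers. The idea as I understand it is that firing a DRAM read in parallel with the L3 lookup only pays off when the L3 usually misses and the memory path is not already busy with speculative traffic.

```python
from dataclasses import dataclass


@dataclass
class SpeculativeReadGate:
    """Toy model of the gating idea in the abstract (hypothetical, not AMD's design)."""

    # Hypothetical thresholds: the abstract does not state any values.
    max_request_rate: float = 0.5       # speculative DRAM reads per cycle
    demand_max_hit_rate: float = 0.9    # assumed: demand reads gated less strictly
    prefetch_max_hit_rate: float = 0.6  # assumed: prefetch reads gated more strictly

    # Monitored counters.
    cycles: int = 0
    spec_requests: int = 0  # speculative reads sent to main memory
    llc_lookups: int = 0    # searches of the last-level cache
    llc_hits: int = 0       # searches that hit

    def request_rate(self) -> float:
        """Speculative memory read requests per cycle."""
        return self.spec_requests / self.cycles if self.cycles else 0.0

    def hit_rate(self) -> float:
        """Fraction of last-level cache searches that hit."""
        return self.llc_hits / self.llc_lookups if self.llc_lookups else 0.0

    def allow(self, kind: str) -> bool:
        """True if a speculative DRAM read of this kind may be issued in
        parallel with the last-level cache search. kind: 'demand' or 'prefetch'."""
        if kind == "demand":
            limit = self.demand_max_hit_rate
        elif kind == "prefetch":
            limit = self.prefetch_max_hit_rate
        else:
            raise ValueError(f"unknown request kind: {kind}")
        return self.request_rate() < self.max_request_rate and self.hit_rate() < limit


gate = SpeculativeReadGate(cycles=1000, spec_requests=100, llc_lookups=200, llc_hits=150)
print(gate.allow("demand"), gate.allow("prefetch"))
```

With these made-up counters the L3 hit rate is 75%, so demand reads are still allowed to go speculative while prefetch reads are not. Whether the real design gates the two classes asymmetrically like this is a guess on my part.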
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
What if Zen 3 has an interposer? How would that change number of links?

It should have no effect. The interconnect between L3 slices is implemented on the CPU chiplet.

An interposer should allow more complex, wider, faster and more efficient interconnect between chiplets, though, since a silicon interposer allows much finer metal layers and much lower energy-per-bit.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
A11 isn't really stronger than Zen2.
A12 has 158% of Skylake IPC in SPECint. A11 is slower, but not by much, because it has 6 ALUs too. It is a nice example that a 6-ALU core needs some evolution steps to reach max performance (picking the low-hanging fruit first).

we might see more gains. 40% is not too much to hope for
Good point. If Zen 2 SMT2 can gain +20% more performance, this means average ALU loading is 80%. With a 6-ALU core you have a base of 150%... so theoretically there might be a +70% gain (relative to Zen 2). However, relative to Zen 3 ST it would be around 40%. That's a massive gain.
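
The back-of-envelope numbers above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic as posted, under its own simplifying assumptions: ALU count is the only limiter, SMT2 fully saturates Zen 2's 4 ALUs, and everything is normalised so that Zen 2 with SMT2 equals 100. The function name is made up for illustration. Strictly speaking, a +20% SMT gain puts single-thread load at 100/1.2, or about 83%, rather than 80%, so treat the result as a rough ceiling.

```python
def wide_core_headroom(smt_gain_pct: int, old_alus: int, new_alus: int):
    """Reproduce the estimate from the post, in percent of Zen 2 SMT2 throughput.

    Assumes (as the post does) that SMT2 on the old core reaches 100% ALU load.
    """
    smt_peak = 100                              # old core with SMT2, normalised
    st_load = smt_peak - smt_gain_pct           # single-thread ALU load, the post's 80%
    wide_peak = smt_peak * new_alus // old_alus # peak of the wider core, the post's 150%
    headroom = wide_peak - st_load              # unused points a second thread could fill
    return st_load, wide_peak, headroom


st_load, wide_peak, headroom = wide_core_headroom(smt_gain_pct=20, old_alus=4, new_alus=6)
print(st_load, wide_peak, headroom)  # 80 150 70
```

Note that the 70 is in percentage points of Zen 2 peak throughput. Expressed as a ratio over the single-thread baseline it would be 150/80, which is a different (larger) number, so it matters which baseline a quoted gain refers to.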
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,020
136
It should have no effect. The interconnect between L3 slices is implemented on the CPU chiplet.

An interposer should allow more complex, wider, faster and more efficient interconnect between chiplets, though, since a silicon interposer allows much finer metal layers and much lower energy-per-bit.

Of course, an active interposer could move some logic off the compute die and into the interposer, opening up all sorts of options for topology.
 

Ajay

Lifer
Jan 8, 2001
15,451
7,861
136
New AMD Patent Application
Prefetch data from RAM into L3 to reduce latency. With those big L3s this could mean something.

20190294546 - PREFETCHER BASED SPECULATIVE DYNAMIC RANDOM-ACCESS MEMORY READ REQUEST TECHNIQUE

A method includes monitoring a request rate of speculative memory read requests from a penultimate-level cache to a main memory. The speculative memory read requests correspond to data read requests that missed in the penultimate-level cache. A hit rate of searches of a last-level cache for data requested by the data read requests is monitored. Core demand speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding core demand data read request based on the request rate and the hit rate. Prefetch speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding prefetch data read request based on the request rate and the hit rate.

Wait a minute here. How much memory bandwidth does Zen3 have in order to have significant enough read throughput to make speculative reads?!
Heh, and why does that patent show only 4 cores? I don't think this patent is for Zen3. Seems like AMD is engaged in some patent bracketing or something.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
A12 has 158% of Skylake IPC in SPECint. A11 is slower, but not by much, because it has 6 ALUs too. It is a nice example that a 6-ALU core needs some evolution steps to reach max performance (picking the low-hanging fruit first).
Ah, but at what clock freq does power consumption jump through the roof due to uArch mobile optimisations?

And we already know the intrinsic vector length limits of NEON SIMD are below that of AMD, let alone Intel with AVX512.

Of course this will change in the future with SVE2, but that is then, this is now.

There still seems to be a gulf between benchmarking the 2 platforms that respects all possible performance avenues, and vector/SIMD length is a big one in certain use cases.
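
To make the vector-length gap concrete, here is a tiny Python sketch that counts how many elements fit in one vector register for each ISA. The register widths are the architectural ones (128-bit NEON, 256-bit AVX2, 512-bit AVX-512); the helper name is made up. This only shows elements per instruction. Real throughput also depends on how many SIMD units a core has and at what clock they run, which this does not model.

```python
# Architectural vector register widths in bits.
VECTOR_BITS = {"NEON": 128, "AVX2": 256, "AVX-512": 512}


def lanes(isa: str, element_bits: int = 32) -> int:
    """Number of elements of the given size that fit in one vector register."""
    return VECTOR_BITS[isa] // element_bits


for isa in VECTOR_BITS:
    print(isa, lanes(isa), "x fp32 per vector")
```

So for 32-bit floats a single AVX2 instruction covers twice as many elements as a NEON one, and AVX-512 four times as many.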
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
Of course, an active interposer could move some logic off the compute die and into the interposer, opening up all sorts of options for topology.
I thought the whole point of the interposer was interconnect; surely integrating the IO would be the better use case then?
 

scannall

Golden Member
Jan 1, 2012
1,946
1,638
136
Ah, but at what clock freq does power consumption jump through the roof due to uArch mobile optimisations?

And we already know the intrinsic vector length limits of NEON SIMD are below that of AMD, let alone Intel with AVX512.

Of course this will change in the future with SVE2, but that is then, this is now.

There still seems to be a gulf between benchmarking the 2 platforms that respects all possible performance avenues, and vector/SIMD length is a big one in certain use cases.
iOS is OSX, with a touch interface. Should be pretty easy to compare.
 

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
iOS is OSX, with a touch interface. Should be pretty easy to compare.
Should be, and yet we still have these strangely limited benchmarks that miss a crucial area of modern CPU performance in the SIMD execution.

Dunno how we would go about comparing them though - perhaps dav1d would suffice to at least test an AVX2 cpu vs a NEON cpu, but dav1d lacks AVX512 code at present to compare further.
 

extide

Senior member
Nov 18, 2009
261
64
101
www.teraknor.net
What 'stuff'? Zen3 CCD? Zen3 IOD? Please elaborate, TIA.
Frankly at this point they probably have full on engineering samples back in the labs. They taped out a while ago.

 

Gideon

Golden Member
Nov 27, 2007
1,638
3,673
136
Should be, and yet we still have these strangely limited benchmarks that miss a crucial area of modern CPU performance in the SIMD execution.

Dunno how we would go about comparing them though - perhaps dav1d would suffice to at least test an AVX2 cpu vs a NEON cpu, but dav1d lacks AVX512 code at present to compare further.
Geekbench has had AVX512 for ages, why not use it?
 

DrMrLordX

Lifer
Apr 27, 2000
21,631
10,842
136
A12 has 158% of Skylake IPC in SPECint.

Yeah, but . . .

Ah, but at what clock freq does power consumption jump through the roof due to uArch mobile optimisations?

. . . ah, you beat me to it. A11 (and A12) don't clock high enough to deliver 158% of the performance of Skylake, Zen2, or Zen3. Only the Apple design team really knows why someone, somewhere hasn't tried making a higher-clocked version of their cores to set the world on fire. They ARE impressive in what they've gained over the years. AMD should be taking notes. But let's be realistic here.

Bringing it back to Zen3 . . . yes, I think AMD could benefit from a wider core. And I'll repeat my point that SMT2 (not 4) will make it pretty easy for everyday users to exploit that wider core. I just don't think AMD needs to lose any sleep over the possibility that Zen3 might be slower than some Apple SoC at such a low clockspeed that nobody's really going to care about that comparison anyway. Zen3 might lock horns with an Axx variant in the notebook sector, eventually. But the software they run will be so different that it'll be hard to make reliable comparisons between the two.
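
To put the IPC-versus-clock point above in numbers, here is a quick Python sketch. Single-thread performance scales roughly as IPC times clock, so an IPC lead only turns into a performance lead if the clock deficit is smaller than the IPC advantage. The 1.58 IPC ratio is the SPECint figure quoted in this thread; the 2.5 GHz and 5.0 GHz clocks are illustrative assumptions, not measurements, and the function names are made up.

```python
def relative_performance(ipc_ratio: float, clock_ghz: float, ref_clock_ghz: float) -> float:
    """Performance relative to the reference core, modelled as IPC x clock."""
    return ipc_ratio * clock_ghz / ref_clock_ghz


def break_even_clock(ipc_ratio: float, ref_clock_ghz: float) -> float:
    """Clock the higher-IPC core needs to match the reference core."""
    return ref_clock_ghz / ipc_ratio


# Assumed clocks: about 2.5 GHz for the mobile core, 5.0 GHz for a desktop part.
print(relative_performance(1.58, 2.5, 5.0))
print(break_even_clock(1.58, 5.0))
```

Under those assumed clocks the higher-IPC core lands below the reference in absolute performance, and would need a bit over 3 GHz to draw level. It also ignores whether IPC holds up as the clock rises, which is exactly the open question.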