Speculation: Ryzen 4000 series/Zen 3


amd6502

Senior member
Apr 21, 2017
971
360
136
Well I don't think I've seen a delidded 300GE, but I'd imagine that is also the Raven2 die.
Also, with Dali coming in less than a year, clearing as much stock as possible is probably a good idea.

That dual core APU is so long overdue. I think this means they are now finally about to phase out the low-end Stoney product. I'm thinking a cut-down Raven2 with 1c/2t might be a good fit for eMMC craptops, throwaway tablets and Chromebooks. My guess is that it would beat Stoney at a 6W TDP.

The fact that they are also getting a GPU product to replace the 12nm RX 590 with a similarly sized 7nm GPU seems to hint that they may no longer have GloFo obligations in the near future (beyond the end of 2020).
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
That dual core APU is so long overdue. I think this means they are now finally about to phase out the low-end Stoney product. I'm thinking a cut-down Raven2 with 1c/2t might be a good fit for eMMC craptops, throwaway tablets and Chromebooks. My guess is that it would beat Stoney at a 6W TDP.
Raven2 is going to Dali though on 12LP(+).

Stoney still exists: there is still the 12FDX product (Small Dali, the successor to Stoney A6-9220C/A4-9120C) below the 12LP product (Big Dali, the successor to Raven2 Ryzen 3 3200U/Athlon Gold 3150U/Athlon 300U).
that they may no longer have GloFo obligations in the near future (beyond the end of 2020).
They still do; it just doesn't involve FinFETs. Especially with the problems at Malta going out of control.
 
  • Like
Reactions: amd6502

Richie Rich

Senior member
Jul 28, 2019
470
230
76
IPC is not a singular figure, mere boosts to the memory/cache system of Cortex-A12 led to its quick revision/renaming to A17 and a boost from 3.5 to 4 DMIPS per clock.

It's not impossible that a boost to the L1 could increase IPC all by itself by allowing current resources to be better utilised, though likely not a huge change, probably Zen+ level at most without further changes.
I would agree in the case of an evolution of Zen 2. However, Zen 3 isn't an evolution.


There will be IPC increases. We just don't know where they are coming from just yet.
What about a completely new architecture, as Norrod said? AMD engineers run hundreds of simulations, so they increased bandwidth for some specific reason: to avoid some bottleneck. Zen2 increased bandwidth because the FPU width doubled. So increased bandwidth might indicate that Zen 3 will be wider again, maybe with a wider FPU as the RedGamingTech leak suggested. Maybe wider in ALUs too.

It is already 8 pipes in Zen2; the problem is there are only four AVX256 issue ports, supporting 4x AVX128 or 4x AVX256.

View attachment 13756
Zen 2 has a 4-pipe FPU (2x FADD, 2x FMUL); take a look here: https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9
 

soresu

Diamond Member
Dec 19, 2014
4,117
3,571
136
I would agree in the case of an evolution of Zen 2. However, Zen 3 isn't an evolution.
I think you are taking the "completely new architecture" part a bit too seriously.

When they say new uArch they are just trying to convey the IPC or perf/watt delta to be expected, hence the emphasis on Zen2's more evolutionary changes.

After the clusterfrick of Bulldozer and their increased market (and mind) share in the era of Zen, they don't want to spook customers by straying too far from the winning formula of the last 2 years.

I'd say think K8->K10, rather than Excavator->Zen in terms of change - I think if they meant to make such a dramatic change, they would give it a more interesting/less incremental name than Zen3 (hint hint AMD, Nirvana is perfect!).
 
  • Like
Reactions: Thunder 57

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
Well, AMD is already moving from a 2x4 core CCX, with its split cache and the inter-IF overhead to an 8 core CCD with none of those problems. So that's a new architecture right there. Obviously, there will be other improvements as well.
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
Well, AMD is already moving from a 2x4 core CCX, with its split cache and the inter-IF overhead to an 8 core CCD with none of those problems. So that's a new architecture right there. Obviously, there will be other improvements as well.
A unified L3 cache is the main reason to call it a completely new architecture??? Maybe in your Intel world. AMD isn't playing Intel's +3% IPC game. As Keller said at Intel, they made a plan for 50x larger CPUs over the next 20 years. I'm pretty sure he did the same at AMD. And they will choose the most effective configuration/technologies for a given manufacturing process. This is what Apple has been doing for a long time, and AMD has been doing it since Keller brought it over from Apple. That's why Zen 3 could be quite a different uarch from Zen1/2. Zen 3 could be something like Apple's A11 Monsoon: a small performance jump despite the 6x ALUs, but the new uarch provided a solid base for the much better performing A12 Vortex (Zen4).
 

Ajay

Lifer
Jan 8, 2001
16,094
8,114
136
New AMD patent, very similar to Intel's 10nm COAG... probably the main reason N7+ has 20% more density

That's a very generalized patent. You'd have to search TSMC patents, though I suspect many process details to be held as trade secrets.
 
  • Like
Reactions: Olikan and soresu

soresu

Diamond Member
Dec 19, 2014
4,117
3,571
136
This is what Apple has been doing for a long time
Apple A6 was their first fully custom CPU core, this was only 7 years + 2 months ago.

By comparison, AMD have been making x86 CPUs independently since the clone Am386 in 1991 (28 yrs), with their first fully in-house K5 in 1996 (23 yrs).

Intel have been at the x86 game since the 8086 original in 1978 (41 yrs), and longer still for its precursors - that is a long time.

Apple may be doing well, but they are still relative newcomers to the custom processor game.

Even then their motivations are different to many others in this space - the greater market motivation is to shift processor/silicon product for whatever purpose.

On the other hand Apple's purpose is exclusively the sale of their own iStuff - their closest market comparison being Samsung before their Mongoose powered Exynos effort folded.
 
  • Like
Reactions: Tlh97

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
How do you know TSMC license it?
None... it just makes sense, the density boost is similar to Intel's claims

...And I'm very sure that TSMC will do anything they can to steal Intel's high power monopoly...
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Raven2 is going to Dali though on 12LP(+).

Stoney still exists: there is still the 12FDX product (Small Dali, the successor to Stoney A6-9220C/A4-9120C) below the 12LP product (Big Dali, the successor to Raven2 Ryzen 3 3200U/Athlon Gold 3150U/Athlon 300U).
They still do; it just doesn't involve FinFETs. Especially with the problems at Malta going out of control.

I think with Raven2 existing nowadays, a dozer-based Stoney successor isn't really needed. The iGPU port to FDX might also be too much work for this low-margin market, which is getting very crowded (Atom, along with Chinese Centaur x86 SoCs as well as acorn SoCs).

I'd expect around the same IPC increase for Zen2 to Zen3 as going from Zen+ to Zen2 (~15%, or if quite lucky, a little more). With a more mature 7nm (aka 7nm+) we might expect a 5% to 10% frequency boost along with that for HEDT and HEDT-tuned desktop (higher wattage parts), but a far, far greater frequency boost for mobile and lower-wattage desktop parts.

Perf/watt hopefully would increase more than the 20-25% top end desktop performance boost.
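
As a rough sanity check on the numbers above: per-thread performance scales roughly as IPC times clock frequency, so the two gains multiply rather than add. A minimal Python sketch, assuming the ~15% IPC and 5% to 10% frequency estimates from this post (the function name is made up for illustration):

```python
def combined_speedup(ipc_gain: float, freq_gain: float) -> float:
    """Fractional performance gain when an IPC gain and a clock gain compound."""
    return (1.0 + ipc_gain) * (1.0 + freq_gain) - 1.0


# Estimates taken from the post: ~15% IPC, 5% to 10% frequency.
low = combined_speedup(0.15, 0.05)
high = combined_speedup(0.15, 0.10)

print(f"low estimate:  {low:.4f}")
print(f"high estimate: {high:.4f}")
```

That works out to roughly 21% to 26.5%, so the 20-25% top-end figure is in the right ballpark, if slightly conservative at the upper end.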

Zen2 is a major, major change and had a lot of transistors thrown at it: doubled L3, a faster L1 that is very integrated with the L2, doubled FPU, and an extra AGU.

We will likely get a doubled L2.

The Zen back end was always oversized for some reason with 8 wide retire. The integer core is 7 wide in Zen2. If it grows to 8 wide, would the back end still handle it or would it be widened?
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
I think with Raven2 existing nowadays, a dozer-based Stoney successor isn't really needed. The iGPU port to FDX might also be too much work for this low-margin market, which is getting very crowded (Atom, along with Chinese Centaur x86 SoCs as well as acorn SoCs).
Stoney is approximately 125 mm2 with the 28nm node.
Raven2 is about 150 mm2 with the 14nm node. Then there is Raven1 as its successor, potentially on GloFo's 12LP/12LP+.
Raven2_A0 based => 2C/3CU, Raven1_F0 up to 2C/8CU (same core/compute unit count as the 2H20 small Zen3 7nm EUV APU)

Stoney -> 125 // Raven2 -> 150 // Raven1 -> >180(?)

Stoney's successor should be around ~63 mm2 being a sub-cm2 APU SoC. With the FDSOI nodes being prime candidates with reduced mask counts, better low Vdd/high Vdd support, etc.
Stoney = 3.7 GHz at 15W <== All AMD needs is same clock at sub-6Ws which is completely achievable with a two node 22 -> 12.
Raven2 = 3.5 GHz at 15(3200U)/35W(3000G)

The word on the street, however, is 64-bit VFP and single-unit Wave32 CUs.
The Zen back end was always oversized for some reason with 8 wide retire. The integer core is 7 wide in Zen2. If it grows to 8 wide, would the back end still handle it or would it be widened?
8 macro-ops can support 8 ALU and 8 FPU ops. However, imho if they use the shrink given with 5nm it would be simple to slap a second retire queue (RQ0 = A/B threads, RQ1 = C/D threads) on.
 

jamescox

Senior member
Nov 11, 2009
644
1,105
136
It is already 8 pipes in Zen2; the problem is there are only four AVX256 issue ports, supporting 4x AVX128 or 4x AVX256.

View attachment 13756

I don’t see where you get 8 pipes from. What is the image supposed to be pointing out? All of the slides I have seen show 2 x FMUL units and 2 x FADD units that are 256-bit AVX in Zen 2. The L1 cache is 2x32 byte load (2x256-bit) and 32 byte write (256-bit) per clock. The cache bandwidth could be a bottleneck, but saying it increased by 40% doesn’t make much sense. You would expect that they would double the bandwidth to 4x32 bytes. I suppose they could go up to 3x32 bytes, but that isn’t 40%, it is 50%. Perhaps some mixture of read and write increases.
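
To put numbers on the point above, here is a small Python sketch of the peak L1 bytes per clock for a few port configurations. It assumes the Zen 2 figures quoted in this post (two 32-byte load ports, one 32-byte store port) and only counts peak width, not latency or real workloads; the helper name is hypothetical.

```python
PORT_BYTES = 32  # one L1 port moves 32 bytes (256 bits) per clock, per the post


def relative_gain(new_ports: int, old_ports: int) -> float:
    """Fractional increase in peak bytes per clock when the port count changes."""
    return (new_ports * PORT_BYTES) / (old_ports * PORT_BYTES) - 1.0


# Load-only view: Zen 2 has 2 load ports.
load_gain_3 = relative_gain(3, 2)   # 3x32-byte loads
load_gain_4 = relative_gain(4, 2)   # 4x32-byte loads

# Load + store view: Zen 2 has 2 load ports + 1 store port = 3 ports in total.
total_gain_4_ports = relative_gain(4, 3)   # e.g. 3 loads + 1 store, or 2 + 2
total_gain_5_ports = relative_gain(5, 3)   # e.g. 3 loads + 2 stores, or 4 + 1

for label, value in [
    ("3 load ports vs 2", load_gain_3),
    ("4 load ports vs 2", load_gain_4),
    ("4 total ports vs 3", total_gain_4_ports),
    ("5 total ports vs 3", total_gain_5_ports),
]:
    print(f"{label}: {value:.3f}")
```

None of these whole-port steps gives 40%: the load-only steps are 50% and 100%, and the combined steps are about 33% and 67%. So a 40% figure would have to come from something other than a simple port count increase.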

It is difficult to know what they will do without knowing the current bottlenecks. Does only having 2 MUL and 2 ADD cause a bottleneck? Would it be beneficial to have 4 full FMA units instead? That wouldn’t require as many other changes. Doubling everything to 4 MUL and 4 ADD requires a large upgrade to scheduling and instruction issue/retire. If they want to do AVX512 with two 256-bit micro ops, then doubling makes sense. If they aren’t doing that, then an increase up to 3 MUL + 3 ADD could make sense for just larger AVX256 throughput.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,811
1,290
136
What is the image supposed to be pointing out?
4 FMULs, 4 FADDs

One side is 2 FMUL(2x128-bit), 2 FADD(2x128-bit) and 160-entry lower 128-bit PRF
Other side is 2 FMUL(2x128-bit), 2 FADD(2x128-bit) and 160-entry higher 128-bit PRF.

8 128-bit datapaths, with 2 128-bit PRFs. It is also different from previous PRF designs from AMD, as the PRFs are equal(no distinct upper or lower from PRF perspective).
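
One way to read the disagreement above is that both counts describe the same hardware at different granularities. A minimal Python sketch, assuming (as this post does) that each 256-bit pipe is built from a low and a high 128-bit half; the pipe labels are hypothetical:

```python
HALF_BITS = 128  # width of one half-datapath, per the post
PIPES = ["FMUL0", "FMUL1", "FADD0", "FADD1"]  # hypothetical labels for the 4 pipes
HALVES = ["lo", "hi"]  # each pipe split into a low and a high 128-bit half

# Enumerate every (pipe, half) pair: this is the "8 datapaths" view.
datapaths = [(pipe, half) for pipe in PIPES for half in HALVES]

n_128bit_datapaths = len(datapaths)
n_256bit_pipes = len(PIPES)
total_bits_per_clock = n_128bit_datapaths * HALF_BITS

print(n_128bit_datapaths, "x 128-bit datapaths")
print(n_256bit_pipes, "x 256-bit pipes")
print(total_bits_per_clock, "bits of FP datapath width in total")
```

Under that assumption, "8 datapaths" and "4 pipes" are the same total width, just counted per 128-bit half versus per 256-bit pipe.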

Greyhound with its 64-bit lo/64-bit hi has distinct upper and lower qualities. Same as BD/SR-derived designs. Zen is the first true native 128-bit, but Zen2 is the first design to split hi and low yet not have one PRF larger than the other.

GH/HK/BD/SR => Control is set in low
ZN2 => Control is in both, potentially mirrored if AVX256. If it is mirrored it technically can also be tweaked to AVX512, with 2-entries in both being mirrored(ctl bits) indicating it's an AVX512 resource.

MUL0 ctl:127:0
MUL0-pair(MUL3) (mir)ctl:255:128
etc...
 

Richie Rich

Senior member
Jul 28, 2019
470
230
76
Doubling everything to 4 MUL and 4 ADD requires a large upgrade to scheduling and instruction issue/retire. If they want to do AVX512 with two 256-bit micro ops, then doubling makes sense. If they aren’t doing that, then an increase up to 3 MUL + 3 ADD could make sense for just larger AVX256 throughput.
That's exactly my point of view too. Making a CPU wider is the most complex work IMHO. That's why AMD might have decided to develop a completely new uarch as a solid and wide base for their future CPU evolutions. Zen 4 and 5 can add more complexity to gain much more performance without being limited by the uarch (to pick the lowest hanging fruit; however, they need to build a solid platform to reach that fruit, because the low hanging fruit is actually quite high).

Completely new uarch doesn't mean something radical. Zen3 could be a very conservative design based on the Zen1/2 approach, just built much wider. That's why a +15% INT IPC gain (from leaks) could be appropriate for a first-gen 6x ALU core (+33% relative to Zen1, which seems reasonable). I would also expect 8 FPU pipes rather than 6 due to future AVX512 support (the leaked +40% FPU IPC might be appropriate for doubling the FPUs, due to non-linear scaling).
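
If the "+33%" figure is read as the cumulative IPC gain over Zen1 (one plausible reading, not something the post states outright), it matches two successive ~15% steps. A short Python sketch under that assumption, with the 4-ALU Zen1 core as the baseline for the ALU count; the function name is made up for illustration:

```python
from math import prod


def cumulative_gain(step_gains):
    """Total fractional gain after applying each generational gain in turn."""
    return prod(1.0 + g for g in step_gains) - 1.0


# Assumed steps: ~15% for Zen1 -> Zen2 and the leaked ~15% for Zen2 -> Zen3.
zen1_to_zen3 = cumulative_gain([0.15, 0.15])

# Going from 4 ALUs to 6 ALUs is a 50% increase in integer execution units.
alu_increase = 6 / 4 - 1.0

print(f"cumulative IPC gain: {zen1_to_zen3:.4f}")
print(f"ALU count increase:  {alu_increase:.2f}")
```

So about +32% cumulative IPC from +50% more ALUs, which fits the idea that extra width does not translate linearly into IPC.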
 

dnavas

Senior member
Feb 25, 2017
355
190
116
The L1 cache is 2x32 byte load (2x256-bit) and 32 byte write (256-bit) per clock. The cache bandwidth could be a bottleneck, but saying it increased by 40% doesn’t make much sense. You would expect that they would double the bandwidth to 4x32 bytes. I suppose they could go up to 3x32 bytes, but that isn’t 40%, it is 50%. Perhaps some mixture of read and write increases.

L1 has latency, and "40%" could be representative of some workload. Wasn't the video saying something about specific fp workloads being potentially 50% faster? We could be looking at optimized latencies for specific cases -- for example, support of 4x32 loads and stores at six cycles that are used when possible (maybe a generic optimization for streamed data), 2x256 bit writes, and/or something else. Anyway, my point is that this might not be strictly a width increase, and that the optimizations might be targeted and therefore won't produce a nice round increase.
 

soresu

Diamond Member
Dec 19, 2014
4,117
3,571
136
Completely new uarch doesn't mean something radical.
I think we should probably take it like Intel's old "tock" changes, with the Zen1 and Zen3 being in that area, and Zen+ and Zen2 being in the "tick" change area.

In the end it's all an evolution of Zen, it's just that Zen3 is a more major evolution.

As previously pointed out the unification of the CCD through the L3 caches is a major layout change, however bland it may seem for "completely new" PR, and it would not surprise me to find new instruction(s)/sets in that major change too.

Though I do believe it could be dangerous to AMD to bring in AVX512, unless it can do the same issue per clock as AVX2 currently can - it could lead to programs with both AVX2 and AVX512 codepaths being executed on AVX512 and delivering inferior performance to AVX2?
 
  • Love
Reactions: A///

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
That's quite honestly a decent concern because it's something that comes up in private circles. Nothing I do is tied to AVX512, so I rarely partake in the discussions myself. There's a rumor flying around that to compete with AMD, Intel may bring AVX512 to all mainstream processors sans the i3 from here on out. This makes little sense to me and it seems more like a "Hey my chip's got this one, you should have bought Intel!" or something equally ridiculous in the less tech-centric sphere. Or maybe I overanalyze everything to death.
 

soresu

Diamond Member
Dec 19, 2014
4,117
3,571
136
This makes little sense to me and it seems more like a "Hey my chip's got this one, you should have bought Intel!"
Intel traded on this for years with SSSE3 and full SSE4.x - I'd say it's a big part of the reason AVX got announced by Intel when AMD announced SSE5, it's about perception of being in front of innovation.
 

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
A unified L3 cache is the main reason to call it a completely new architecture??? Maybe in your Intel world. AMD isn't playing Intel's +3% IPC game. As Keller said at Intel, they made a plan for 50x larger CPUs over the next 20 years. I'm pretty sure he did the same at AMD. And they will choose the most effective configuration/technologies for a given manufacturing process. This is what Apple has been doing for a long time, and AMD has been doing it since Keller brought it over from Apple. That's why Zen 3 could be quite a different uarch from Zen1/2. Zen 3 could be something like Apple's A11 Monsoon: a small performance jump despite the 6x ALUs, but the new uarch provided a solid base for the much better performing A12 Vortex (Zen4).

50x larger CPU? 50x as large as now or 50 variations? When and where was this said? This is the first time I'm hearing of it.
 

soresu

Diamond Member
Dec 19, 2014
4,117
3,571
136
50x larger CPU? 50x as large as now or 50 variations? When and where was this said? This is the first time I'm hearing of it.
I can only assume he means 50x transistors total, rather than per core transistor count.

Because a core with 50x the transistor budget would be insane even at <1nm.
 
  • Like
Reactions: A///

A///

Diamond Member
Feb 24, 2017
4,351
3,160
136
I can only assume he means 50x transistors total, rather than per core transistor count.

Because a core with 50x the transistor budget would be insane even at <1nm.
50x transistor rate spread over 20 years?
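
For a sense of scale, here is what "50x over 20 years" would mean as a steady yearly rate. This is a minimal Python sketch that assumes smooth exponential growth, which is a simplification; the function names are hypothetical.

```python
import math


def annual_growth(total_factor: float, years: float) -> float:
    """Constant yearly growth rate that compounds to total_factor over years."""
    return total_factor ** (1.0 / years) - 1.0


def doubling_time(total_factor: float, years: float) -> float:
    """Years needed to double at that same constant growth rate."""
    return years * math.log(2.0) / math.log(total_factor)


rate = annual_growth(50.0, 20.0)
doubling = doubling_time(50.0, 20.0)

print(f"yearly growth: {rate:.3f}")
print(f"doubling time: {doubling:.2f} years")
```

That comes to a bit under 22% more transistors per year, or a doubling roughly every three and a half years, which is noticeably slower than the classic two-year doubling cadence.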