Speculation: Ryzen 4000 series/Zen 3

NostaSeronx · Nov 28, 2019

Richie Rich said:
Probably AMD widened FPU pipes from 4 to 8 pipes.

It is already 8 pipes in Zen2; the problem is that there are only four AVX256 issue ports, supporting 4x AVX128 or 4x AVX256.

amd6502 · Nov 28, 2019

uzzi38 said:
Well I don't think I've seen a delidded 300GE, but I'd imagine that is also the Raven2 die.
Also, with Dali coming in under a year, clearing as much stock as possible is probably a good idea.

That dual core APU is so long overdue. I think this means they are now finally about to phase out the Stoney low end product. I'm thinking a cut down Raven2 with 1c/2t might be a good fit for eMMC craptops, throwaway tablets and Chromebook laptops. My guess is they would beat Stoney at a 6W TDP.

The fact that they are also getting a GPU product to replace the 12nm RX 590 with a similarly sized 7nm GPU seems to hint that they may no longer have Glo Fo obligations in the near future (beyond end 2020).

NostaSeronx · Nov 28, 2019

amd6502 said:
That dual core APU is so long overdue. I think this means they are now finally about to phase out the Stoney low end product. I'm thinking a cut down Raven2 with 1c/2t might be a good fit for eMMC craptops, throwaway tablets and Chromebook laptops. My guess is they would beat Stoney at a 6W TDP.

Raven2 is going to Dali though on 12LP(+).

Stoney still exists: there is still a 12FDX product (Small Dali, the Stoney successor to the A6-9220C/A4-9120C) below the 12LP one (Big Dali, the Raven2 successor to the Ryzen 3 3200U/Athlon Gold 3150U/Athlon 300U).

amd6502 said:
that they may no longer have Glo Fo obligations in the near future (beyond end 2020).

They still do; it just doesn't involve FinFETs. Especially with the problems at Malta going out of control.

Richie Rich · Nov 29, 2019

soresu said:
IPC is not a singular figure, mere boosts to the memory/cache system of Cortex-A12 led to its quick revision/renaming to A17 and a boost from 3.5 to 4 DMIPS per clock.

It's not impossible that a boost to the L1 could increase IPC all by itself by allowing current resources to be better utilised, though likely not a huge change, probably Zen+ level at most without further changes.

I would agree in the case of an evolution of Zen 2. However, Zen 3 isn't an evolution.

Thunder 57 said:
There will be IPC increases. We just don't know where they are coming from just yet.

What about a completely new architecture, as Norrod said? AMD engineers run hundreds of simulations, so they increased bandwidth for some specific reason: to avoid some bottleneck. Zen2 increased bandwidth because it doubled the FPU width. So increased bandwidth might indicate Zen 3 will be wider again, maybe with a wider FPU as the RedGamingTech leak suggested. Maybe wider in ALUs too.

NostaSeronx said:
It is already 8 pipes in Zen2; the problem is that there are only four AVX256 issue ports, supporting 4x AVX128 or 4x AVX256.

View attachment 13756

Zen 2 has a 4-pipe FPU (2xFADD, 2xFMUL); take a look here: https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9

soresu · Nov 29, 2019

Richie Rich said:
I would agree in the case of an evolution of Zen 2. However, Zen 3 isn't an evolution.

I think you are taking the "completely new architecture" part a bit too seriously.

When they say new uArch they are just trying to convey the IPC or perf/watt delta to be expected, hence the emphasis on Zen2's more evolutionary changes.

After the clusterfrick of Bulldozer and their increased market (and mind) share in the era of Zen, they don't want to spook customers by straying too far from the winning formula of the last 2 years.

I'd say think K8->K10, rather than Excavator->Zen in terms of change - I think if they meant to make such a dramatic change, they would give it a more interesting/less incremental name than Zen3 (hint hint AMD, Nirvana is perfect!).

Ajay · Nov 29, 2019

Well, AMD is already moving from a 2x4 core CCX, with its split cache and the inter-IF overhead to an 8 core CCD with none of those problems. So that's a new architecture right there. Obviously, there will be other improvements as well.

Olikan · Nov 29, 2019

New AMD patent, very similar to Intel's 10nm COAG... probably the main reason N7+ has 20% more density

GATE CONTACT OVER ACTIVE REGION IN CELL

www.freepatentsonline.com

Richie Rich · Nov 29, 2019

Ajay said:
Well, AMD is already moving from a 2x4 core CCX, with its split cache and the inter-IF overhead to an 8 core CCD with none of those problems. So that's a new architecture right there. Obviously, there will be other improvements as well.

A unified L3 cache is the main reason to call it a completely new architecture??? Maybe in your Intel world. AMD isn't playing Intel's +3% IPC game. As Keller said at Intel, they made a plan for 50x larger CPUs in the next 20 years. I'm pretty sure he did the same at AMD. And they will choose the most effective configuration/technologies for a given manufacturing process. And this is what Apple has been doing for a long time; AMD has been doing it since Keller brought it from Apple. That's why Zen 3 could be quite a different uarch from Zen1/2. Zen 3 could be something like Apple's A11 Monsoon: a small performance jump despite the 6xALUs, but the new uarch provided a solid base for the much better performing A12 Vortex (Zen4).

Ajay · Nov 29, 2019

Olikan said:
New AMD patent, very similar to Intel's 10nm COAG... probably the main reason N7+ has 20% more density

GATE CONTACT OVER ACTIVE REGION IN CELL

www.freepatentsonline.com

That's a very generalized patent. You'd have to search TSMC patents, though I suspect many process details are held as trade secrets.

soresu · Nov 29, 2019

Richie Rich said:
And this is what Apple has been doing for a long time

Apple A6 was their first fully custom CPU core, this was only 7 years + 2 months ago.

By comparison, AMD have been making x86 CPUs independently since the clone Am386 in 1991 (28 yrs), with their first fully in-house K5 in 1996 (23 yrs).

Intel have been at the x86 game since the 8086 original in 1978 (41 yrs), and longer still for its precursors - that is a long time.

Apple may be doing well, but they are still relative newcomers to the custom processor game.

Even then their motivations are different to many others in this space - the greater market motivation is to shift processor/silicon product for whatever purpose.

On the other hand Apple's purpose is exclusively the sale of their own iStuff - their closest market comparison being Samsung before their Mongoose powered Exynos effort folded.

soresu · Nov 29, 2019

Olikan said:
New AMD patent, very similar to Intel's 10nm COAG... probably the main reason N7+ has 20% more density

How do you know TSMC license it?

Olikan · Nov 29, 2019

soresu said:
How do you know TSMC license it?

None... it just makes sense; the density boost is similar to Intel's claims

...And I'm very sure that TSMC will do anything they can to steal Intel's high power monopoly...

NostaSeronx · Nov 29, 2019

Richie Rich said:
Zen 2 has a 4-pipe FPU (2xFADD, 2xFMUL); take a look here: https://www.anandtech.com/show/14525/amd-zen-2-microarchitecture-analysis-ryzen-3000-and-epyc-rome/9

Zen2 has 8 pipes.
4x Lo-128 pipes Lower-128-bit PRF[0:127]
4x Hi-128 pipes Higher-128-bit PRF[128:255]

Take a look here: https://forums.anandtech.com/attachments/zen2-jpg.13756/
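
One way to reconcile the "4 pipes" and "8 pipes" counts is to treat each 256-bit pipe as a pair of 128-bit lanes, one working on PRF[0:127] and one on PRF[128:255]. The sketch below is only a toy model of that bookkeeping, not a description of the actual hardware; the names `split_256`, `join_256` and `count_datapaths` are made up for illustration.

```python
LANE_BITS = 128
LANE_MASK = (1 << LANE_BITS) - 1

def split_256(value):
    """Split a 256-bit integer into its (lo, hi) 128-bit halves."""
    if not 0 <= value < (1 << 256):
        raise ValueError("value must fit in 256 bits")
    lo = value & LANE_MASK      # bits [0:127], the "Lo-128" side
    hi = value >> LANE_BITS     # bits [128:255], the "Hi-128" side
    return lo, hi

def join_256(lo, hi):
    """Recombine two 128-bit halves into one 256-bit value."""
    return (hi << LANE_BITS) | lo

def count_datapaths(pipes_256=4, lanes_per_pipe=2):
    """Count 128-bit datapaths, assuming 4 pipes of 2 lanes each."""
    return pipes_256 * lanes_per_pipe

lo, hi = split_256((5 << 128) | 7)
print(lo, hi)                 # the two halves of one 256-bit operand
print(count_datapaths())      # 128-bit datapaths under this model
```

Under this model both posters are counting the same hardware: four issue ports as seen by a 256-bit op, eight datapaths as seen by the 128-bit halves.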

amd6502 · Nov 29, 2019

NostaSeronx said:
Raven2 is going to Dali though on 12LP(+).

Stoney still exists: there is still a 12FDX product (Small Dali, the Stoney successor to the A6-9220C/A4-9120C) below the 12LP one (Big Dali, the Raven2 successor to the Ryzen 3 3200U/Athlon Gold 3150U/Athlon 300U).

They still do; it just doesn't involve FinFETs. Especially with the problems at Malta going out of control.

I think with Raven2 existing nowadays, a dozer based Stoney successor isn't really needed. The iGPU port to FDX might also be too much work for this low margin market that is getting very crowded (Atom, along with Chinese Centaur x86 SoCs as well as acorn SoCs).

I'd expect around the same IPC increase from Zen2 to Zen3 as from Zen+ to Zen2 (~15%, or a little more if quite lucky). With more mature 7nm (aka 7nm+) we might expect a 5% to 10% frequency boost along with that for HEDT and HEDT-tuned desktop (higher-wattage parts), but a far greater frequency boost for mobile and lower-wattage desktop parts.

Perf/watt hopefully would increase more than the 20-25% top end desktop performance boost.
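
The 20-25% figure follows from compounding the two guesses above, since performance scales roughly with IPC times clock. A minimal sketch, assuming the ~15% IPC and 5-10% frequency numbers from this post (they are speculation, not measured data); `combined_speedup` is a made-up helper name.

```python
def combined_speedup(ipc_gain, freq_gain):
    """Overall gain when IPC and frequency gains multiply."""
    return (1 + ipc_gain) * (1 + freq_gain) - 1

# Speculated inputs from the post: ~15% IPC, 5% to 10% frequency.
low = combined_speedup(0.15, 0.05)
high = combined_speedup(0.15, 0.10)
print(f"low estimate:  {low * 100:.2f}%")
print(f"high estimate: {high * 100:.2f}%")
```

The product lands at roughly 21% to 27%, which is in the same ballpark as the 20-25% quoted above.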

Zen2 is a major, major change and had a lot of transistors thrown at it: a doubled L3, a faster L1 that is very integrated with the L2, a doubled FPU, and an extra AGU.

We likely get doubled L2.

The Zen back end was always oversized for some reason with 8 wide retire. The integer core is 7 wide in Zen2. If it grows to 8 wide, would the back end still handle it or would it be widened?

NostaSeronx · Nov 29, 2019

amd6502 said:
I think with Raven2 existing nowadays, a dozer based Stoney successor isn't really needed. The iGPU port to FDX might also be too much work for this low margin market that is getting very crowded (Atom, along with Chinese Centaur x86 SoCs as well as acorn SoCs).

Stoney is approximately 125 mm2 with the 28nm node.
Raven2 is about 150 mm2 with the 14nm node. Then, there is Raven1 being its successor with it potentially being on GloFo's 12LP/12LP+.
Raven2_A0 based => 2C/3CU, Raven1_F0 up to 2C/8CU (same core/compute unit count as the 2H20 small Zen3 7nm EUV APU)

Stoney -> 125 // Raven2 -> 150 // Raven1 -> >180(?)

Stoney's successor should be around 63 mm2, a sub-cm2 APU SoC, with the FDSOI nodes being prime candidates thanks to reduced mask counts, better low Vdd/high Vdd support, etc.
Stoney = 3.7 GHz at 15W <== All AMD needs is the same clock at sub-6W, which is completely achievable with a two-node jump (22 -> 12).
Raven2 = 3.5 GHz at 15(3200U)/35W(3000G)

The word on the street, however, is 64-bit VFP and single-unit Wave32 CUs.

amd6502 said:
The Zen back end was always oversized for some reason with 8 wide retire. The integer core is 7 wide in Zen2. If it grows to 8 wide, would the back end still handle it or would it be widened?

8 macro-ops can support 8 ALU and 8 FPU ops. However, imho, if they use the shrink given with 5nm it would be simple to slap on a second retire queue (RQ0 = A/B threads, RQ1 = C/D threads).

jamescox · Nov 29, 2019

NostaSeronx said:
It is already 8 pipes in Zen2; the problem is that there are only four AVX256 issue ports, supporting 4x AVX128 or 4x AVX256.

View attachment 13756

I don’t see where you get 8 pipes from. What is the image supposed to be pointing out? All of the slides I have seen show 2 x FMUL units and 2 x FADD units that are 256-bit AVX in Zen 2. The L1 cache is 2x32 byte load (2x256-bit) and 32 byte write (256-bits) per clock. The cache bandwidth could be a bottleneck, but saying increased by 40% doesn’t make much sense. You would expect that they would double the bandwidth to 4x32 bytes. I suppose they could go up to 3x32 bytes, but that isn’t 40%, it is 50%. Perhaps some mixture of read and write increases.

It is difficult to know what they will do without knowing the current bottlenecks. Does only having 2 MUL and 2 ADD cause a bottleneck? Would it be beneficial to have 4 full FMA units instead? That wouldn’t require as many other changes. Doubling everything to 4 MUL and 4 ADD requires a large upgrade to scheduling and instruction issue/retire. If they want to do AVX512 with two 256-bit micro ops, then doubling makes sense. If they aren’t doing that, then an increase up to 3 MUL + 3 ADD could make sense for just larger AVX256 throughput.
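
The L1 bandwidth arithmetic above can be checked with a quick sketch. It assumes the Zen 2 baseline stated in this post (2x32-byte loads plus 1x32-byte store per clock) and a few hypothetical wider configurations; none of the candidates is a confirmed Zen 3 design, and the helper names are made up.

```python
PORT_BYTES = 32  # one 256-bit access per port per clock

def l1_bytes_per_clock(loads, stores):
    """Peak L1 bytes per clock for a given number of load/store ports."""
    return (loads + stores) * PORT_BYTES

def pct_increase(old, new):
    """Percentage increase going from old to new."""
    return (new - old) / old * 100

base = l1_bytes_per_clock(loads=2, stores=1)  # baseline from the post

# Hypothetical wider configurations, for illustration only.
candidates = {
    "3 loads + 1 store": (3, 1),
    "2 loads + 2 stores": (2, 2),
    "3 loads + 2 stores": (3, 2),
    "4 loads + 1 store": (4, 1),
}
for name, (loads, stores) in candidates.items():
    total = l1_bytes_per_clock(loads, stores)
    print(f"{name}: {total} B/clk, +{pct_increase(base, total):.1f}% total")
```

Counting loads alone, 2x32 to 3x32 is +50% as stated; counting loads and stores together, the simple port additions give +33.3% or +66.7%. No whole-port change gives exactly 40%, which fits the suspicion that the number is not a plain width increase.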

NostaSeronx · Nov 29, 2019

jamescox said:
What is the image supposed to be pointing out?

4 FMULs, 4 FADDs

One side is 2 FMUL(2x128-bit), 2 FADD(2x128-bit) and 160-entry lower 128-bit PRF
Other side is 2 FMUL(2x128-bit), 2 FADD(2x128-bit) and 160-entry higher 128-bit PRF.

8 128-bit datapaths, with 2 128-bit PRFs. It is also different from previous PRF designs from AMD, as the PRFs are equal(no distinct upper or lower from PRF perspective).

Greyhound with its 64-bit lo/64-bit hi has distinct upper and lower qualities. Same as BD/SR-derived designs. Zen is the first true native 128-bit, but Zen2 is the first design to split hi and low yet not have one PRF larger than the other.

GH/HK/BD/SR => Control is set in low
ZN2 => Control is in both, potentially mirrored if AVX256. If it is mirrored it technically can also be tweaked to AVX512, with 2-entries in both being mirrored(ctl bits) indicating it's an AVX512 resource.

MUL0 ctl:127:0
MUL0-pair(MUL3) (mir)ctl:255:128
etc...

Richie Rich · Nov 30, 2019

jamescox said:
Doubling everything to 4 MUL and 4 ADD requires a large upgrade to scheduling and instruction issue/retire. If they want to do AVX512 with two 256-bit micro ops, then doubling makes sense. If they aren’t doing that, then an increase up to 3 MUL + 3 ADD could make sense for just larger AVX256 throughput.

That's exactly my point of view too. Making a CPU wider is the most complex work, IMHO. That's why AMD might have decided to develop a completely new uarch as a solid and wide base for their future CPU evolutions. Zen 4 and 5 can add more complexity to gain much more performance without being limited by the uarch (to pick the lowest hanging fruits; however, they need to build a solid platform to reach those fruits, because those low hanging fruits are actually quite high).

A completely new uarch doesn't mean something radical. Zen3 could be a very conservative design based on the Zen1/2 approach, just built much wider. That's why a +15% INT IPC gain (from leaks) could be appropriate for a first-gen 6xALU core (+33% relative to Zen1, which seems reasonable). I would also expect 8 FPU pipes rather than 6 due to future AVX512 support (the leaked +40% FPU IPC might be appropriate for doubled FPUs, given non-linear scaling).

dnavas · Nov 30, 2019

jamescox said:
The L1 cache is 2x32 byte load (2x256-bit) and 32 byte write (256-bits) per clock. The cache bandwidth could be a bottleneck, but saying increased by 40% doesn’t make much sense. You would expect that they would double the bandwidth to 4x32 bytes. I suppose they could go up to 3x32 bytes, but that isn’t 40%, it is 50%. Perhaps some mixture of read and write increases.

L1 has latency, and "40%" could be representative of some workload. Wasn't the video saying something about specific fp workloads being potentially 50% faster? We could be looking at optimized latencies for specific cases -- for example, support of 4x32 loads and stores at six cycles that are used when possible (maybe a generic optimization for streamed data), 2x256 bit writes, and/or something else. Anyway, my point is that this might not be strictly a width increase, and that the optimizations might be targeted and therefore won't produce a nice round increase.

soresu · Nov 30, 2019

Richie Rich said:
A completely new uarch doesn't mean something radical.

I think we should probably take it like Intel's old "tock" changes, with the Zen1 and Zen3 being in that area, and Zen+ and Zen2 being in the "tick" change area.

In the end it's all evolutionary of Zen, it's just that Zen3 is a more major evolution.

As previously pointed out the unification of the CCD through the L3 caches is a major layout change, however bland it may seem for "completely new" PR, and it would not surprise me to find new instruction(s)/sets in that major change too.

Though I do believe it could be dangerous to AMD to bring in AVX512, unless it can do the same issue per clock as AVX2 currently can - it could lead to programs with both AVX2 and AVX512 codepaths being executed on AVX512 and delivering inferior performance to AVX2?
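
To put rough numbers on that concern: peak vector throughput is width times ops per clock times clock speed, so a 512-bit path that issues at half the rate of the 256-bit one gains nothing, and loses if the clock also drops. A minimal sketch; the issue rates and clock speeds below are made-up illustrative values, not specs of any real AMD or Intel part.

```python
def peak_bits_per_ns(vector_bits, ops_per_clock, clock_ghz):
    """Peak vector bits processed per nanosecond."""
    return vector_bits * ops_per_clock * clock_ghz

# Hypothetical scenario: AVX2 issues 2 ops/clk, AVX512 only 1 op/clk.
avx2 = peak_bits_per_ns(256, ops_per_clock=2, clock_ghz=4.0)
avx512_same_clock = peak_bits_per_ns(512, ops_per_clock=1, clock_ghz=4.0)
# Assumption: wide vectors force a lower clock (value chosen arbitrarily).
avx512_lower_clock = peak_bits_per_ns(512, ops_per_clock=1, clock_ghz=3.6)

print(avx2, avx512_same_clock, avx512_lower_clock)
```

In this made-up case the AVX512 codepath only ties AVX2 at equal clocks and falls behind once any frequency penalty applies, so a program that picks the AVX512 path by default would run slower.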

A/// · Nov 30, 2019

That's quite honestly a decent concern because it's something that comes up in private circles. Nothing I do is tied to AVX512, so I rarely partake in the discussions myself. There's a rumor flying around that to compete with AMD, Intel may bring AVX512 to all mainstream processors sans the i3 from here on out. This makes little sense to me and it seems more like a "Hey my chip's got this one, you should have bought Intel!" or something equally ridiculous in the less tech-centric sphere. Or maybe I over analyze everything to death.

soresu · Nov 30, 2019

A/// said:
This makes little sense to me and it seems more like a "Hey my chip's got this one, you should have bought Intel!"

Intel traded on this for years with SSSE3 and full SSE4.x - I'd say it's a big part of the reason AVX got announced by Intel when AMD announced SSE5, it's about perception of being in front of innovation.

A/// · Nov 30, 2019

Richie Rich said:
A unified L3 cache is the main reason to call it a completely new architecture??? Maybe in your Intel world. AMD isn't playing Intel's +3% IPC game. As Keller said at Intel, they made a plan for 50x larger CPUs in the next 20 years. I'm pretty sure he did the same at AMD. And they will choose the most effective configuration/technologies for a given manufacturing process. And this is what Apple has been doing for a long time; AMD has been doing it since Keller brought it from Apple. That's why Zen 3 could be quite a different uarch from Zen1/2. Zen 3 could be something like Apple's A11 Monsoon: a small performance jump despite the 6xALUs, but the new uarch provided a solid base for the much better performing A12 Vortex (Zen4).

50x larger CPU? 50x as large as now or 50 variations? When and where was this said? This is the first time I'm hearing of it.

soresu · Nov 30, 2019

A/// said:
50x larger CPU? 50x as large as now or 50 variations? When and where was this said? This is the first time I'm hearing of it.

I can only assume he means 50x transistors total, rather than per core transistor count.

Because a core with 50x the transistor budget would be insane even at <1nm.

A/// · Nov 30, 2019

soresu said:
I can only assume he means 50x transistors total, rather than per core transistor count.

Because a core with 50x the transistor budget would be insane even at <1nm.

50x transistor rate spread over 20 years?
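
For scale, a 50x increase spread evenly over 20 years works out to a fairly ordinary compound rate. A quick sketch, assuming smooth exponential growth (real scaling would be lumpier); the function names are made up.

```python
import math

def annual_growth(total_factor, years):
    """Constant yearly growth rate that compounds to total_factor."""
    return total_factor ** (1 / years) - 1

def doubling_time(total_factor, years):
    """Years per doubling under the same constant growth rate."""
    return years * math.log(2) / math.log(total_factor)

rate = annual_growth(50, 20)
period = doubling_time(50, 20)
print(f"growth per year: {rate * 100:.1f}%")
print(f"doubling every {period:.2f} years")
```

That is about 21.6% per year, or a doubling roughly every 3.5 years, which is slower than the classic two-year doubling cadence.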
