Speculation: Ryzen 4000 series/Zen 3


soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Yes.
I would expect a big +40% IPC change for Zen3, as it is a brand new architecture.
And smaller +10-20% IPC steps for Zen4 and Zen5, as they are 19h Family improvements (cache latency, front-end optimizations etc.).
Zen5 might bring something new from the next 20h Family... the same way Zen2 brought Zen3 FPUs.
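
For a sense of scale, here is a rough sketch of how those guessed IPC steps would compound. The percentages are only the speculation from the post (+40%, then +10-20% twice), not measurements, and `compound_ipc` is a made-up helper name.

```python
def compound_ipc(gains):
    """Multiply successive fractional IPC gains into one cumulative factor."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total


# Speculated steps from the post: Zen3 +40%, then Zen4 and Zen5 at +10% to +20% each.
low = compound_ipc([0.40, 0.10, 0.10])
high = compound_ipc([0.40, 0.20, 0.20])

print(f"Cumulative IPC vs Zen2 by Zen5: {low:.2f}x to {high:.2f}x")
```

If the guesses held, Zen5 would land at roughly 1.7x to 2x the IPC of Zen2, which shows how aggressive the +40% starting assumption is.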

Timing is:
  • Zen3 2020 (19h Family) ... development started 2013/2014 ... now producing engineering samples ... new 6-ALU core + SMT4 + doubling the Zen2 FPU (4x 256-bit FPU is ideal for SMT4)
  • Zen4 2021 (19h Family) ... development started 2018 ... now under development ... brings 6nm and some optimizations, like Zen1+
  • Zen5 2022 (19h Family) ... development started 2019 ... development now started, implementing Zen6 blocks ... brings 5nm + Zen6 FPU
  • Zen6 2023 (new 20h Family) ... development started 2016/2017 ... uarch concept is frozen, FPU blocks might be almost finished and ready for Zen5 adoption ... will bring a new 5nm core with 8x ALU + SMT8 + doubled FPU units to support SMT8 performance.
So the real question is what Zen6 is, because AMD engineers already know the main uarch parameters.
Who cares about Zen3? :D
2023 will definitely bring 3nm Nanosheet/MBCFET device if not earlier, to at least one fab (Samsung that is - TSMC seem to be cagey about MBCFET timeline and specs so far).

I would be surprised if AMD did not make this jump ASAP to retain competitive edge.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
The Cat side has always been aggressive in architecture and nodes.
They had to be aggressive to meet Jaguar shrink demands from Sony and MS for their mid gen console refreshes.

I'd be surprised if it was much more than basic shrinkage, considering the need for maximum compatibility on the mid gen SKU's with pre-existing PS4 and XB1 titles.
 

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
One of the big advantages of the chiplet architecture is that it would not cost AMD much at all to create the AM5 platform "early" and ship CPUs for both AM5 and AM4 sockets, using the same CPU chiplet but different IO chiplets. This is a substantial market advantage AMD now has compared to Intel, I would be genuinely surprised if they didn't make use of it.

(I also really want it to happen just because it would allow a near-perfect way of comparing DDR4 to DDR5 on almost identical platforms.)
It's definitely an advantage AMD should make use of. But I expect them to push it for semi custom solutions, with customers requesting and paying for different configurations and customized IO dies. For the "generic" market I expect AMD to stick to the cheaper, more marketable one generation, one design, one platform approach. Especially since the turn around time between gens is only about a year anyway (and once it takes longer they can still drop in partly updated gens).

They had to be aggressive to meet Jaguar shrink demands from Sony and MS for their mid gen console refreshes.

I'd be surprised if it was much more than basic shrinkage, considering the need for maximum compatibility on the mid gen SKU's with pre-existing PS4 and XB1 titles.
In any case it was good training for what AMD is now applying across the board.
 
Mar 11, 2004
23,020
5,485
146
One of the big advantages of the chiplet architecture is that it would not cost AMD much at all to create the AM5 platform "early" and ship CPUs for both AM5 and AM4 sockets, using the same CPU chiplet but different IO chiplets. This is a substantial market advantage AMD now has compared to Intel, I would be genuinely surprised if they didn't make use of it.

(I also really want it to happen just because it would allow a near-perfect way of comparing DDR4 to DDR5 on almost identical platforms.)

I don't really see them doing that as there just isn't enough advantage that they'd want to update the platform but not the CPUs. I'd honestly expect the total opposite more even, where they update the CPU package without needing to do a new platform (they'd likely update the I/O die as well, where it'd maybe have more cache or something that would enable them sticking with the same platform to be less of an issue).

I don't really know why that'd be all that interesting. It's not like it'd be all that shocking, and very probably it will resemble the last couple of memory transitions, where the newer memory is slower than similar-speed (lower-latency) versions of the previous one.

I'm guessing that we'll see them merge the AM and TR platforms starting with AM5; one reason is the rumors about there being multiple TR platforms. AM is starting to be an issue (either memory channels, or physical size), but TR is kinda overkill for the consumer market (though Ryzen is starting to outgrow that market). I could see a new LGA socket that isn't as big as current TR, but enables larger physical size of the chip packaging, enabling a move to optional quad channel memory (for higher end boards), more chiplets/larger chiplets (so they can add more CPU cores or GPU, as well as maybe HBM).

What I really want to happen is for them to start putting HBM with the I/O die and bypass the system memory as much as possible. It'll have big performance benefits. It should also help make smaller, more efficient systems for laptops and other small form factors which are popular with their embedded markets (I could see the "Pro/X" versions of the next gen consoles possibly making that move if they need a big boost in memory bandwidth and GDDR6 doesn't advance enough, such that they'd need a wider memory bus to get it using GDDR).

It would let them keep their platform for a long while (so AM5 and next TR could have significant longevity) but just make it so that the HBM would be treated like the system memory and then slot NAND into the memory slots, which with DDR5 sockets should let them handily exceed even PCIe 5.0 bandwidth (and eliminates issues like trace lengths that are starting to become an issue with PCIe, while offering up multiple slots for expansion, without all the stuff they have to do to try and offer more SSD slots currently).
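
A back-of-the-envelope sketch of the bandwidth comparison above. The DDR5-4800 data rate and the x4/x16 link widths are assumptions picked for illustration (the post names no speeds); the PCIe 5.0 figures use the standard 32 GT/s per lane with 128b/130b encoding. The helper names are hypothetical.

```python
def ddr_channel_gbs(mega_transfers_per_s, bus_bytes=8):
    """Peak bandwidth of one 64-bit DDR channel in GB/s."""
    return mega_transfers_per_s * bus_bytes / 1000.0


def pcie5_gbs(lanes):
    """Peak one-direction PCIe 5.0 bandwidth in GB/s (32 GT/s, 128b/130b)."""
    return 32.0 * (128.0 / 130.0) / 8.0 * lanes


# Assumed example: DDR5-4800, one and two channels, vs typical SSD (x4) and GPU (x16) links.
one_channel = ddr_channel_gbs(4800)
two_channels = 2 * one_channel

print(f"DDR5-4800, 1 channel : {one_channel:.1f} GB/s")
print(f"DDR5-4800, 2 channels: {two_channels:.1f} GB/s")
print(f"PCIe 5.0 x4          : {pcie5_gbs(4):.2f} GB/s")
print(f"PCIe 5.0 x16         : {pcie5_gbs(16):.2f} GB/s")
```

These are peak interface numbers only; whether NAND in a memory slot could actually sustain anything like that is a separate question.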

I could see Epyc next gen (NG) BGA socket and TR4 to TR5 jump pioneering DDR5 compatibility. For consumer/AM4 this won't make too much sense until DDR5 overtakes DDR4 in affordability. So for NG threatripper socket both graphics-out and DDR5 compatibility are on the wishlist. And as far as graphics this has been a long time coming, because quad channel and monster-iGPU are a natural fit.

And they can be co-produced with 7nm iGPU as you can tag along the optionally enabled IOX functionality onto a big GPU (over 30CU Vega-equivalent).

The issue is that Ryzen is outgrowing the consumer space. I think moving forward AM5 replaces the lower end TR socket and probably offers quad channel memory. This lets Ryzen take all the $1000 and below CPU market (and gives it some extra to work with), TR takes the $1k-3k market (single socket), and then EPYC moves to possibly something else (there's quite a few options: move to even bigger sockets; keep the same size socket but rework it to support more sockets, say moving to 4 socket support; or perhaps they move to smaller sockets that support higher socket counts, where it lets them slot in different processors, so say they have 8 sockets in the space where 4 TR sockets might have fit before, where they could offer say 48 cores/96 threads per socket for CPU - a reduction from current EPYC but still denser than half sized EPYC chips would be now, which would be 32 cores/64 threads - and because they could go up to 8 sockets it means they could offer almost 400 cores and over 750 threads in the rack space where they're currently offering up to 128 cores/256 threads; plus that would let them slot in other chips, so maybe a customer needs less CPU, so they have 2 CPUs, 3 GPUs, and then 3 AI processors, where companies can put their own chips in versus AMD needing to try to integrate their chips onto AMD's CPU or GPU package). The cheap stuff probably stays on AM4 for a while and then moves to being embedded.
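
The core and thread counts in that hypothetical can be checked with a few lines. The 8 x 48-core layout comes from the post; treating the "current" 128 cores/256 threads as 2 sockets x 64 cores is my assumption, and `rack_totals` is a made-up helper.

```python
def rack_totals(sockets, cores_per_socket, threads_per_core=2):
    """Return (total cores, total threads) for a multi-socket system."""
    cores = sockets * cores_per_socket
    return cores, cores * threads_per_core


current = rack_totals(sockets=2, cores_per_socket=64)   # assumed 2P layout
proposed = rack_totals(sockets=8, cores_per_socket=48)  # layout from the post

print("current :", current)
print("proposed:", proposed)
print("core ratio:", proposed[0] / current[0])
```

So "almost 400 cores and over 750 threads" works out to 384 cores and 768 threads, three times the assumed current configuration.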

They'd likely use on package HBM versus doing graphics memory slots, and I don't think they'd even be able to boost bandwidth enough via the slots anyway. HBM3 especially should start offering the density/capacity to replace DDR, which means they could stick with the same number of memory slots/channels as now while massively boosting memory bandwidth (beyond what even GDDR could offer, while also cutting latency versus DDR/GDDR slots), maybe they even reduce some, although I'd hope not, and instead just use the memory slots/channels for SSDs (that would be able to exceed what PCIe could offer with regards to bandwidth/latency).
 
Mar 11, 2004
23,020
5,485
146
They had to be aggressive to meet Jaguar shrink demands from Sony and MS for their mid gen console refreshes.

I'd be surprised if it was much more than basic shrinkage, considering the need for maximum compatibility on the mid gen SKU's with pre-existing PS4 and XB1 titles.

I'm not really sure where you guys are getting that they were being super aggressive with the cat cores. Sure due to them wanting to shrink GPU and get it on 14nm the cat cores made it to 14nm before Zen launched, but that was simply because GPU was already there (being based on Polaris) and the console chips being GPU dominated. And it was mostly just shrinking a CPU design that was already 3 years old and was a smaller simpler core to begin with. Personally for me that's almost the complete opposite of aggressive.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I'm not really sure where you guys are getting that they were being super aggressive with the cat cores.
The cat cores are super aggressive, there was no overhaul for Bulldozer.

Family 15h was the one constant throughout the 2010s, with no redesign significant enough to push it to a new family number. Meanwhile the cat cores went from Bobcat (Family 14h) to Jaguar (Family 16h) to Zen (Family 17h), with the design complete for Zen3 (Family 19h).

Family 19h is also split in two; AMD Family 19h Models 00h-0Fh for Server and AMD Family 19h Models 20h-2Fh for Desktop. Which means that the 19h core for desktop is technically newer(/more bug-free) than the 19h core in servers.
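
For anyone who wants to check family/model numbers themselves, here is a minimal sketch of decoding them from the CPUID leaf 1 EAX value using AMD's documented rule (extended family and extended model are only applied when the base family is 0xF). The sample EAX values are constructed for illustration and are not claimed to be any shipping part's signature; the server/desktop ranges are just the ones quoted in the post, and both function names are hypothetical.

```python
def decode_family_model(eax):
    """Decode (family, model, stepping) from a CPUID leaf 1 EAX value (AMD rule)."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    family, model = base_family, base_model
    if base_family == 0xF:
        # Extended fields only count when the base family is saturated at 0xF.
        family += ext_family
        model |= ext_model << 4
    return family, model, stepping


def segment_19h(family, model):
    """Classify a Family 19h model using the ranges quoted in the post."""
    if family != 0x19:
        return None
    if 0x00 <= model <= 0x0F:
        return "server"
    if 0x20 <= model <= 0x2F:
        return "desktop"
    return "unknown"


# Illustrative EAX values (constructed, not taken from real hardware).
for eax in (0x00A00F10, 0x00A20F10):
    fam, mod, step = decode_family_model(eax)
    print(f"EAX={eax:#010x} -> Family {fam:02X}h Model {mod:02X}h: {segment_19h(fam, mod)}")
```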
 

amd6502

Senior member
Apr 21, 2017
971
360
136
No, Zen and Cat family are distinctly very different. The cat cores ended development in or before about 2014, with the Puma+ core. Jaguar+ was ported to 16nm (pdsoi?)** for the TSMC manufactured console APUs (which were approximately GPUs).


**Wiki says 12CU GCN 1.1 and "TSMC 16FF+"
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
No, Zen and Cat family are distinctly very different.
The Cat family includes Bobcat-derivatives (all 14h), Jaguar-derivatives (all 16h), Zen-derivatives (all 17h) and the upcoming Zen3-derivatives (all 19h). They utilize the same terminology across the designs (for 14h/16h/17h). There is no distinction going from the Cat-family to the Zen-family, as Zen cores are simply a branch just like Bobcat cores and Jaguar cores.

Cat Family's DNA is built on Portability, Low Power, and Small Area philosophies. While the Zen family builds on architectural DNA with the addition of High Performance/Scalability.

Portability = Flexible: design uses low macro count. <-- trait in Cat and Zen.
Low power: design doesn't waste power. <-- trait in Cat and Zen.
Small area: design doesn't consume a lot of die space. <-- trait in Cat and Zen.
High performance: design doesn't lack performance. <-- revolutionary trait added in Zen family evolution of Cat Family.
Scalability: design can scale from low power to high performance. <-- revolutionary trait added in Zen family evolution of Cat Family.

This starts at Bobcat and continues with Zen3. They are all Cat-family cores, but the Zen-family cores with HiPerf/Scaling started with Zen. With the marketing out of the way, the technical detail is that all cat cores have FPM+FPA units, which is distinct from the FMA/MMX units in 15h and FMUL+FADD+FMISC in the ancient Family. The pipeline segment names for Bobcat, Jaguar, and Zen are all the same. They all have a retire control unit, compared to the instruction control unit in the ancient Family and the instruction scheduler and retirement control unit in 15h.

Marketing, hardware, and software => The Zen family cores are part of the Cat family and are Cat family cores. While Zen appears to be directly descended from Jaguar, it actually only shares a common ancestor within the larger Cat family. (I'm stopping here before I start pulling up shared Bobcat/Jaguar/Zen technical staff.)
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Zen is a from scratch new architecture. It was conceived around the time of Piledriver circa 2012. It took about five+ years for development.

Bobcat and Jag families were low power designs to complement the dozer high frequency design. During the time of BD and PD, the dozer SoCs weren't capable of ULP at 15W TDP and below (and this was known and anticipated, probably at the conception stage of BD). The cat cores fixed the power issue by making an efficient low transistor count core, at the expense of frequency. They shared some similarities and components with the dozer family (such as a narrow 2+2 core and the L2). Development of 16h seemed to have stopped once dozers became energy efficient (Excavator) and the Zen design was close to complete.
 

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
Zen combined parts from Construction (e.g. front end) and Cat (e.g. branch predictor), Zen replaced both of them.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Zen combined parts from Construction (e.g. front end) and Cat (e.g. branch predictor), Zen replaced both of them.
All of Zen is from Cat family. There were two cancelled low-power cores called Leopard/Margay btw. Any one of them could effectively be Zen 0. No Family 15h or 15h-derivative, it's free real estate.

No traits of the Zen front-end are from the Excavator or earlier designs. No independent prediction queues, no independent instruction byte buffer/queues, etc. The only thing it shares with Bulldozer/Piledriver is the four macro-op decoder. And lo and behold, Zen has four ALUs, so of course it is going to have a four macro-op decoder.

Zen doesn't use the macros of Jaguar, and Jaguar also doesn't use the macros of Bobcat. Physically none of the cat cores are the same core. However, certain facets have evolved in conjunction.

Bobcat has a 12x16B IBB
Jaguar has a 16x16B IBB
Zen has a 20x16B IBB(IBQ)

Each major redesign adds four. Whereas none of Bulldozer-Excavator changed the 2x16x16B IBB from 15h 00h to 15h 7Fh.
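
The IBB figures above, turned into bytes, as a quick sanity check of the "adds four" pattern. The entry counts are the ones given in the post; the variable names are just for illustration.

```python
ENTRY_BYTES = 16  # each IBB entry holds 16 bytes, per the post

# Entry counts as listed in the post, in generation order.
ibb_entries = {"Bobcat": 12, "Jaguar": 16, "Zen": 20}

ibb_bytes = {core: entries * ENTRY_BYTES for core, entries in ibb_entries.items()}

counts = list(ibb_entries.values())
steps = [later - earlier for earlier, later in zip(counts, counts[1:])]

for core, size in ibb_bytes.items():
    print(f"{core}: {ibb_entries[core]} x {ENTRY_BYTES}B = {size} bytes")
print("entries added per redesign:", steps)
```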

Following this up, Zen3(Family 19h) shouldn't use any of the macros of Zen or Zen2(Family 17h). Physically, Zen3(Family 19h) will be distinct from Zen1/Zen2(Family 17h). Based on the step previous cat cores took from Family 14h to Family 16h to Family 17h.



Family 14h to Family 16h: 12 custom tiles to 5 custom tiles.
Family 17h to Family 19h: ~20 custom tiles to some lower amount of custom tiles because of power efficiency.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
No, Zen and Cat family are distinctly very different. The cat cores ended development in or before about 2014, with the Puma+ core. Jaguar+ was ported to 16nm (pdsoi?)** for the TSMC manufactured console APUs (which were approximately GPUs).


**Wiki says 12CU GCN 1.1 and "TSMC 16FF+"
Where did you get PDSOI from?

As far as I'm aware, that is a completely different planar process from the finFET processes at TSMC, Samsung or GF.

I'm not even sure PDSOI is used anymore, I think they switched to Fully Depleted SOI for the fabs that continued with it.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
All of Zen is from Cat family. There were two cancelled low-power cores called Leopard/Margay btw. Any one of them could effectively be Zen 0. No Family 15h or 15h-derivative, it's free real estate.

No traits of the Zen front-end are from the Excavator or earlier designs. No independent prediction queues, no independent instruction byte buffer/queues, etc. The only thing it shares with Bulldozer/Piledriver is the four macro-op decoder. And lo and behold, Zen has four ALUs, so of course it is going to have a four macro-op decoder.

Zen doesn't use the macros of Jaguar, and Jaguar also doesn't use the macros of Bobcat. Physically none of the cat cores are the same core. However, certain facets have evolved in conjunction.

Bobcat has a 12x16B IBB
Jaguar has a 16x16B IBB
Zen has a 20x16B IBB(IBQ)

Each major redesign adds four. Whereas none of Bulldozer-Excavator changed the 2x16x16B IBB from 15h 00h to 15h 7Fh.

Following this up, Zen3(Family 19h) shouldn't use any of the macros of Zen or Zen2(Family 17h). Physically, Zen3(Family 19h) will be distinct from Zen1/Zen2(Family 17h). Based on the step previous cat cores took from Family 14h to Family 16h to Family 17h.
What you are observing is a design choice based on what they have seen work on previous designs - they stated unequivocally that Zen was a ground up design.

Not like K7->K8->K10, or even Bulldozer->Piledriver->Steamroller->Excavator.

They may have learned lessons from what worked in cat cores, but it was not based directly on them in the manner of a direct evolution as shown above.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
they stated unequivocally that Zen was a ground up design.
That doesn't change the fact that the Zen family is still the Cat family. All of the remaining Cat-family technical staff worked on Zen, while only a few of the remaining Bulldozer-family technical staff joined the Zen team, with the rest doing an unknown amount of x86 CPU cores (unlisted-blank-etc). Those that went from Bulldozer-family to Cat-family were mostly core agnostic to begin with. STAPM, AVFS, and per-part/per-core IVR all appeared in Family 16h before they were inserted into Family 15h.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
No, just no! Zen is a new CPU family, distinct from earlier AMD CPU families. Please stop muddying the waters on this point.
No, the Cat family includes the Zen family. Family 17h processors are in the same global microarchitecture Family as Family 14h and Family 16h. That family is called the Cat family, whose design methodology is used on BOBCAT(14h), JAGUAR(16h), ZEN(17h), ZEN2(17h), ZEN3(19h). Stop muddying the waters claiming that Zen doesn't have the work units of Bobcat/Jaguar.

Zen was built to be capable of 1:1 operation against Jaguar. Much like how Jaguar was built bottom-up 1:1 capable operating against Bobcat. This is a shared common microarchitectural trait that all Cat family cores have.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
Zen was built to be capable of 1:1 operation against Jaguar. Much like how Jaguar was built bottom-up 1:1 capable operating against Bobcat. This is a shared common microarchitectural trait that all Cat family cores have.
Oh for crying out loud. Skylake shares some common architectural elements with the P6 - so what?! You are reaching here.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Zen was built to be capable of 1:1 operation against Jaguar. Much like how Jaguar was built bottom-up 1:1 capable operating against Bobcat. This is a shared common microarchitectural trait that all Cat family cores have.
You are confusing feature set/extension compatibility with uArch design.

SMT alone is such a significant change that your point is invalid.

A brand new design does not need to be a direct evolution of a previous design to be compatible with software written for it - the latest Cortex A76 shares basically nothing from a uArch perspective with A57 after years of component iteration, yet could run any software that A57 could due to supporting the base v8.0-A feature set.

Likewise Zen supports all the same ISA extensions that Jaguar does, which allows it compatibility with PS4 and XB1 software catalog from the CPU side, just as RDNA supports the GCN Wave64 design used in those hardware feature sets.

This will allow PS5 and XB Scarlett to support any software from their back catalogues.

This doesn't necessarily mean that the new uArchs will execute the old software as efficiently, but their advantages (i.e. sheer IPC and FLOPS count) will likely overcome any efficiency loss from changing uArchs.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
Oh for crying out loud. Skylake shares some common architectural elements with the P6 - so what?! You are reaching here.
Well yes, Skylake is part of the same family. It diverges with Nehalem, then Sandy Bridge, and then itself. However, they are all the same global architecture.
SMT alone is such a significant change that your point is invalid.
SMT operates within the bounds of what is given from Bobcat/Jaguar. The units are modified, but aren't new from Bobcat/Jaguar to Zen. SMT doesn't make it part of a new global family. The Cat family encompasses Bobcat(Lynx), Jaguar(Puma, Catamount, Cheetah, Tiger, Leopard, Margay), Zen(Zen+, Zen2), Zen3. However, the Zen family only includes Zen, Zen2, Zen3. Zen is thus a cat core. Thus Zen's aggressive roadmap is part of Cat's aggressive roadmap.

This is proven via technical documents, AMD themselves, architect groups, and so many more etc.
Likewise Zen supports all the same ISA extensions that Jaguar does, which allows it compatibility with PS4 and XB1 software catalog from the CPU side..
Zen however encompasses two ALUs, one store AGU, one load AGU, one FPA, and one FPM. Making it intrinsically compatible with legacy code without hitches. All cat architectures will and should have this intrinsic compatibility capability.

Bobcat only supports up to SSE3. Bobcat2 added native FP128 for SSE2.
Jaguar overhauled from-scratch a design that had more effective IPC, faster frequency(10%) and lower power consumption(given AVX128 support) than BT.
Zen overhauled from-scratch a design that had more absolute IPC, much more frequency, and same power given increased IPC than JG. With Zen2 adding native FP256 for AVX256.

All these cores are from the Cat Family. Which began with 14h and hasn't ended yet. They are all developed from the work done of and from Cat micro-processors. From this we can do a first line of prediction/speculations.

Family 19h at minimum will do what Family 16h did from Family 14h.
More frequency, more effective IPC, add a new instruction set(potentially, AVX512 w/ Zen4 adding Native 512-bit) while also having lower power consumption. It could be AVX512, but it could be another try at a XOP; "Microarchitectual and RTL design of FP/SIMD/Vector unit with the latest ARM SVE instruction extensions including microachitectural designs of floating point register caching, per-lane predication, scatter/gather support and wide 512-bit datapath." -> "x86 CPU core microarchitectural design." Maybe, they'll call it XVE: eXtendable Vector Extensions.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
Ah, I give up. Someone’s got a bee in their bonnet and that’s it.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
Where did you get PDSOI from?

As far as I'm aware, that is a completely different planar process from the finFET processes at TSMC, Samsung or GF.

I'm not even sure PDSOI is used anymore, I think they switched to Fully Depleted SOI for the fabs that continued with it.

Well, wiki seems to imply that it's FF. However, that then means they somehow managed to port GCN 1.1 to finFET (along with Jaguar) with no problems, which is surprising.

This is proven via technical documents, AMD themselves, architect groups, and so many more etc.

None that I'm aware of. Show us your best. Sure, at the time of Zen's beginnings (which coincided with halting of development on cat core) you would have Puma engineers migrating in droves for their new job. That's about as much in common that Zen and cat cores have.

But as far as the functioning of the cores, they couldn't be more different. Zen was a high frequency & high transistor count design; Jaguar and Bobcat were the exact opposite (low transistor count, low freq). A single-threaded narrow core vs. an SMT-capable 6-wide and growing core (Zen2 being 4+3 and Zen3 likely 5+3, and Zen4 possibly 6+4 or wider).

Jag had a lot more in common with k8/k10 than it has with Zen. k10 and Zen are very distantly related, though I'd say more than Jag to Zen, because while they both distantly originated from k10, they went completely opposite ways.

The front end was common enough between Zen and XV that they were able to use a modified Zen L2 to fix the dozers' high latency L2 during the XV generation. Zen decode was probably derived from the dozer decoders.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Well, wiki seems to imply that it's FF. However, that then means they somehow managed to port GCN 1.1 to finFET (along with Jaguar) with no problems, which is surprising.
Both GCN and Jaguar originated on a non SOI 28nm HKMG process after they migrated away from 32nm SOI/PDSOI as far as I am aware.

Both Kaveri and Beema variants used some derivative of a 28nm HKMG process, as well as the first PS4 and XB1 SoC's I think.

They already had to port Bulldozer/Piledriver from 32nm SOI to design Steamroller/Kaveri, so porting cores across significant node changes is probably less difficult than designing from scratch on a brand new process.

The same thing will happen again when they switch to the Nanosheet/GAA/MBCFET device type at 3nm for Zen and RDNA.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,731
3,063
136
The one way to know if something is actually true is for it to be the exact opposite of what Nosta says it is.

Where is my Tunnelboerer!!!


Just in time confirmed...

- All AMD64 architectures have been scrapped. Scrapped being non-literal, they will launch but not have any successors.
- New ISA -> AMD64/P (Think Y86-64, Power ISA v3.0, and AMD64 fused//Semi-RISC/Semi-CISC ISA w/ high register count of Power, while all compatible with AMD64)
- Redesigned+From Scratch CMT Architecture -> Tunnelborer (IoT to HPC - Ultra Wide AVFS) // SEC28FDS prototyping (two version Tunnelborer-A = Dual-core, Faster TTM; Tunnelborer-B = Quad-core, Slower TTM)
- AMD to be Wholly Owned Subsidiary of IBM by 2022. (Read the 10-Qs from 2003-2010, oops)