Discussion Zen 5 Discussion (EPYC Turin and Strix Point/Granite Ridge - Ryzen 8000)

DisEnchantment

Golden Member
Mar 3, 2017
Well, since many folks have already got their hands on Zen 4 CPUs (or at least are going to), it's time to discuss Zen 5 (Zen 4 is already old news :D)

We already have roadmaps and key technologies like AIE
Some things we already knew
  • Dr. Lisa Su and Forrest Norrod already mentioned at FAD 2022 on June 9th, during the Q&A, that Zen 5 will come in N3 and N4/5 variants, so it will be on multiple nodes.
  • Mark Papermaster highlighted that it will be a ground-up architecture (also mentioned in the last paragraph here).
  • Mike Clark mentioned that they already started working on Zen 5 in 2018. This means that by the time it launches, Zen 5 will have been in conception, planning and development for much longer than the original Zen program.
For a CPU architecture launching in early 2024 in the form of Strix Point for the OEM notebook refresh, tape-out should already be happening within the next few months.
Share your thoughts


"I just wanted to close my eyes, go to sleep, and then wake up and buy this thing. I want to be in the future, this thing is awesome and it's going be so great - I can't wait for it." - Mike Clark
 

itsmydamnation

Platinum Member
Feb 6, 2011
Why nothing about execution ALUs, load/store, dispatch/retire?

I'm guessing 6-wide decode, 6-wide ALU (two clusters of 3, plus Zen 3/4-like other units sharing register ports), 10 issue/dispatch, 10-12 retire, 512+ ROB, an extra port to L1D (load or store).

SMT8

edit: plus uop spill to L1I
 

Saylick

Diamond Member
Sep 10, 2012
6-wide decode, 12 op/cycle mop cache dispatch, 10-wide execute. Keep the double-pumped AVX-512; there is no real need for single-cycle AVX-512. Larger caches in general (mop cache, BTB, L1, L2, etc.). Clocks will not increase much, only a 6 GHz boost. IPC gains well north of 20%, ideally 25%+, via a 50% wider overall core (width and IPC never scale proportionally). More accelerators.
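The point that width and IPC never scale proportionally can be illustrated with Pollack's rule, an empirical rule of thumb that performance grows roughly with the square root of core complexity. This is my choice of model, not something from the post, and treating "50% wider" as "1.5x the complexity" is an assumption. A minimal sketch:

```python
def pollack_speedup(complexity_ratio: float, exponent: float = 0.5) -> float:
    """Pollack's rule of thumb: performance ~ complexity ** 0.5.

    `complexity_ratio` is new core complexity / old core complexity.
    Here "50% wider" is assumed to mean a ratio of 1.5 (an assumption).
    """
    return complexity_ratio ** exponent


gain = pollack_speedup(1.5) - 1
print(f"Estimated IPC gain for a 1.5x wider core: {gain:.1%}")
```

Under that assumption the estimate lands a little above +20%, nowhere near +50%, which is the kind of gap the post is pointing at.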
 

bakyt115

Member
Nov 21, 2016
I hope we will see implementations of some of AMD's new patents.
There is also a patent about storing the uop cache in L3.
 

Insert_Nickname

Diamond Member
May 6, 2012
I would think going wider would be the logical option, but beyond that I have no idea.

I expect to upgrade to Zen 5 or its successor eventually. But both my main systems run on Zen 3, so there is no rush. That also gives the platform time to mature.
 

eek2121

Platinum Member
Aug 2, 2005
I suspect AMD will up the core count to stay competitive. They are doing okay now, but if Intel sticks with 8+16 they may have a harder time in the future.
 

inf64

Diamond Member
Mar 11, 2011
Here is what I expect:

1) IPC - AMD had targets in the 40-50% range for radical uarch departures or redesigns. Examples are Excavator -> Zen 1 (~50%, overshot the target) and Zen 1 (non-+ version) -> Zen 3 (~40% cumulative). Therefore, I expect them to target the same 40-50% range going from Zen 3 -> Zen 5. My guess is a 45% IPC jump target vs. Zen 3 (vanilla), which would mean Zen 5 could have ~28-30% higher IPC than Zen 4 (vanilla).

2) Clocks - I expect stagnation or at best a slight bump for the max ST target (nothing too radical, ~5%; this should put Zen 5 at best in the ~6 GHz range for single-threaded workloads).
For MT workloads, I expect no changes as we will have more cores (so a 5-5.2 GHz all-core boost).

3) Core count - between 50% and 2x more. I still lean towards a 2x increase, so 32C/64T should be the new flagship for mainstream desktop (like the 7950X is now).
Stacking of cores and memory (L3) is becoming the norm, so I expect they will evolve in that direction.

4) SMT and AVX-512 - I expect no change in SMT (number of threads), so they will still use 2 threads per core. AVX-512 could become a native 2x 512-bit implementation, but I doubt this will happen (so the AVX-512 implementation remains the same IMO).

5) Accelerators could be a game changer if software support is there. They could speed up certain workloads by an order of magnitude more than traditional core count increases can.

6) big.LITTLE - I don't expect a "big.LITTLE" approach on mainstream desktop parts (Ryzen 8000); AMD will have specific accelerators for targeted workloads. On mobile parts it's a possibility though, something like Zen 5c or Zen 4c on 3nm alongside Zen 5 cores (on the same monolithic die, sharing the same ISA and L3 cache).

So yeah, no wonder AMD's engineers are excited about it; on paper it could be a monumental performance and efficiency jump. For ST, up to a similar ~30% versus the flagship Ryzen 7000. For MT, up to 2.5x faster than 16C Zen 4 (if they launch 32C parts). Couple that with possible huge performance jumps in accelerated (Xilinx) workloads, along with 3nm power improvements, and Zen 5 looks ready to take on Arrow Lake.
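The arithmetic behind points 1-3 and the summary can be checked with a short script. A minimal sketch: the per-generation IPC figures are AMD's publicly quoted uplifts (about +3% Zen+, +15% Zen 2, +19% Zen 3, +13% Zen 4) and the 5.7 GHz baseline is the 7950X's rated max boost; both are my inputs, not numbers from the post. The MT estimate assumes perfect scaling at unchanged all-core clocks, so it is an upper bound.

```python
from math import prod


def compound(gains):
    """Chain per-generation gains (0.15 means +15%) into one overall factor."""
    return prod(1 + g for g in gains)


# Assumed inputs: AMD's publicly quoted IPC uplifts per generation.
zen1_to_zen3 = compound([0.03, 0.15, 0.19])  # Zen+, Zen 2, Zen 3 -> ~1.41x
zen3_to_zen4 = compound([0.13])              # Zen 4 -> 1.13x

# Point 1: a +45% IPC target over Zen 3, re-expressed against Zen 4.
zen5_over_zen4 = 1.45 / zen3_to_zen4         # ~1.28x

# Point 2: ~5% on top of an assumed 5.7 GHz peak boost (7950X).
st_clock_ghz = 5.7 * 1.05                    # ~5.985 GHz

# Point 3: 32C vs 16C, same all-core clocks, perfect scaling (upper bound).
mt_speedup = (32 / 16) * zen5_over_zen4      # ~2.57x

print(f"Zen 1 -> Zen 3 IPC:     {zen1_to_zen3:.2f}x")
print(f"Zen 5 vs Zen 4 IPC:     {zen5_over_zen4:.2f}x")
print(f"Peak ST clock estimate: {st_clock_ghz:.2f} GHz")
print(f"32C Zen 5 vs 16C Zen 4: {mt_speedup:.2f}x")
```

A smaller assumed Zen 4 uplift pushes the implied Zen 5 vs. Zen 4 figure up and a larger one pulls it down; real MT scaling would land below the ideal ~2.57x.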
 

Tuna-Fish

Golden Member
Mar 4, 2011
I don't think the FPU will be widened until there is a substantial amount of AVX-512 code out there. Certainly not Zen 5, probably not Zen 6 or 7 either. If Intel reintroduces it soon, then Zen 8 or so would be realistic.

Top of my personal wishlist for the FPU is not wider ALUs, but faster gather.
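For readers unfamiliar with gather: it is a vector load whose lanes read from arbitrary indices (e.g. AVX2's `vpgatherdd`). A minimal scalar model of the semantics only, not of how any Zen core implements it; the function name is hypothetical:

```python
def gather(base, indices):
    """Scalar model of a SIMD gather: lane i loads base[indices[i]].

    Each lane may touch a different cache line, which is why hardware
    gathers are typically slower than one contiguous vector load.
    """
    return [base[i] for i in indices]


table = [10, 11, 12, 13, 14, 15, 16, 17]
print(gather(table, [7, 0, 3, 3]))  # → [17, 10, 13, 13]
```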
 

JustViewing

Member
Aug 17, 2022
I hope we will see implementations of some of AMD's new patents.
Nice find. I think this is what excites Mike Clark. I always wondered about this type of execution; it looks like deep integration of an FPGA into the CPU. In the patent they also talk about optimizing execution based on the integer or floating-point workflow. They could probably also map CPU instructions to trigger execution of GPU code. Ultimate HSA!!!
 

naad

Member
May 31, 2022
Whole-package change like ADL, a massive increase in int/FP registers and in-flight loads and stores, 6-wide or more at least, OoO resources out of the wazoo. I fully expect them to either double or increase by 1.5x (almost) everything Zen 4 has.

Though I think AVX-512 will remain "double pumped"; AVX-512's advantage is in its shuffles/masks, not its vector throughput, and AMD probably knows this best.
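To make the masking point concrete: per-lane predication lets a loop body handle a conditional or a ragged tail without branches, independent of datapath width. A minimal scalar model of AVX-512 merge-masking semantics, the behavior of intrinsics such as `_mm512_mask_add_ps(src, k, a, b)`; the Python function name is hypothetical:

```python
def mask_add(src, mask, a, b):
    """Model of a merge-masked vector add: lanes whose mask bit is set
    get a[i] + b[i]; the remaining lanes keep their value from src."""
    return [a[i] + b[i] if (mask >> i) & 1 else src[i] for i in range(len(src))]


src = [0, 0, 0, 0]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(mask_add(src, 0b0101, a, b))  # → [11, 0, 33, 0]
```

A double-pumped unit produces the same per-lane result; it just takes two passes over a 256-bit datapath.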
 

DisEnchantment

Golden Member
Mar 3, 2017
Top of my personal wishlist for the FPU is not wider ALUs, but faster gather.
Well, they will have to add more execution ports... because Cinebench

Hmmm... I don't think that is the route AMD will take with Genoa.

The Zen 4 CCD from the Gigabyte leak likely has two SDP/IF links.
On top of that, supporting 96 or even 128 cores would mean supporting up to 512 SerDes links.
That is way too much power wasted, and the routing for Rome is already very complicated.
On Rome they had to route the links underneath the CCD.

At ISSCC 2021, Sam Naffziger already alluded to interposers/higher-density interconnects (highlighting mine). This was before Lisa announced 3D V-Cache.
In fact, from this slide we already knew the second item was coming to Zen 3. (Cache, while not exactly memory, is backed by SRAM, which is memory.)

From TSMC's official data, CoWoS-L with LSI/Si bridges is proven, and it reaches 3x reticle size, which can cover all chiplets for a hypothetical 16-CCD EPYC.

Anyway, I think AMD will most likely go with some sort of interposer: probably CoWoS-R if there is really no need for super-high-density interconnects (i.e. if a 4 µm contact pitch is enough), otherwise the high-density CoWoS-L (<1 µm pitch).
If not, they will burn power linking those 96/128 cores; it is not sustainable.
You can read the paper by Naffziger yourself.
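Why the pitch matters so much: on an idealized square grid, contact density goes with 1/pitch², so a 4x finer pitch gives about 16x more contacts per mm². A minimal sketch using the pitch figures quoted above; the square-grid layout is a simplifying assumption, not a TSMC design rule:

```python
def contacts_per_mm2(pitch_um: float) -> float:
    """Contacts per mm^2 on an idealized square grid with the given pitch (um)."""
    per_side = 1000.0 / pitch_um  # 1 mm = 1000 um
    return per_side ** 2


coarse = contacts_per_mm2(4.0)  # ~4 um pitch (CoWoS-R figure from the post)
fine = contacts_per_mm2(1.0)    # 1 um, the upper bound of "<1 um" (CoWoS-L)
print(f"{coarse:,.0f} vs {fine:,.0f} contacts/mm^2 -> {fine / coarse:.0f}x denser")
```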
Seems I was totally wrong about the packaging on Genoa. But at least what I was discussing has been applied in some form to MI300.


Granite Ridge would probably be on N4 and Strix Point on N3/E
N4 would offer a very minor density increase vs. Raphael [92.7 MTr/mm²], probably edging past 100 MTr/mm² (in line with TSMC's N4P projections).
But those efficiency gains are very, very significant: -22% power at iso-perf if AMD were to go for N4P.
Also, N4P would be in plentiful supply by 2024: F18P1-4 and F21P1 in AZ, around 180K wpm.
F18 P5, P6, P7 and P8 would be online by the end of 2H23. TSMC has yet to finish construction on P8, but P5 and P6 are already complete with equipment installed, and P7 is under way.
There will be an oversupply of wafers due to slowing semiconductor sales.
Granite Ridge on N4 and Strix Point on N3 would be a good strategy to use capacity from both nodes. Across these two nodes there are 380K wpm of wafers.
There are no customers for that much capacity in the current environment; I'm not sure what TSMC will do. This is on top of 200K+ wpm of N6/N7.

I think we can safely assume 6 GHz normal operation as guaranteed on Granite Ridge. Currently I am already hitting 5.88 GHz out of the box with my 7950X. When limited to 85 °C Tjmax, my 7950X runs at 5.735 GHz in a hysteresis loop.

Right now the bigger question is how AMD will add more cores: where is the space? AM5 seems tight for adding another chiplet.
I am inclined to think in the direction of @Hans de Vries as mentioned here
Zen4 and Zen5 should use the same packages. (Just like Zen...Zen3)
Most likely with the same CCD and IO die arrangements in the packages as well.

I would expect:

Zen4 ---> Zen5

(1) General use of PAM4 for the SERDES so:
- PCIe-5 --> PCIe-6 doubles the bandwidth using the same symbol rate, but 2 bits instead of 1 bit per symbol.
- XGMI3 --> XGMI4 doubles the bandwidth between the IO die and the CCD's using the same number of pins

(2) Doubling the number of cores per CCD is enabled by doubling the SERDES bandwidth.
- 16 cores per CCD
- The same number of serdes IO lines
- L3 VCache in increments of 128 MB per die
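A minimal sketch of the bandwidth argument above, assuming the raw per-lane rate is symbol rate x bits per symbol (log2 of the number of levels), ignoring FLIT and encoding overhead. PCIe 5.0 signals 32 GT/s NRZ and PCIe 6.0 signals 64 GT/s PAM4 at the same 32 GBaud; the lane count below is a hypothetical placeholder, since actual GMI/xGMI lane counts are not given here.

```python
import math


def lane_rate_gbps(gbaud: float, levels: int) -> float:
    """Raw per-lane bit rate: symbol rate times bits per symbol (log2 levels)."""
    return gbaud * math.log2(levels)


nrz = lane_rate_gbps(32, 2)    # PCIe 5.0-style NRZ: 1 bit per symbol
pam4 = lane_rate_gbps(32, 4)   # PCIe 6.0-style PAM4: 2 bits per symbol

# Doubling cores per CCD while doubling per-lane bandwidth keeps
# bandwidth per core constant on the same number of SerDes lanes.
lanes = 16  # hypothetical lane count, for illustration only
per_core_8c = nrz * lanes / 8
per_core_16c = pam4 * lanes / 16
print(nrz, pam4, per_core_8c == per_core_16c)
```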



  • Cores on top of cache, as in MI300 (and then multiple layers for some of them in V-Cache scenarios), could leave ample room for adding more cores.
  • Thermals would be a major challenge to be solved for a hypothetical >16 core chip, but the improved power characteristics of N4 could help a bit.
  • Is there even enough bandwidth for more than 16 cores with DDR5 in all-core loads? Currently with my 7950X I am getting 84 GB/s @ 6000 MT/s, which is almost 50% more than what I get on my 5950X.
  • An 84 mm² CCD would be a decent jump in MTr count if it were not stacked, roughly a >25% MTr gain (for comparison, Zen 3 got a 9% MTr gain over Zen 2). For stacked cores it is a completely unknown hypothesis.
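To put the bandwidth question in numbers: a minimal sketch assuming AM5's two 64-bit DDR5 channels and the theoretical peak formula (transfer rate x bytes per transfer x channels). The 84 GB/s is the measured figure from the post; the per-core split is a naive even share, not a model of real contention.

```python
def ddr_peak_gbs(mt_per_s: float, channels: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s: transfers/s x bytes per transfer x channels."""
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


peak = ddr_peak_gbs(6000, channels=2)  # assumed: dual-channel, 64 bits per channel
measured = 84.0                         # 7950X result reported in the post
print(f"Theoretical peak: {peak:.0f} GB/s, measured is {measured / peak:.1%} of it")

# Naive even share of the measured bandwidth across more cores.
for cores in (16, 24, 32):
    print(f"{cores} cores: {measured / cores:.2f} GB/s per core")
```

With the measured figure already close to the theoretical peak, doubling the core count roughly halves the per-core share unless memory speed or channel count also goes up.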
BTW, Mike Clark already alluded to adding more cores to the CCX but as mentioned, L3 latency is going to suffer.
We do see core counts growing, and we will continue to increase the number of cores in our core complex that are shared under an L3. As you point out, communicating through that has both latency problems, and coherency problems, but though that's what architecture is, and that's what we signed up for. It’s what we live for - solving those problems. So I'll just say that the team is already looking at what it takes to grow to a complex far beyond where we are today, and how to deliver that in the future.
This might not be like Bergamo, which is supposedly using a dual-CCX design to get 16 cores per chiplet, but that is pure speculation at this point, of course.
 

eek2121

Platinum Member
Aug 2, 2005
Growing core counts would help keep Intel off their back. If AMD had gone to 32 cores for Zen 4, for example, Raptor Lake would have been DOA for multicore workloads.
 

soresu

Platinum Member
Dec 19, 2014
Granite Ridge would probably be on N4 and Strix Point on N3/E
I have my doubts about Strix Point being on any variant of N3.

It should be announced the same year as Granite Ridge, possibly even earlier, if AMD keeps its current cadence of announcing APUs at CES (with Phoenix Point being announced at CES 2023).

All this makes me think it's most likely to land on the same process as the Granite Ridge CCDs.

IMHO we are likely to see a 2025 Zen5 / RDNA4 APU which probably will be fabbed on an N3 variant node.
 

CakeMonster

Golden Member
Nov 22, 2012
3) Core count - between 50% and 2x more. I still lean towards a 2x increase, so 32C/64T should be the new flagship for mainstream desktop (like the 7950X is now).
Stacking of cores and memory (L3) is becoming the norm, so I expect they will evolve in that direction.
That would be cool, but I'm not convinced. Average consumers might somewhat favor Intel's total core count 'advantage' right now, but I predict they will stop caring in 2+ years. I think a 50% increase to a 12c CCX is more likely, as that will keep the dies small and (most importantly) cheap, and people frankly won't need more.

Huge disclaimer about server markets here, which would overrule everything: if it's needed there, it might pay off for AMD to go with a 16c CCX, but it doesn't make sense to me on the consumer parts unless it just trickles down.

I sure want more cores, but it's a luxury with regard to transistors, and a luxury until games or consumer applications demand it (although it would be fun as heck to have major headroom that creative people could find purposes for, which is a rare occasion in the hardware market).
 

DisEnchantment

Golden Member
Mar 3, 2017
IMHO we are likely to see a 2025 Zen5 / RDNA4 APU which probably will be fabbed on an N3 variant node.

Strix Point is Zen 5 on an "advanced node" in 2024.

Phoenix Point is Zen 4 on N4 in early 2023



During the FAD 22 Q&A, Forrest was being vague when he said we can expect Zen CPU cores to be on multiple nodes going forward, but then Lisa jumped in and said straight up that we can expect Zen 5 on both N3 and N4 in 2024.
It may not be Strix Point on N3, but some Zen 5 SoC will be on N3 in 2024.

Found it, timestamped video

"You should expect to see , you know, 4nm and 3nm versions of Zen 5 and you will see them in 2024" - Lisa Su
 

soresu

Platinum Member
Dec 19, 2014

Strix Point is Zen 5 on an "advanced node" in 2024.

Phoenix Point is Zen 4 on N4 in early 2023



During the FAD 22 Q&A, Forrest was being vague when he said we can expect Zen CPU cores to be on multiple nodes going forward, but then Lisa jumped in and said straight up that we can expect Zen 5 on both N3 and N4 in 2024.
It may not be Strix Point on N3, but some Zen 5 SoC will be on N3 in 2024.

Found it, timestamped video
Oooooof 😨

Counting the rumour that N33 is on TSMC N6, that means RDNA3 is going to be on at least 4 separate nodes/node variants.

That's approaching ARM-level IP fab versatility.

Phoenix should be a really nice boost from Rembrandt at this rate.
 

Exist50

Platinum Member
Aug 18, 2016
But those efficiency gains are very, very significant: -22% power at iso-perf if AMD were to go for N4P.
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.

It does beg the question, however. With Zen 5 widely expected to bring a much "bigger" core, but only small improvements on the process side (pre-N3), what are the implications for core counts and/or cost? Rumors seem to indicate that Turin is looking to be around 120 cores, give or take. If, for discussion purposes, we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gain) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, high-end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scaling factors give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
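The scaling estimate above, spelled out. A minimal sketch that only reuses the post's own inputs (~120 vs 96 cores, +50% transistors per core, a 6% N4 density gain) and assumes die area scales linearly with transistor count:

```python
def relative_silicon(cores_new: int, cores_old: int,
                     core_growth: float, density_gain: float) -> float:
    """Total compute-die area vs. today: core-count ratio x per-core
    transistor growth, divided by the process density gain."""
    return (cores_new / cores_old) * core_growth / density_gain


total = relative_silicon(120, 96, 1.50, 1.06)  # ~1.77x total silicon
per_core = 1.50 / 1.06                          # ~1.42x area per core
print(f"total {total:.2f}x, per core {per_core:.2f}x")
```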
 

DisEnchantment

Golden Member
Mar 3, 2017
Since it's in 2024, wouldn't AMD be using N4X? They did use N5 HPC for Zen4.

I don't suppose it will be; N4X is too leaky for chiplets that are shared with server processors.

N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.
Actually, relatively speaking it is, because we will never really know what AMD used as the baseline for DTCO on Zen 4 and what they have changed in the PDK.
But if they are moving to N4-based DTCO, there must be some gains behind that. So I will give them the benefit of the doubt, since there is a precedent: they were able to optimize and extract more frequency at the same power on N7 for Zen 3 vs. Zen 2, after all. It may not be the -22% power, but let's see if TSMC (or AMD) is blowing hot air.

we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gain) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, high-end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scaling factors give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
Nah... 50% is way too much; AMD is not Apple. At best I expect Zen 5 to gain 20 to 25% more transistors per core (area * scaling), or around 85 mm² CCDs (assuming they even still have the CCD concept by then). Zen 3 gained barely 9% in MTr over Zen 2. N5 --> N4 is hardly any density gain. The bean counters will not allow a CCD of around 110 mm². Not sure what they are making on N3; I guess mobile, but I don't know.
Also, big chunks of silicon real estate already seem allocated beforehand for the second GMI and AVX-512 (unless Zen 5 adds a second AVX-512 port).
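A rough check of how CCD area maps to transistor budget. A minimal sketch under two assumptions that are mine, not the post's: a Zen 4 CCD of about 71 mm² (the commonly cited figure) and the same ~6% N5 -> N4 density gain used earlier in the thread, with no stacking:

```python
ZEN4_CCD_MM2 = 71.0      # assumption: commonly cited Zen 4 CCD area
N4_DENSITY_GAIN = 1.06   # assumption: N5 -> N4 density gain, as used in the thread


def transistor_ratio(new_ccd_mm2: float) -> float:
    """Transistor count of a planar N4 CCD of this area vs. a Zen 4 CCD."""
    return new_ccd_mm2 / ZEN4_CCD_MM2 * N4_DENSITY_GAIN


for area_mm2 in (80, 85, 110):
    print(f"{area_mm2} mm^2 CCD: {transistor_ratio(area_mm2) - 1:+.0%} transistors")
```

Under these assumptions a 20-25% transistor budget corresponds to roughly 80-84 mm², close to the ~85 mm² estimate, while a ~110 mm² CCD would mean roughly +64% transistors.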
The most interesting thing for me is whether there is any core stacking in place.

What is unknown, or at least not rumored thus far, is what the N3 and N4 product segments are, because Lisa mentioned both for 2024.

How much N3/N5 wafers will cost is up in the air; TSMC runs a risk of underutilization (recession and loss of Chinese customers).
But I am not terribly interested in cost or market share comparisons.
 

Saylick

Diamond Member
Sep 10, 2012
N4P is supposed to be -22% power @ iso-perf relative to base N5. Relative to N5P, however, TSMC's numbers give something like +4% perf, iso-power, or -7% power, iso-perf. Honestly, kinda marginal gains.

It does beg the question, however. With Zen 5 widely expected to bring a much "bigger" core, but only small improvements on the process side (pre-N3), what are the implications for core counts and/or cost? Rumors seem to indicate that Turin is looking to be around 120 cores, give or take. If, for discussion purposes, we assume Zen 5 is scaled +50% from Zen 4, then that's 120/96 * 1.50 / 1.06 (N4 density gain) => 1.77x the silicon area. Pretty big growth, and it's hard to say what TSMC's wafer prices will do between now and then. But I think that unless competition forces them to cut prices, high-end chips will continue to get significantly more expensive.

Also, even if AM5 gives them some room to grow, those scaling factors give roughly +40% area per core. I'm not sure they have the room to absorb that and a third compute die, but maybe with an N3 refresh they could?
Perhaps it's only DT and mobile Granite Ridge that we'll see on N4P, with Strix Point, APUs, and Turin on N3E. Basically, only the markets that demand the best perf/W and best density get the best node. DT is not one of those markets, especially since it is a market where people are more perf/$ sensitive. N4P would give AMD the ability to keep costs in check while raising perf/$.