Discussion Zen 5 Speculation (EPYC Turin and Strix Point/Granite Ridge - Ryzen 9000)


A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I think base will be 6400 and OCers may get that up to 6800 or 7000. 7200 might be elusive for them. They have a bad track record of their IMC reaching speed parity with Intel's, one they couldn't shake off with Zen 4. I mean, what are the chances that the I/O die will be a complete redesign instead of a recycle/tweak of Zen 4's?
Depends. last gen amd had an advantage because the processor was more sensitive to ram speeds. amd at 3600 with fast ddr4 was blisteringly good. this gen neither is winning favours. there's a decent uplift from 5200 to 6000 and then it's very small after. as i posted recently, you begin to see very good game performance in the very high 7000s and beyond on intel, but this is currently unrealistic since the intel imc gives up in the low 7000s for most people. the zen 4 iod was already a relatively new design; amd's pain point is memory speed being held back by infinity fabric limitations.

neither platform will see a large performance jump at the speeds you suggest. Add another thousand at minimum.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I would hope so. For an overhaul of the core, it ought to be >15%.

Tbh, I think the bigger question that many aren't discussing is whether Zen 5 will support much faster DDR5, ideally 7200 MT/s or higher. Memory latency has been a big weak point of Zen 4 and the AM5 platform, so it would be nice if AMD could ameliorate it in the next generation.
It'll be interesting to see if amd can repeat their past success by delivering a processor running at lower clocks, say 6-6.1 ghz while intel pushes 6.5 to 6.8 or more in the coming generations, yet matching intel's attempt or wiping the floor with it. ignoring the decade of incompetence, amd has always been better than intel at much lower clocks and temps. Except Core. Core had both a frequency regression that didn't hurt its performance and a large, noticeable turning back of the clock on thermals. It was very impressive work by intel.

when the holiday season approached, stores ran combo sales or knocked very little off the price, and the lines got very long if you wanted a chance at getting one. I remember almost getting into fisticuffs with two separate men because they kept trying to cut people off.

these days you leave your mobile number with a ticket and they call you up if your number gets picked. sleeping inside your car in the parking lot with the heat on is a lot nicer than standing out in the snow or freezing rain. you can even have a snack in the car, like a butter and preserves sandwich with a banana on the side. with igor it'd be a few bunches of bananas as he read love sonnets back to them.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
It'll be interesting to see if amd can repeat their past success by delivering a processor running at lower clocks, say 6-6.1 ghz while intel pushes 6.5 to 6.8 or more in the coming generations, yet matching intel's attempt or wiping the floor with it. ignoring the decade of incompetence, amd has always been better than intel at much lower clocks and temps. Except Core. Core had both a frequency regression that didn't hurt its performance and a large, noticeable turning back of the clock on thermals. It was very impressive work by intel.

when the holiday season approached, stores ran combo sales or knocked very little off the price, and the lines got very long if you wanted a chance at getting one. I remember almost getting into fisticuffs with two separate men because they kept trying to cut people off.

these days you leave your mobile number with a ticket and they call you up if your number gets picked. sleeping inside your car in the parking lot with the heat on is a lot nicer than standing out in the snow or freezing rain. you can even have a snack in the car, like a butter and preserves sandwich with a banana on the side. with igor it'd be a few bunches of bananas as he read love sonnets back to them.
I personally don't care too much about clocks, but rather, overall perf/watt and absolute performance.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I personally don't care too much about clocks, but rather, overall perf/watt and absolute performance.
Clocks matter if an app responds better to frequency than to the amount of work done per clock cycle. there aren't a lot of frequency-loving apps out there, and this isn't a big deal for many, but it is for the few.
 

moinmoin

Diamond Member
Jun 1, 2017
4,950
7,659
136
Why was Mike Clark so excited about it
It's not only about the improvements directly achieved but also the new technologies introduced (which can then be refined) and the future improvements enabled by the changes (the usual pattern for an even Zen gen).

Also, the excitement may be not only about the Zen cores but also about the package layout with CCDs and one IOD, which as of Zen 4 was still essentially unchanged since Zen 2.
 
  • Like
Reactions: Tlh97 and soresu

soresu

Platinum Member
Dec 19, 2014
2,662
1,862
136
how do you plan on cooling this monstrosity Tim?
Quantum wells, it's the answer to everything! 🤘

 

Ajay

Lifer
Jan 8, 2001
15,451
7,861
136
IPC has plenty to do with clockspeed. IPC will be higher the slower you clock, because DRAM is fewer cycles away. A design that targets a higher clock speed also has to increase cache latency (in terms of clock cycles) at every level; i.e. an L1 able to work at 1 cycle latency at clock x will require 2 cycles of latency at clock 2x.

If they increase IPC by 19% it is very unlikely they will be able to maintain the same clock speed. I'm extremely skeptical of any claims that IPC can be increased by that much and clock rates can be increased as well. Sure, they are getting some "free" clock increase due to process, but there's less and less of that available with each process generation.
I'm a bit confused here, Doug. IPC is literally Instructions Per Clock. I used to do calculations on this while working on firmware development, looking at how long a given instruction took to execute. We had to stick with C/C++ code for portability, but I had no problem tweaking the code to get the compiler to use slightly faster instructions.

What we really have here, and this debate raged on ATF for a while, is Performance Per Clock - which is really an aggregate based on the execution of a large instruction stream (from whatever benchmark is being used). Ultimately, all I, and I would think most ppl, care about is the actual performance delta between a Ryzen 6000 series and an 8000 series APU. +20% is pretty good gen-to-gen nowadays.
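
To make the distinction concrete, here's a minimal Python sketch of how the two numbers are computed - every value in it is made up purely for illustration:

Code:
# Minimal sketch: IPC vs. "performance per clock" (PPC).
# All counts and scores are hypothetical, for illustration only.

retired_instructions = 8.0e9   # would come from CPU performance counters
elapsed_cycles       = 5.0e9

ipc = retired_instructions / elapsed_cycles   # instructions per clock

# PPC is an aggregate over a whole workload: take a benchmark score and
# normalize it by average clock, so two chips can be compared
# independent of frequency.
benchmark_score = 2400.0       # arbitrary units from whatever benchmark
avg_clock_ghz   = 5.0

ppc = benchmark_score / avg_clock_ghz         # score points per GHz

print(f"IPC = {ipc:.2f}, PPC = {ppc:.1f} points/GHz")

The first number falls straight out of hardware counters; the second depends entirely on which benchmark you picked.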
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
Zen 5's arch is supposed to be a Zen 1-type clean-sheet performance and efficiency overhaul. Why was Mike Clark so excited about it if it's just 19% improved over Zen 4? Why was he so anxious to "buy" it? Something doesn't compute.
Even if Zen 5 is a clean sheet, Zen 4 is pretty good. There's no reason to believe we'll have another Zen 1 moment, if for no other reason than that we'll be comparing against something that isn't Family 15h.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
That's a huge jump. It's the same jump that occurred from Zen 2 -> Zen 3.
I think people are too locked into the weird geomean amd pushed with the Zen 4 preview last august. the performance leap should be interesting, but the price will be higher than what we've seen. the economy should be better by then, hopefully, but who knows.
 

Geddagod

Golden Member
Dec 28, 2021
1,154
1,017
106
Zen 5's arch is supposed to be a Zen 1-type clean-sheet performance and efficiency overhaul. Why was Mike Clark so excited about it if it's just 19% improved over Zen 4? Why was he so anxious to "buy" it? Something doesn't compute.
The words used to describe the new architecture for Zen 5 are the exact same words used to describe the new architecture for Zen 3; in both cases AMD said "grounds up".
 

DisEnchantment

Golden Member
Mar 3, 2017
1,602
5,788
136
It's not only about the improvements directly achieved but also the new technologies introduced (which can then be refined) and the future improvements enabled by the changes (the usual pattern for an even Zen gen).

Also, the excitement may be not only about the Zen cores but also about the package layout with CCDs and one IOD, which as of Zen 4 was still essentially unchanged since Zen 2.
I expect some interesting things from Zen 5, considering AMD has not been developing cores on a shoestring budget for a couple of years now.
Zen 3 was developed pretty much during the years of austerity at AMD, Zen 4 slightly less so, and Zen 5 should show the first fruits of R&D done in better days.
But more interesting for me is indeed packaging and SoC architecture. MI300 is almost here (next week?) to give us a glimpse of next-gen packaging.

Curious to see whether InFO-R will replace the substrate-based PHY for 2.5D packaging on the Zen 5 family. Bergamo seems to have demonstrated the limits of routing with substrate-based interconnects, and a likely way forward is fanout-based RDLs at a minimum, if not active bridges.
Besides there being practically no more space for traces coming out of the IOD to the CCDs, there is also the problem of the next-gen IF, which per employee LinkedIn profiles can hit up to 64 Gbps, compared to the current 36 Gbps.

I think InFO-3D could be a wildcard to enable lower-cost 3D packaging. It fits nicely here: a lower interconnect density than frontend packaging like SoIC, but dense enough for SoC-level interconnects when stacking on top of the IOD. There is also a big concern at the moment with F15 and F14 being underutilized, and TSMC is pushing customers from 16FF and older to the N7 family while ramping those fabs down (commodity process nodes, you might say). Having any customer generously making use of N7/N6 besides the leading node would be a win-win.

Regarding the core perf gains, they have more transistors to work with and a more efficient process, so at the very least just throwing more transistors at the problem should bring decent gains, if their ~6 years (2018-2023) of "grounds up" design of Zen 5 are to be worthwhile. Zen 4 is behind its key contemporaries in capacity in almost all the key resources of a typical OoO machine. Pretty good (but not surprising given other factors) that it even keeps up.

Nevertheless, a few AMD patents regarding core architecture that I have been reading strike me as intriguing, and I wonder if they will make it into Zen 5 in some form.
Not coincidentally, all these patents are about increasing resources without drastically increasing transistor usage.
  • Dual fetch/Decode and op-cache pipelines.
    • This seems like something that would be very interesting for mobile: power-gate the second pipeline during less demanding loads
    • Remove the secondary decode pipeline for a Zen 5c variant? Let's say 2x 4-wide decode for Zen 5 and 4-wide for Zen 5c
  • Retire queue compression
  • op-cache compression
  • Cache compression
  • Master-Shadow PRF
 

Doug S

Platinum Member
Feb 8, 2020
2,260
3,512
136
I'm a bit confused here, Doug. IPC is literally Instructions Per Clock. I used to do calculations on this while working on firmware development, looking at how long a given instruction took to execute. We had to stick with C/C++ code for portability, but I had no problem tweaking the code to get the compiler to use slightly faster instructions.

What we really have here, and this debate raged on ATF for a while, is Performance Per Clock - which is really an aggregate based on the execution of a large instruction stream (from whatever benchmark is being used). Ultimately, all I, and I would think most ppl, care about is the actual performance delta between a Ryzen 6000 series and an 8000 series APU. +20% is pretty good gen-to-gen nowadays.


Instructions per clock can't be calculated by "looking at how long a given instruction takes to execute", at least not since the days of the 6502 (I remember doing what you are talking about programming an Atari 800's 6502 when I was in junior high). That sort of cycle counting is fine for something like 'MOV R2,0' (assuming you want to deal with figuring out how many instructions can issue and retire in a cycle, which gets more and more complicated the wider CPUs get) but you can't do it for everything.

The reason (as I'm sure you are aware) is that the pipeline will stall on some instructions. For example, when a load cannot be satisfied from cache/DRAM in time. When that happens you are getting 0 instructions for however many cycles that delay lasts. The higher your clock rate, the more often the pipeline will stall - and for more cycles - thus executing fewer instructions per cycle on average.
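
To put rough numbers on that, here's a toy model - the base IPC, miss rate, and DRAM latency below are all made-up assumptions, not measurements:

Code:
# Toy model: a fixed DRAM latency in nanoseconds costs more *cycles*
# at a higher clock, dragging the measured average IPC down.

def average_ipc(clock_ghz, base_ipc=4.0, miss_rate=0.005,
                dram_latency_ns=80.0):
    # Cycles per instruction = useful work (1/base_ipc) plus, for the
    # fraction of instructions that miss, the DRAM latency in cycles.
    stall_cycles = miss_rate * dram_latency_ns * clock_ghz
    return 1.0 / (1.0 / base_ipc + stall_cycles)

for f in (4.0, 5.0, 6.0):
    print(f"{f:.1f} GHz -> average IPC {average_ipc(f):.2f}")

# 4.0 GHz -> average IPC 0.54
# 5.0 GHz -> average IPC 0.44
# 6.0 GHz -> average IPC 0.38

The higher clock still wins on absolute performance in this model, but the IPC you measure falls, which is the effect being described.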

Now, 'performance per clock' - sure, that's a more useful figure than IPC, though you have to decide what "performance" means. Is it Geekbench 6? Is it SPEC? Is it CBR23? What compiler are you using, with what settings? IPC is talked about often because it is far easier to measure. It may not be free of the issues "PPC" has, but at least most people can agree on the accuracy of the measurement, since you use the CPU's performance counters to do it.

Since in this case we are hearing the claim "IPC is increased by 19%", we can't talk about "PPC", and we have to take into account the effect of clock rate on it (and while IPC is kinda sorta sensitive to what code is being run and the compiler used, that's a fairly small effect unless you go out of your way to create a bad test, which we will assume AMD is not going to do). If instead we heard a claim that "performance increases by 19%" (i.e. what you will hear Apple say when they introduce a new Mac or whatever), then they are talking about "PPC", but generally you aren't going to know what they used to measure it unless they give you a nice graph saying they used SPECint2017 or whatever.

I remember when Apple announced the A9 and claimed a 70% performance increase, everyone thought that was crazy and that they were cherry-picking some corner case; then sure enough Geekbench showed a ~70% increase in ST, thanks to the combination of IPC improvement and a massive increase in clock rate. Now maybe they weren't using Geekbench specifically, but it was interesting how well that lined up with their claim in that instance. Obviously Apple was working from a much lower bar back then; everyone is subject to the law of diminishing returns after all.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
I remember when Apple announced the A9 and claimed a 70% performance increase, everyone thought that was crazy and that they were cherry-picking some corner case; then sure enough Geekbench showed a ~70% increase in ST, thanks to the combination of IPC improvement and a massive increase in clock rate. Now maybe they weren't using Geekbench specifically, but it was interesting how well that lined up with their claim in that instance. Obviously Apple was working from a much lower bar back then; everyone is subject to the law of diminishing returns after all.
I recall this. I remember saying something like this on a now-defunct blog, but I got toasted in the comment replies. I had the same outlook when Ryzen launched; people presumed AMD were talking out of their ass when they claimed Ryzen would have a giant leap in performance over dozer. I don't remember the exact figure or whether it was overall performance. Sure enough, they met that goal. That being recent and thus fresh in memory has fed some of the wild claims about Zen 5 that have been circling like flies around a pile of horse crap. doesn't help when you have morons like mlid talking out of their behinds. Or the chunky boy with greasy curly hair. It's difficult to say what zen 5 or arrow lake will be like. I can take my best guess and post it here, but my words are as valid as the bs spewed by leakers.

6502... I knew you were older than I am, but I wasn't expecting that - or maybe you're not that old and simply had access to those earlier computers. I remember posting on here a while back about how much I disliked computers and technology in the late '70s and '80s until I learned to love them. like being fed mushy peas as a child and not liking them. still don't like them. mushy peas made from frozen sweet peas are delightful, mind you, but not the authentic stuff. that is rank.
 
Jul 27, 2020
16,288
10,328
106
Has anyone here postulated that Zen 5 being on N3 and N4 could mean that the single-CCD SKUs may use N4 and the dual-CCD ones may use N3? Could the E-core CCD use N3 for minimal energy usage while the P-core CCD benefits from the maturity of the N4 node family?
 

Timorous

Golden Member
Oct 27, 2008
1,611
2,764
136
Has anyone here postulated that Zen 5 being on N3 and N4 could mean that the single-CCD SKUs may use N4 and the dual-CCD ones may use N3? Could the E-core CCD use N3 for minimal energy usage while the P-core CCD benefits from the maturity of the N4 node family?

If there is a node split, I expect it is more likely to be APUs vs CCDs.
 

yuri69

Senior member
Jul 16, 2013
388
619
136
IPC has plenty to do with clockspeed. IPC will be higher the slower you clock, because DRAM is fewer cycles away. A design that targets a higher clock speed also has to increase cache latency (in terms of clock cycles) at every level; i.e. an L1 able to work at 1 cycle latency at clock x will require 2 cycles of latency at clock 2x.

If they increase IPC by 19% it is very unlikely they will be able to maintain the same clock speed. I'm extremely skeptical of any claims that IPC can be increased by that much and clock rates can be increased as well. Sure, they are getting some "free" clock increase due to process, but there's less and less of that available with each process generation.
Golden Cove achieved a 19% IPC increase & clocked 0.2GHz higher than Cypress Cove.
Zen 3 achieved a 19% IPC increase & clocked 0.2GHz higher than Zen 2.

What is the catch?
 
  • Like
Reactions: Tlh97 and coercitiv

Geddagod

Golden Member
Dec 28, 2021
1,154
1,017
106
Golden Cove achieved a 19% IPC increase & clocked 0.2GHz higher than Cypress Cove.
Zen 3 achieved a 19% IPC increase & clocked 0.2GHz higher than Zen 2.

What is the catch?
Frequency iso-power. You might be able to hit a higher peak ST max, but larger architectures usually take more power to reach the same frequencies as the previous architecture, and that is a far more important limiting factor in MT.
For example, CML clocked ~10% higher iso-power vs RKL (10400 vs 11400 @ 65 W).
IIRC Zen 3 was impressive in that it clocked similarly to, or maybe even slightly higher than, Zen 2 iso-power.
 
  • Like
Reactions: Tlh97 and moinmoin

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
Instructions per clock can't be calculated by "looking at how long a given instruction takes to execute", at least not since the days of the 6502 (I remember doing what you are talking about programming an Atari 800's 6502 when I was in junior high) That sort of cycle counting is fine for something like 'MOV R2,0' (assuming you want to deal with figuring out how many instructions can issue and retire in a cycle, which gets more and more complicated the wider CPUs get) but you can't do it for everything.

You can always find the ideal CPI for any instruction just by counting the number of cycles it would take to execute. That may or may not be particularly useful, but I'm not aware of any instruction that takes a variable number of cycles to execute given a perfect cache.

Obviously loads and stores can take longer to execute if they miss the cache, but the access times for the different levels of the memory system (L1, L2, RAM, etc.) are also known quantities. At least with accesses to main memory you could introduce additional variability by saturating the memory controller with requests and forcing it to stall the pipeline for that reason, or even go a step further and intentionally generate page faults so it has to go to disk.

Theoretical IPC is still something you could calculate, but it's not all that meaningful to an end-user. The engineers might want to know what it is, though, as it would help inform them of where they should spend more of their time or what alleviating some bottleneck would afford in terms of potential performance gains. Software developers may also benefit if they really want to extract maximum performance by hand-tuning program code, though almost no one bothers doing this since compilers tend to do a better job at it and it's incredibly time-consuming.
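
As a sketch of what that calculation looks like for a whole instruction mix (the mix and per-instruction cycle counts here are invented for illustration, assuming every access hits the cache):

Code:
# Ideal CPI for an instruction mix with a perfect cache.
# name: (fraction of dynamic instructions, cycles when everything hits)
instruction_mix = {
    "alu":    (0.50, 1),
    "load":   (0.25, 4),   # assumed L1 hit latency
    "store":  (0.15, 1),
    "branch": (0.10, 2),
}

ideal_cpi = sum(frac * cyc for frac, cyc in instruction_mix.values())
print(f"ideal CPI = {ideal_cpi:.2f}, ideal IPC = {1 / ideal_cpi:.2f}")
# ideal CPI = 1.85, ideal IPC = 0.54

Change any of the assumed latencies and the "ideal" number moves with it, which is exactly why it's more useful to the engineers than to an end-user.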
 

Doug S

Platinum Member
Feb 8, 2020
2,260
3,512
136
You can always find the ideal CPI for any instruction just by counting the number of cycles it would take to execute. That may or may not be particularly useful, but I'm not aware of any instruction that takes a variable number of cycles to execute given a perfect cache.

The 6502 took a variable number of cycles to execute instructions. If you had an instruction like 'AND $10C0,X', it would execute a logical AND on the accumulator using the byte found at address $10C0 + the X register. If the value of the X register was $40 or higher, that instruction took an additional cycle to execute due to crossing a 256-byte page boundary. There were some more complicated instructions that could take two additional cycles. Since the cycle timing depended on the value of a register you generally wouldn't know in advance, calculating exact cycle timing was impossible (there were ways around this involving self-modifying code if you REALLY needed cycle-accurate timing).
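
The timing rule for that addressing mode is simple enough to sketch in a few lines (Python here purely for illustration; the cycle counts are the documented 6502 ones for the read form of absolute,X):

Code:
# 'AND $hhll,X' (absolute,X read): 4 cycles, +1 if base+X crosses
# a 256-byte page boundary.
def and_absolute_x_cycles(base_addr, x):
    effective = (base_addr + x) & 0xFFFF
    page_crossed = (base_addr & 0xFF00) != (effective & 0xFF00)
    return 4 + (1 if page_crossed else 0)

print(and_absolute_x_cycles(0x10C0, 0x3F))  # 4: stays inside page $10xx
print(and_absolute_x_cycles(0x10C0, 0x40))  # 5: crosses into page $11xx

Since X usually isn't known until run time, neither is the cycle count.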

Thankfully I left my assembler programming behind with the 6502, so I couldn't say whether current CPUs like Intel/AMD's x86 or Apple/ARM AArch64 have any instructions with variable timing, but I wouldn't be surprised - I'd look first at instructions doing things like multiplication and division if I were trying to find one.
 
  • Like
Reactions: Mopetar

naukkis

Senior member
Jun 5, 2002
706
578
136
Thankfully I left my assembler programming behind with the 6502, so I couldn't say whether current CPUs like Intel/AMD's x86 or Apple/ARM AArch64 have any instructions with variable timing, but I wouldn't be surprised - I'd look first at instructions doing things like multiplication and division if I were trying to find one.

Every modern CPU's instruction timing is wildly variable. Your 6502 example had memory running at zero latency, with one additional cycle for more-than-8-bit addressing. CPUs now usually have 3-level caches, with separate L1s for instructions and data; memory is divided into many pages with different timings; and they usually operate on translated virtual memory, meaning that access speed at every cache level also varies with translation cache hits, misses, and page walks. The result is that a single instruction's execution can vary from one cycle to a thousand cycles. And CPUs can reorder instructions to hide that, with up to a thousand-instruction window. So studying how code executes needs special diagnostic tools - Intel, for example, provides great ones for that.
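
A crude way to see this from user space - Python interpreter overhead included, so treat only the ratio as meaningful, not the absolute numbers - is to chase pointers through a buffer much bigger than the caches, first in order, then shuffled:

Code:
# Rough illustration: the same load takes very different time depending
# on whether the access pattern is cache/TLB friendly. Results vary by
# machine; this is not a precise microbenchmark.
import random, time

def ns_per_hop(order, rounds=3):
    n = len(order)
    i = 0
    t0 = time.perf_counter()
    for _ in range(rounds * n):
        i = order[i]                  # follow the permutation
    return (time.perf_counter() - t0) / (rounds * n) * 1e9

n = 1 << 22                           # ~4M entries, far larger than L3
sequential = list(range(1, n)) + [0]  # each entry points at the next
shuffled = sequential[:]
random.shuffle(shuffled)              # same loads, hostile access pattern

print(f"sequential: {ns_per_hop(sequential):.1f} ns/hop")
print(f"shuffled:   {ns_per_hop(shuffled):.1f} ns/hop")

The instructions executed are identical in both runs; only where the data lands in the memory hierarchy changes.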
 
  • Like
Reactions: Tlh97 and moinmoin