
Discussion Zen 5 Architecture & Technical discussion

DisEnchantment

Golden Member
Now that the official unveil is done, this is a good time to discuss the architecture and technical specifications of the Zen 5 core.

1717398573815.png


1717398605447.png

1717398629559.png

1717398664624.png

More details will be added once they are public like die size, architecture details etc.

---------------------------------------------------------------------------------------
  • No Market share discussion
  • No Leaks
  • No speculation unrelated to publicly released technical materials/patches/manuals etc.
---------------------------------------------------------------------------------------
 
Zen 5 was supposed to focus on efficiency as per FAD2022. Quite odd there was no mention of it.

Dual decode seems interesting, so Z5 would now be 8 wide (2x 4 wide), Tremont style.
They did not increase the BW to the integer units. I wonder what the reason behind that is; is the int side not BW constrained at all?
 
Zen 5 was supposed to focus on efficiency as per FAD2022. Quite odd there was no mention of it.
Yes, I find it a bit concerning.

Dual decode seems interesting, so Z5 would now be 8 wide (2x 4 wide), Tremont style.
They did not increase the BW to the integer units. I wonder what the reason behind that is; is the int side not BW constrained at all?
It makes sense to increase the number of integer units if you have enough improvements in branch prediction and data prefetch. If you don't, assuming the micro architecture design is already well balanced, you'll be limited by branch mispredictions and memory latency, so adding integer units would only increase power with little benefit.
 
Ahh well, at least we should have some calm before the next hype storm starts brewing. I also want to give a shout out to @Exist50 who seems to have called quite a lot of things with good accuracy on both sides and did so without being abrasive or seeking to demean other posters. Hopefully we can get back to that type of discussion which seemed to be what we (mostly) had for a while until fairly recently.
We can continue here. Indeed, we used to have sane, courteous discussions before, but I was warned by the mods when I lamented the deterioration of civility in the thread.

The post mortem will be interesting: is the frontend not as bigly as we thought? PRFi or ROB sizes smaller than expected? Not a large enough jump in prefetch / predict? If the ROB is >512 and decode is a complete rework, then I will say Zen 5 is Bulldozer-tier execution; they were just in a much better place beforehand and made far better architectural choices.
The front end should be bigly; the ROB we don't know yet.
But I am wondering why the L1 BW to the core is not increased for the integer RF.

It makes sense to increase the number of integer units if you have enough improvements in branch prediction and data prefetch. If you don't, assuming the micro architecture design is already well balanced, you'll be limited by branch mispredictions and memory latency, so adding integer units would only increase power with little benefit.
Indeed, but they did increase the decode width, while the BW between the int RF and L1 seems unchanged, which seems odd. This would be unlike Intel's GLC.

They also have a unified int scheduler, would expect some latency from there too.
 
Indeed, but they did increase the decode width, while the BW between the int RF and L1 seems unchanged, which seems odd. This would be unlike Intel's GLC.

They also have a unified int scheduler, would expect some latency from there too.
Hmm so if we assume the changes in the micro architecture are deeper, then I have 3 hypotheses:
  1. It's the first (public) iteration, so not everything is well balanced
  2. Increasing the number of ALUs would have increased power too much, so they wait for the next process
  3. AMD thought the improvements were enough for that generation and they keep some improvements for Zen6.
Pure speculation 🙂
 
I think efficiency has improved. They didn’t focus on Zen 5 technical details at all this time.

Hopefully, we get more details soon.
 
Well she did say incredibly energy efficient or something. But not specifics, so perhaps it's not exciting enough for numbers. I don't recall if they've ever really gone into specifics for the desktop launch, I'd expect that for mobile announcement / press releases
 
Well she did say incredibly energy efficient or something. But not specifics, so perhaps it's not exciting enough for numbers. I don't recall if they've ever really gone into specifics for the desktop launch, I'd expect that for mobile announcement / press releases
We can ignore what they say and look at what they do: they're dropping TDP for desktop SKUs except the 16-core. These SKUs must at least match the old gen counterparts in MT performance with a ~40W handicap.

If AMD is really honest about the TDP change, then it is the exact opposite of what happened with Zen 4, where the performance gain was partially fed with extra power headroom.
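To put a rough number on the "match MT performance with a ~40W handicap" point, here is a minimal sketch. The 105 W and 65 W figures are assumed example values chosen to match the ~40W gap mentioned above, and TDP is only a loose proxy for actual package power, so treat the result as a ballpark, not a measurement.

```python
def required_perf_per_watt_gain(old_tdp_w: float, new_tdp_w: float,
                                perf_ratio: float = 1.0) -> float:
    """Perf/W improvement needed for the new part to reach `perf_ratio`
    times the old part's performance at the lower power budget.

    Assumes power draw scales with the TDP label, which is a simplification.
    """
    if old_tdp_w <= 0 or new_tdp_w <= 0:
        raise ValueError("TDP values must be positive")
    return perf_ratio * old_tdp_w / new_tdp_w


# Assumed example: a 105 W part replaced by a 65 W part (~40 W lower),
# with the new part merely matching the old one in MT performance.
gain = required_perf_per_watt_gain(old_tdp_w=105.0, new_tdp_w=65.0)
print(f"required perf/W gain at equal performance: {gain:.2f}x")
```

Under those assumptions the new SKU needs a bit over 1.6x the performance per watt just to break even in MT, which is why the TDP drop says more about efficiency than the keynote wording did.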
 
I tried a quick Paint eyeball based on the package render AMD published (IOD size fits what is known from Zen 4 -> 122.2mm², so it can't be totally off), and the CCD size was quite close to Zen 4. The increase is in the low single-digit mm² range, certainly less than 5mm² added (I measured 68.4mm²). Maybe that's the consolation prize: whatever they did with the microarchitecture & implementation at least didn't blow out the area...
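For anyone who wants to repeat this kind of eyeball estimate, a minimal sketch of the scaling: measure a die of known area (here the IOD at 122.2mm²) in pixels, then scale the pixel area of the unknown die by the same factor. The pixel dimensions below are made-up placeholders, not measurements taken from AMD's render, and the method assumes the image has no perspective distortion.

```python
def estimate_area_mm2(ref_area_mm2: float,
                      ref_px: tuple[float, float],
                      target_px: tuple[float, float]) -> float:
    """Estimate a die's area from an image, using a reference die of known area.

    ref_px and target_px are (width, height) in pixels, measured in the same image.
    """
    ref_px_area = ref_px[0] * ref_px[1]
    target_px_area = target_px[0] * target_px[1]
    if ref_px_area <= 0 or target_px_area <= 0:
        raise ValueError("pixel dimensions must be positive")
    mm2_per_px2 = ref_area_mm2 / ref_px_area  # image scale, in mm² per pixel²
    return target_px_area * mm2_per_px2


# Hypothetical pixel measurements, for illustration only.
iod_px = (260, 188)   # reference die: IOD, known to be 122.2 mm²
ccd_px = (120, 228)   # die whose area we want to estimate

ccd_area = estimate_area_mm2(122.2, iod_px, ccd_px)
print(f"estimated CCD area: {ccd_area:.1f} mm²")
```

A one-pixel error on each edge already moves the result by around 1mm² at this image scale, so differences in the low single-digit mm² range are at the edge of what the method can resolve.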
 
I tried a quick Paint eyeball based on the package render AMD published (IOD size fits what is known from Zen 4 -> 122.2mm², so it can't be totally off), and the CCD size was quite close to Zen 4. The increase is in the low single-digit mm² range, certainly less than 5mm² added (I measured 68.4mm²). Maybe that's the consolation prize: whatever they did with the microarchitecture & implementation at least didn't blow out the area...
There is this one, which is an actual picture, not a render.

1717424346439.png
 
I tried a quick Paint eyeball based on the package render AMD published (IOD size fits what is known from Zen 4 -> 122.2mm², so it can't be totally off), and the CCD size was quite close to Zen 4. The increase is in the low single-digit mm² range, certainly less than 5mm² added (I measured 68.4mm²). Maybe that's the consolation prize: whatever they did with the microarchitecture & implementation at least didn't blow out the area...

And the density reduction isn't much between N5 and N4P. You do get solid power savings though.
 
Some, rough at this point, die info:


Core size went up quite a bit, even though overall die size didn't.

According to Fritz's estimate, the CCDs for Zen 4 and Zen 5 are basically the same size (which fits my rendered package shot eyeball). Looks like improved L3 layout density & topology (Ladder Cache for the win?) is doing the heavy lifting to compensate for the core area increase at the die level. If we take the core area increase at face value (big IF!), then the IPC increase (16%) fits okay-ish with Pollack's Rule, i.e. the square root of the area increase (√(1.26) -> 1.122 "predicted" IPC increase). So I wouldn't call the arch a disaster, more like an AMD interpretation of your average "Cove" iteration, but it's certainly no Zen 3 in terms of bang per gate. In retrospect, Zen 3 appears to have been a lightning-in-a-bottle moment unlikely to be repeated any time soon by AMD...
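The square-root arithmetic above is easy to check. A minimal sketch, taking the 1.26x core area ratio and the 16% IPC figure from this post at face value (both are estimates/claims, not measurements):

```python
import math


def pollack_predicted_gain(area_ratio: float) -> float:
    """Pollack's Rule of thumb: performance scales roughly with sqrt(core area)."""
    if area_ratio <= 0:
        raise ValueError("area ratio must be positive")
    return math.sqrt(area_ratio)


core_area_ratio = 1.26   # estimated Zen 5 / Zen 4 core area, from the post
claimed_ipc_gain = 1.16  # IPC uplift quoted in the post

predicted = pollack_predicted_gain(core_area_ratio)
print(f"Pollack-predicted gain: {predicted:.3f}x")
print(f"claimed vs predicted:   {claimed_ipc_gain / predicted:.3f}x")
```

So the claimed uplift sits a few percent above what the rule of thumb predicts for that area increase, which is the "okay-ish" fit described above.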
 
In retrospect, Zen3 appears to have been a lightning in a bottle moment unlikely to be repeated any time soon by AMD...
Possibly. I think the CPU team still had some so-called 'low-hanging fruit' in Zen 3 that helped achieve that IPC jump on a mature N7 process. That is probably over; everything else will be harder. Process node shrinks aren't going to be much help on the xtor/mm² front either. Going forward, node changes will be more about power and performance improvements, IMHO.
 
According to Fritz's estimate, the CCDs for Zen 4 and Zen 5 are basically the same size (which fits my rendered package shot eyeball). Looks like improved L3 layout density & topology (Ladder Cache for the win?) is doing the heavy lifting to compensate for the core area increase at the die level. If we take the core area increase at face value (big IF!), then the IPC increase (16%) fits okay-ish with Pollack's Rule, i.e. the square root of the area increase (√(1.26) -> 1.122 "predicted" IPC increase). So I wouldn't call the arch a disaster, more like an AMD interpretation of your average "Cove" iteration, but it's certainly no Zen 3 in terms of bang per gate. In retrospect, Zen 3 appears to have been a lightning-in-a-bottle moment unlikely to be repeated any time soon by AMD...
To use an analogy, I think a lot of us were expecting Zen 5 to be like ARM's X-core moment, where ST performance was prioritized above all else, but what it ended up being is more like ARM's middle cores, which provide a balance of performance, area, and power.
 
With a cache shrink is it possible there is a 96MB vcache chip? If the L3 is smaller, did the TSV pattern change?
 