Discussion Zen 4 Core Specifications Discussion

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
1661935926947.png
1661935956023.png



1664491299461.png

Some tidbits
  • A 15 layer Telescoping Metal stack has been co-optimized to deliver both high frequency and high density routing capability
This bodes well for density going forward, since they managed to increase frequency greatly without adding additional metal layers. Probably RDNA3 will hit in the same range for density ~90MTr/mm2 and probably blazing frequency if thermal hotspots can be taken care of.
They did add a lot more transistors to support AVX-512 and the larger ROB, L2, uop cache, and BTB.

I bet the second GMI burnt a lot of area, albeit probably as a necessary forward-looking block.

Zen 5 will be a reset and will optimize the core again, à la Zen 3.

Will update if more specs show up. I doubt AMD will be more open this time.
 

naad

Member
May 31, 2022
63
176
66
So the second part of the CnC deep dive is now online:

They take a look at the question we already discussed about the IFoP bandwidth (or the lack of it) constraining DRAM bandwidth:

Firstly, the memory controller does not seem to be able to take full advantage of the 96 Gbyte/s of DDR5-6000. Even with a 2-CCD SKU they only get around 73 Gbyte/s read speed.
Secondly, with writes they run into the IFoP limit of 2x32 Gbyte/s.
So a single-CCD SKU should effectively be limited to 64 Gbyte/s read and 32 Gbyte/s write, and AIDA is basically rubbish :oops:
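
To make the arithmetic explicit, here is a minimal Python sketch of the link-side ceiling. It uses only the figures quoted above (96 Gbyte/s DDR5-6000 peak, 64 Gbyte/s read and 32 Gbyte/s write per CCD over IFoP). The function and constant names are made up for illustration, and the measured ~73 Gbyte/s read on a 2-CCD part is a separate memory-controller limit that this sketch does not model.

```python
# Figures taken from the post above; names are illustrative only.
DDR5_6000_PEAK = 96.0       # Gbyte/s, theoretical DRAM peak quoted in the post
IFOP_READ_PER_CCD = 64.0    # Gbyte/s, read limit of one CCD's IFoP link
IFOP_WRITE_PER_CCD = 32.0   # Gbyte/s, write limit of one CCD's IFoP link


def link_limited_bandwidth(ccds: int) -> tuple[float, float]:
    """Return the (read, write) ceiling in Gbyte/s imposed by DRAM peak and IFoP links."""
    read = min(DDR5_6000_PEAK, ccds * IFOP_READ_PER_CCD)
    write = min(DDR5_6000_PEAK, ccds * IFOP_WRITE_PER_CCD)
    return read, write


for ccds in (1, 2):
    read, write = link_limited_bandwidth(ccds)
    print(f"{ccds} CCD: read <= {read:.0f} Gbyte/s, write <= {write:.0f} Gbyte/s")
```

With one CCD both directions are capped by the link, well below the DRAM peak; with two CCDs only the write side is still link-limited.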

EFB on everything seemingly soon I guess, SERDES over organic interposer can only go so far
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,619
136
I guess the biggest news here is the numbers by AIDA being rubbish to such a high degree. :p

What's interesting is how differently AMD and Apple approach this whole issue: Apple essentially allows a single core to saturate all available bandwidth, whereas AMD essentially segments BW in all possible directions (half-rate write BW, a single CCD's all-core load can't saturate the memory, etc.), where one would expect serious performance bottlenecks.

Another way to look at this: while additional L3$ won't increase BW to and from the CCD, it should be able to significantly reduce latency in short, BW-starved corner cases. So 7000X3D should be able to add significant performance on top, until Zen 5 likely restructures all the links.
 

thigobr

Senior member
Sep 4, 2016
231
165
116
I was anxiously waiting for the second part because of this! I am skipping AM5 and Zen 4, so I don't have the CPU to play with and explore, but I suspected this all along...
It certainly limits things for some workloads, but that's where 3D cache will shine.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
I am not so sure about your Zen4c remark. From the slides it looks as if the IOD has only 12 links and not the needed 16 for 8 CCDs.
After a few beers my creative mind gets active and starts connecting dots where I wouldn't normally see them.
But I believe the dual GMI exists for dual-CCX purposes as well (and not only for wide mode), in true resourceful AMD fashion.
Need die shots of the sIOD.
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
I am fully aware of what you mean 😉
And although I (sadly) haven't had a beer yet today, I am in the mood to connect some dots of my own that might or might not be there:
  • It looks like there are not enough links on the sIOD to connect each Bergamo CCX with its own link.
  • This leads to heavy bandwidth starvation, as each core would only get 4/2 GByte/s read/write as opposed to at least 8/4 GByte/s for Genoa (low-core-count SKUs even get 16/8).
  • That gets even worse as Bergamo has only half the L3 to hide the lack of RAM bandwidth.
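
The per-core figures in the list above follow from dividing one link's bandwidth among the cores behind it. A rough Python sketch, assuming the 64/32 GByte/s read/write per-link figures mentioned earlier in the thread and that two 8-core Bergamo CCXs share a single link (the names are illustrative, not from any AMD document):

```python
# Per-link IFoP figures as discussed earlier in the thread (assumed here).
LINK_READ = 64.0    # GByte/s
LINK_WRITE = 32.0   # GByte/s


def per_core_bandwidth(cores: int, links: int = 1) -> tuple[float, float]:
    """Return (read, write) GByte/s per core when `cores` share `links` links."""
    return links * LINK_READ / cores, links * LINK_WRITE / cores


# Bergamo guess: 16 cores (two 8-core CCXs) behind one link.
print("Bergamo:", per_core_bandwidth(16))
# Genoa: 8 cores per CCD behind one link.
print("Genoa:", per_core_bandwidth(8))
# Low-core-count Genoa: 8 cores per CCD with two links (wide mode).
print("Genoa, wide mode:", per_core_bandwidth(8, links=2))
```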
Now to the wild guess:
We had the speculation that Zen 5 might introduce big.LITTLE by using Zen 4c. What if the latter does not use the same sIOD but rather advanced packaging with some other IOD? Maybe the IO demands of the target market are different from what Genoa and Siena provide as well?
So they could introduce a new interconnect with Bergamo which is ready to use as the small core CCD when Zen5 hits the road.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Found this in the manual (PPR Vol 3) for AMD Family 19h Model 11h B1:

There are 8 CCM links on the IOD, and similarly each CCD has 1 CCM link. Same as Milan.
However, since each CCM now has 2x GMI links, they were able to connect more than 8 CCDs in narrow mode on Genoa, I would guess using only 6 CCMs on the IOD.
So it seems natural that for 16 CCXs they just need to populate all the CCM links on the IOD, using narrow mode on the CCD side.

There are up to 16 counters available that can be programmed to count events concurrently.
Each CCM can interface to either one or two Core/Cache Complex Dies (CCDs).
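
As a quick sanity check of the link counting above, here is a small Python sketch. It assumes narrow mode only (one GMI port per die or CCX), 8 CCMs on the IOD, and 2 GMI ports per CCM, as described in the post; the 12-CCD Genoa configuration is the full-size part, and the function name is hypothetical.

```python
import math

CCMS_ON_IOD = 8          # per the PPR excerpt quoted above
GMI_PORTS_PER_CCM = 2    # each CCM can interface to one or two CCDs


def ccms_needed(endpoints: int) -> int:
    """CCMs required to attach `endpoints` dies/CCXs in narrow mode (one port each)."""
    needed = math.ceil(endpoints / GMI_PORTS_PER_CCM)
    if needed > CCMS_ON_IOD:
        raise ValueError(f"{endpoints} endpoints need {needed} CCMs, only {CCMS_ON_IOD} exist")
    return needed


print("Genoa, 12 CCDs:", ccms_needed(12), "CCMs")
print("Bergamo, 16 CCXs:", ccms_needed(16), "CCMs")
```

Under these assumptions Genoa leaves two CCMs unused, while 16 CCXs populate every CCM on the IOD.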

1668164722641.png

1668165884277.png
 

thigobr

Senior member
Sep 4, 2016
231
165
116
In theory they could create single CCD desktop chips with two GMI connections then. That would remove the write bottleneck from 7600/7700 but probably wouldn't change performance much for most workloads.
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
We don't even know if Zen5c will keep the same structure: AMD could go for some more advanced 2.5D-3D die stacking there.
Of course. I was just trying to make sense of the rumour with Zen5+Zen4c. Zen5c will surely hit the market later than Zen5. So the combination would make sense if they introduced a new interconnect with Zen4c and used it for Zen5 as well.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
@DisEnchantment
I see. So according to Occam's razor the sIOD simply has 16 links which will be used for Bergamo. No fancy new advanced packaging.
I thought we would see some stacking with Bergamo, but it seems likely that it is just the same IO die as Genoa. They would only have 1 link per chiplet in that case, but it is kind of an in-between product; Zen 5 will almost certainly use stacking. Things get hard to predict with stacking in use. I have seen some rumors of infinity cache chips with only 16 MB of cache, which is strangely small. Those seemed to stack under a base die, so they likely would use EFB with micro-solder balls, like HBM. I don’t know if that has been updated. It might make sense that all GPUs will have an HBM-like interface for direct connection to HBM or connection to an infinity cache chip which has cache and a GDDR memory controller. It is unclear how they would use that in other products.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Of course. I was just trying to make sense of the rumour with Zen5+Zen4c. Zen5c will surely hit the market later than Zen5. So the combination would make sense if they introduced a new interconnect with Zen4c and used it for Zen5 as well.
It seems like the most likely route is to either integrate low-power Zen 4c cores into the IO die or stack them with the IO die. Then they could connect some Zen 5 CPU chiplets, so kind of like an APU except with GMI links or stacked bridges. That would allow them to use low-power, on-die Zen 4/Zen 4c derivatives and only clock up the Zen 5 die(s) when needed. It would be interesting (but very odd) if a low-power Zen 4c die could act as a bridge chip between the IO die and the Zen 5 chiplet. That would allow easy, transparent switching between them. Some AMD patents seemed to indicate that low-power cores would not be visible to the OS.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
So thanks to Yuuki_AnS we have this information on Bergamo, so we can compare it with Genoa. He is using Windows Server, unlike most others, who are using Linux, so you will not find these screenshots on Phoronix or Serve The Home.

Genoa compared with Bergamo
GenoavsBergamo1.jpg

GenoavsBergamo2.jpg

GenoavsBergamo3.jpg


Genoa: Zen4, AVX-512, 32MiB per CCD, 12CCDs, 360W TDP, Total Cores: 96C/192T

Bergamo: Zen4c, AVX-512, 16MiB per CCD, 16CCD(or 8CCD with 2 8Core CCX?), 360W TDP, Total Cores: 128C/256T
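
The totals in the two summary lines can be cross-checked with a few lines of Python. This is only a sketch: it assumes 8 cores per L3 slice (a CCD on Genoa, a CCX on Bergamo) and 2 threads per core, and the `Sku` class is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sku:
    name: str
    l3_slices: int          # CCDs on Genoa, CCXs on Bergamo
    l3_mib_per_slice: int   # MiB of L3 per slice, from the summary lines
    cores_per_slice: int = 8   # assumption: 8 cores share each L3 slice
    threads_per_core: int = 2  # assumption: SMT2

    @property
    def cores(self) -> int:
        return self.l3_slices * self.cores_per_slice

    @property
    def threads(self) -> int:
        return self.cores * self.threads_per_core

    @property
    def total_l3_mib(self) -> int:
        return self.l3_slices * self.l3_mib_per_slice


genoa = Sku("Genoa", l3_slices=12, l3_mib_per_slice=32)
bergamo = Sku("Bergamo", l3_slices=16, l3_mib_per_slice=16)

for sku in (genoa, bergamo):
    print(f"{sku.name}: {sku.cores}C/{sku.threads}T, "
          f"{sku.total_l3_mib} MiB L3, {sku.total_l3_mib / sku.cores:.0f} MiB per core")
```

Whether Bergamo is read as 16 single-CCX dies or 8 dual-CCX dies, the totals come out the same under these assumptions; only the per-die L3 figure changes.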
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,483
14,434
136
It's DOA. Tomorrow you will find its obituary/review on the Serve The Home and Phoronix websites. It will be one of the saddest days in Intel's history: its highest profit maker, the Xeon brand, outright emasculated.
And for those that say "well use 2 <can't remember the number> Xeon Platinum, they are close", remind them that's 700 watts minimum vs 360. In the server world that should be "game over" by itself.
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
Bergamo: Zen4c, AVX-512, 16MiB per CCD, 16CCD(or 8CCD with 2 8Core CCX?), 360W TDP, Total Cores: 128C/256T
8 CCD with 2x 8c CCX is almost certain.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Either way, game over for Intel.....
It's DOA. Tomorrow you will find its obituary/review on the Serve The Home and Phoronix websites. It will be one of the saddest days in Intel's history: its highest profit maker, the Xeon brand, outright emasculated.
And for those that say "well use 2 <can't remember the number> Xeon Platinum, they are close", remind them that's 700 watts minimum vs 360. In the server world that should be "game over" by itself.
It would be much appreciated if you could refrain from such posts in the Zen 4 Core Specifications Discussion thread.
There are lots of other threads for those, and keeping them there makes it easier for people interested only in the technicalities and architecture to avoid such debates.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Most of it is known, but nice to have some official slides

Here is the answer to the significant frequency gains achieved: they are using HPC HD cells (2x2, 6T) with eLVT. That is also the reason for the significant loss of density. The good news: if they are a bit more conservative with clock targets on Zen 5, they can claw back a lot of density.

1677968154871.png
1677968266193.png
1677968658442.png


GMI3 IFOP data with lane count and official logic area without the scribe lines and filler.
1677968437358.png

Pretty much confirms what was known before, that the CCM has 2x GMI lanes.
1677968517708.png

Not much else that is not already known.

Such a breath of fresh air not having to digest "leaked information" for once.
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,619
136
Here is the answer to the significant frequency gains achieved, they are using HPC cells with eLVT. Also the reason for the significant loss of density.
This will likely also be a big part of the density gained in the Zen 4c cores. Going by these graphs, those cloud cores may still be capable of frequencies 3% above Zen 3, but ~11% below the high-frequency-optimized Zen 4 cores.
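
Taken at face value, those two percentages also imply a Zen 4 versus Zen 3 frequency ratio. A tiny Python sketch of that back-of-the-envelope step, assuming both figures are simple ratios read off the same graph (the variable names are illustrative):

```python
# Ratios read from the estimate above; both are approximate.
ZEN4C_VS_ZEN3 = 1.03   # Zen 4c at roughly 3% above Zen 3
ZEN4C_VS_ZEN4 = 0.89   # Zen 4c at roughly 11% below Zen 4


def implied_zen4_vs_zen3(c_vs_zen3: float, c_vs_zen4: float) -> float:
    """Zen 4 frequency relative to Zen 3, derived from the two Zen 4c ratios."""
    return c_vs_zen3 / c_vs_zen4


ratio = implied_zen4_vs_zen3(ZEN4C_VS_ZEN3, ZEN4C_VS_ZEN4)
print(f"Implied Zen 4 vs Zen 3 frequency: {ratio:.3f}x")
```

That works out to roughly 1.16x, which is only as accurate as the two rounded inputs.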

Pretty much confirms what was known before, that the CCM has 2x GMI lanes.
Which furthermore adds the advantage of more redundancy for better yield.
 

Geddagod

Golden Member
Dec 28, 2021
1,147
1,003
106
Zen 4 mostly using HD rather than HP cells was quite a surprise to me, tbh.
How much density does eLVT cost? I couldn't find a figure. Besides, doesn't the graph say that 70% of the core is still the standard 5nm HD cells, with the specialized cells probably just being along the critical paths?
So I don't know if this means Zen 5 could claw back much density. Even if they go for no frequency gains or even a slight frequency drop, aren't wider architectures, which Zen 5 will be, harder to clock higher as well?