Discussion Zen 4 Core Specifications Discussion


DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
[Attached images: 1661935926947.png, 1661935956023.png, 1664491299461.png]

Some tidbits
  • A 15 layer Telescoping Metal stack has been co-optimized to deliver both high frequency and high density routing capability
This bodes well for density going forward, since they managed to increase frequency greatly without adding additional metal layers. Probably RDNA3 will hit in the same range for density ~90MTr/mm2 and probably blazing frequency if thermal hotspots can be taken care of.
They did add a lot more transistors to support AVX512 and the larger ROB/L2/uop cache/BTB.

I bet the second GMI burnt a lot of space, albeit it is probably a necessary forward-looking block.

Zen 5 will be a reset and will optimize the core again, à la Zen 3.

Will update if more specs show up. This time I doubt AMD will be more open.
 

yuri69

Senior member
Jul 16, 2013
373
573
136
Zen 4 pretty much supports the thesis that ballooning the L2 is wasteful: twice the size for ~1% perf gain. Intel recently went from 0.5 MB all the way to 1.25 MB with Willow Cove, with so-so results.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
One of the things that strikes me about modern chip design is how the increase in available computing power each generation influences our ability to improve the next. I remember my time in college working with Verilog on my first functional circuit designs, and now I imagine how fast modern automated design software and the equipment it runs on must be, making what I learned seem trivial. They can probably run nearly limitless design iterations to generate better designs.
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
[Tweet] This thing suggests to me that they are using a lot of automatic Place and Route, in contrast to Intel, which has had a fairly stable floor plan for quite a while. It seems like Mike Clark's subtle comment on AMD using automation and AI shows its telltale signs here. No way you can change so much of the floor plan every generation if done by hand without spending a ton of time and man-hours.

Interesting. Throughout the years of following x86 CPU architecture, I've read that Intel relies much more on manual layout than AMD. If I remember correctly, AMD's transition to automated tools started with ex-CEO Dirk Meyer pushing the design methodology transition after the acquisition of ATI. I seem to recall there was talk about some pushback from the senior designers. Many have trouble adapting to new ways of working, I suppose, and there were downsides as well. However, the transition looks to have paid off for AMD in a big way. Jim Keller has stated in his interviews that much of his work at Intel revolved around methodology. Intel CEO Pat Gelsinger has brought back many of the old guard though, so it will be interesting to see how that will affect how Intel does things going forward.

PS. I'm reposting this image from the other thread, for those, like me, that do not have a Twitter account. What does "CPL" mean?

fbsb9oxwiaehfjn-jpeg.67062
 

arcsign

Junior Member
Jul 26, 2009
8
26
91
Sorry, I wasn't telling you to wait for no reason. I know already where some of the die area went because of the testing done by a developer of a certain application, and I'm saying you'll hear about it soon. Probably. Depends on when AMD let them release their findings.

Ah, interesting! I will just have to look forward to finding out more later then.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
shouldn't be a surprise given AMD took the exact same approach with Zen 2 and 3.
That's me. To me that approach makes perfect sense, profit from the power efficiency (much more important for mobile/server/MT) of mobile libraries from the start and try to reach high frequencies (important for mindshare and bleeding edge competitiveness) from there. Should be much easier to execute and achieve both that way than the other way around really.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Since the speculation thread is essentially turning into a review thread I wanted to salvage this deep dive into Zen 4's AVX512. I'm sure we can get some mileage out of it for this thread.


Edit: Credit where credit's due, thanks to @uzzi38 for originally sharing that link (took me some time to find it again).
 

Abwx

Lifer
Apr 2, 2011
10,847
3,297
136
According to ComputerBase, in MT Zen 4 has 12% better IPC than Zen 3 and 2% better than ADL.

The improvement vs Zen 3 is quite homogeneous, with 9% in DigiCortex, 12% in 7-Zip, 13% in CB R20 and 15% in Handbrake FI. That is when comparing the 5800X/7700X, since the 7950X benefits from the RAM bandwidth advantage vs the 5950X.
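As a quick sanity check, the four per-application gains listed above can be averaged to see whether they line up with the quoted ~12% figure. This is only a sketch: it assumes the percentages can be treated as plain speedup ratios, and the dictionary keys are just labels taken from the post.

```python
import math

# Per-application gains (percent) as quoted in the post above.
gains_pct = {
    "DigiCortex": 9,
    "7-Zip": 12,
    "CB R20": 13,
    "Handbrake": 15,
}

# Arithmetic mean of the percentages.
arithmetic_mean = sum(gains_pct.values()) / len(gains_pct)

# Geometric mean of the speedup ratios, the usual way to average ratios.
ratios = [1 + g / 100 for g in gains_pct.values()]
geometric_mean = (math.prod(ratios) ** (1 / len(ratios)) - 1) * 100

print(f"arithmetic mean: {arithmetic_mean:.2f}%")
print(f"geometric mean:  {geometric_mean:.2f}%")
```

Both averages land a little above 12%, so the four numbers are consistent with the overall figure.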

 

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,072
136
Since the speculation thread is essentially turning into a review thread I wanted to salvage this deep dive into Zen 4's AVX512. I'm sure we can get some mileage out of it for this thread.


Edit: Credit where credit's due, thanks to @uzzi38 for originally sharing that link (took me some time to find it again).
So this is awesome, and we have our answer about 256-bit boundary crossing: AMD made a 512-bit-wide shuffle unit. Zen 4's AVX-512 implementation looks proper good, not just a tick box or an afterthought!
 

thigobr

Senior member
Sep 4, 2016
231
165
116
The SoC topology picture in #35 still lists 32B/cycle read and 16B/cycle write between CCD and Infinity Fabric... I am not sure how single-CCD models can use the full DDR5 bandwidth!

32B/cycle at 2GHz IF is just 64GB/s, and 16B/cycle for writes means just 32GB/s... while DDR5-5200 (5200 MT/s) on a 128-bit bus is supposed to offer 83GB/s!
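The arithmetic behind those three figures can be written out as below. This is a rough sketch of theoretical peaks only; the function names are made up for illustration, and the 2 GHz fabric clock and 128-bit bus are the values quoted in the post, not measured ones.

```python
def link_bandwidth_gbs(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Peak link bandwidth in GB/s: bytes per cycle times cycles per nanosecond."""
    return bytes_per_cycle * clock_ghz


def dram_bandwidth_gbs(transfers_mts: int, bus_width_bits: int) -> float:
    """Peak DRAM bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mts * 1e6 * bytes_per_transfer / 1e9


FCLK_GHZ = 2.0  # fabric clock as assumed in the post

read_limit = link_bandwidth_gbs(32, FCLK_GHZ)   # CCD read path, 32 B/cycle
write_limit = link_bandwidth_gbs(16, FCLK_GHZ)  # CCD write path, 16 B/cycle
ddr5_5200 = dram_bandwidth_gbs(5200, 128)       # DDR5-5200, 128-bit bus

print(f"CCD read ceiling:  {read_limit:.1f} GB/s")
print(f"CCD write ceiling: {write_limit:.1f} GB/s")
print(f"DDR5-5200 peak:    {ddr5_5200:.1f} GB/s")
```

Under these assumptions both link ceilings sit below the DRAM peak, which is the mismatch being pointed out.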

On that note: the AIDA64 memory benchmark numbers are probably way off, or it is not testing what it is supposed to test... How come the 7700X write score hits 80GB/s?!
Guru3D 7700X memory scaling article
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
With the launch AMD shared some slides with some numbers that aren't in the OP yet (like L2 BTB being increased slightly to 7k):
Thanks mate, I have updated the original post.
All in all, Zen 4 seems like a minor bump in most resources. It is obvious that this is a conscious design choice, since they knew they could extract the perf from frequency.

Turn down the top frequency of the 7950X by ~300 MHz and you get a ~5.4 GHz chip that can sustain that frequency indefinitely with regular air coolers, with a base frequency about 1 GHz higher than its predecessor, and that with AVX-512 too. The interesting thing is that AMD managed to use the same Zen 3 core to stay competitive with current competitor offerings while having fewer resources almost everywhere: PRF, ROB, FPU pipes, etc.

Phoenix on N4 should be more interesting than desktop Zen 4
 

eek2121

Platinum Member
Aug 2, 2005
2,904
3,906
136
The SOC topology picture on #35 still lists 32B/cycle

You are taking that out of context.

  • First, look again.
  • Second, understand the role of CPU cache. Although your original comment is wrong, I'll counter it with a question: what is the role of the L1/L2/L3 caches in a CPU (or practically any compute die, for that matter)?
 

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
@eek2121
I am having a hard time understanding your hints, so could you please elaborate?
Let's concentrate on the write speed: if the IFoP has 16 byte/cycle, then that gives 16 byte/cycle * 2 GHz = 32 Gbyte/s. Even if I assume their topology picture to be wrong, as there seem to be two ports per link starting with Zen 4, that would only give 64 Gbyte/s.

So I can only interpret your comment to mean that AIDA does not actually measure RAM bandwidth but rather a mix of RAM and cache? But then how come the numbers align pretty well with the RAM used?
 

thigobr

Senior member
Sep 4, 2016
231
165
116
Same here... I understand how caches work and how they reduce the need for memory bandwidth, on top of greatly reducing the perceived latency. And I am not saying that the interface is hurting core performance, because the performance is clearly there!

I am talking specifically about the interconnect. Maybe there are multiple ports per CCD? Or some kind of data compression? I am just trying to understand whether there's any point in chasing higher memory frequencies for this generation unless that also brings latency down, because the bandwidth alone seems to be more than enough.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136

BorisTheBlade82

Senior member
May 1, 2020
660
1,003
106
So the second part of the CnC deep dive is now online:

They take a look at the question we already discussed about the IFoP bandwidth (or the lack of it) constraining DRAM bandwidth:

Firstly, the memory controller does not seem to be able to take full advantage of 96 Gbyte/s DDR5-6000. Even with a 2 CCD SKU they only get around 73 Gbyte/s read speed.
Secondly, with write they run into the IFoP limit of 2x32Gbyte/s.
So a single CCD SKU should effectively be limited to 64Gbyte/s read and 32Gbyte/s write and AIDA is basically rubbish :oops:
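Put as a simple model, the usable ceiling is the smaller of the DRAM peak and the summed IFoP link bandwidth. The sketch below is only that model, not anything from the article itself: the 2 GHz fabric clock and the 32/16 byte-per-cycle widths are the figures assumed earlier in the thread, and `ceiling_gbs` is a made-up helper name.

```python
FCLK_GHZ = 2.0              # fabric clock assumed earlier in the thread
READ_BYTES_PER_CYCLE = 32   # per-CCD IFoP read width from the topology slide
WRITE_BYTES_PER_CYCLE = 16  # per-CCD IFoP write width from the topology slide

# DDR5-6000 on a 128-bit bus: transfers per second times 16 bytes per transfer.
DDR5_6000_GBS = 6000 * 1e6 * (128 / 8) / 1e9


def ceiling_gbs(ccds: int, bytes_per_cycle: int, dram_gbs: float) -> float:
    """Theoretical ceiling: the smaller of the fabric limit and the DRAM peak."""
    fabric_gbs = ccds * bytes_per_cycle * FCLK_GHZ
    return min(fabric_gbs, dram_gbs)


single_read = ceiling_gbs(1, READ_BYTES_PER_CYCLE, DDR5_6000_GBS)
single_write = ceiling_gbs(1, WRITE_BYTES_PER_CYCLE, DDR5_6000_GBS)
dual_read = ceiling_gbs(2, READ_BYTES_PER_CYCLE, DDR5_6000_GBS)
dual_write = ceiling_gbs(2, WRITE_BYTES_PER_CYCLE, DDR5_6000_GBS)

for name, value in [
    ("1 CCD read", single_read),
    ("1 CCD write", single_write),
    ("2 CCD read", dual_read),
    ("2 CCD write", dual_write),
]:
    print(f"{name}: {value:.0f} Gbyte/s")
```

In this model the 2 CCD read case is capped by DRAM rather than the fabric, so the ~73 Gbyte/s measured there would come from the memory controller, as the first point says.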