Question: Zen4c vs E-core Die Area

nicalandia

Diamond Member
Jan 10, 2019
Zen4c is an impressive feat of engineering. Four of these small full P-cores take up about as much area as a cluster of four E-cores (a Raptormont cluster).

[Attached image]


OG Gracemont cluster below; the Raptormont cluster should be just a hair under 10 mm^2.

[Attached image]



Zen4c has higher IPC and SMT, so Zen4c is the new performance/area king.
 

Exist50

Platinum Member
Aug 18, 2016
nicalandia said:
Zen4c has higher IPC and SMT, so Zen4c is the new performance/area king.
Given there's a full node difference between the two, you'd kind of expect an advantage. Process-normalized is another matter, especially if you have per-thread QoS requirements. Many Bergamo deployments will probably have SMT disabled for that reason.
 

Saylick

Diamond Member
Sep 10, 2012
While Zen 4c should have higher IPC/mm^2 than Gracemont, Zen 4c doesn't clock as high. Overall performance per core might be closer than you think if you factor in clock differences. We'll have a better idea when the nodes are closer together, too, via Meteor Lake.
 

nicalandia

Diamond Member
Jan 10, 2019
Saylick said:
While Zen 4c should have higher IPC/mm^2 than Gracemont, Zen 4c doesn't clock as high. Overall performance per core might be closer than you think if you factor in clock differences. We'll have a better idea when the nodes are closer together, too, via Meteor Lake.
I am still getting confirmation, but a 4MiB Crestmont cluster should be 7.4 mm^2, and the IPC improvement over Gracemont should be in the single digits. Taking this into account, Zen4c should still be ahead in performance/area, as a 4C/8T Zen4c will be faster than a 4C/4T Crestmont.
 

Exist50

Platinum Member
Aug 18, 2016
nicalandia said:
the IPC improvement over Gracemont should be in the single digits
I don't believe we've had leaks about Crestmont IPC. And as Saylick mentioned, you need to factor in clock speeds (iso-process, at that), not just IPC. That's a weakness for Atom vs Core and full-sized Zen, but that's also Zen 4c's major penalty vs Zen 4. We need more data there, but hopefully with Meteor Lake and Bergamo out soon-ish, we'll have something to work with. Though N4P is still the better node.
 

BorisTheBlade82

Senior member
May 1, 2020
The -mont cores won't clock as high in Sierra Forest as they do in ADL/RPL. The same could be true the other way around for Zen Nc cores in future client products.
 

Geddagod

Golden Member
Dec 28, 2021
nicalandia said:
I am still getting confirmation, but a 4MiB Crestmont cluster should be 7.4 mm^2, and the IPC improvement over Gracemont should be in the single digits. Taking this into account, Zen4c should still be ahead in performance/area, as a 4C/8T Zen4c will be faster than a 4C/4T Crestmont.
A 4-core Crestmont cluster with 3MB of L2 is 5.91 mm^2. Generously scaling the L2 data array and L2 tags up to the 4MB you want would give you 6.2 mm^2.
This would give you ~1.6 mm^2 per core (this should be an overestimate, since the 4-core interconnect is included in the area calculation).
Area breakdown went something like this:
Core: 1.05 mm^2
Est. 4MB L2 data array (3MB L2 data array scaled 4:3): 1.15 mm^2
L2 CTL, CLK, interconnect: 0.5 mm^2
L2 tags (scaled 4:3): 0.26 mm^2
Assuming that Crestmont IPC is ~Skylake as a worst case (no real gain over Gracemont), and taking Skylake as a baseline of 1, we would get 1/1.6 = 0.625 IPC per mm^2.
*Note that the IPC per mm^2 is not actual IPC, just a relative scaling factor lol.
With Zen 4 at ~40% higher IPC than Skylake (Raichu's IPC testing: Zen 4 has similar IPC to GLC, with 50% weight each for SPEC FP and SPEC INT), Zen 4C would get 1.4/2.48 = 0.565 IPC per mm^2.
Someone should prob check my math, but I think I did it all correctly.
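A minimal Python sketch that re-runs the arithmetic above. It assumes (this reading is not stated explicitly in the post) that the 1.05 mm^2 "Core" figure is per core and counted four times, while the L2 and interconnect figures cover the whole 4-core cluster. The "IPC" inputs are the relative factors used above (Skylake = 1.0), not measured IPC.

```python
# Re-runs the back-of-envelope Crestmont vs Zen 4C "IPC per mm^2" estimate.
# Assumption: "Core" (1.05 mm^2) is per core; the L2/interconnect figures
# are for the whole 4-core cluster. IPC values are relative (Skylake = 1.0).

CORES_PER_CLUSTER = 4

crestmont_cluster_mm2 = {
    "cores": CORES_PER_CLUSTER * 1.05,
    "l2_data_4mb": 1.15,              # 3MB data array scaled 4:3
    "l2_ctl_clk_interconnect": 0.5,
    "l2_tags_4mb": 0.26,              # tags scaled 4:3
}


def ipc_per_mm2(relative_ipc: float, area_mm2: float) -> float:
    """Relative scaling factor only, not a real IPC figure."""
    return relative_ipc / area_mm2


cluster_total = sum(crestmont_cluster_mm2.values())
per_core_exact = cluster_total / CORES_PER_CLUSTER

print(f"Cluster total: {cluster_total:.2f} mm^2, per core: {per_core_exact:.2f} mm^2")
# The post rounds up to ~1.6 mm^2 per core as a deliberate overestimate.
print(f"Crestmont (IPC 1.0, 1.6 mm^2):  {ipc_per_mm2(1.0, 1.6):.3f}")
print(f"Zen 4C    (IPC 1.4, 2.48 mm^2): {ipc_per_mm2(1.4, 2.48):.3f}")
```

Under that assumption the parts sum to about 6.11 mm^2 (roughly 1.53 mm^2 per core), a little under the quoted 6.2 mm^2 and ~1.6 mm^2, which is consistent with the stated intent to overestimate. The two ratios come out at 0.625 and about 0.565.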
 

Geddagod

Golden Member
Dec 28, 2021
Clocks are anyone's guess, so I just did IPC, but even if we did know clocks I don't think that would tell the whole story, because we don't know performance per watt for either Crestmont or Zen 4c. The best comparison, IMO, would be to limit both cores to a given power and measure performance from there, but we don't have that info yet.
I still expect Zen 4c to have higher perf/mm^2, simply because of how sucky Intel has been with frequency at iso-power in their recent cores, but we will have to see ig.
 

nicalandia

Diamond Member
Jan 10, 2019
[Attached image]

Due to power constraints, those E-cores will not be as fast as the desktop parts; 3 GHz should be the max all-core sustained speed on the larger SKU. Bergamo was clocked at 2.8 GHz in a V-Ray all-core benchmark.
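To make the clock point concrete: to a first approximation, single-thread performance scales as IPC × clock, so matching Zen 4C per thread requires a clock ratio equal to the IPC ratio. The sketch below plugs in figures from this thread (Zen 4C at the 2.8 GHz V-Ray all-core clock, relative IPC 1.4 vs 1.0 for a no-uplift Crestmont, and the guessed ~3 GHz E-core ceiling). The IPC × clock model ignores memory stalls, SMT and boost behaviour, so treat it as a rough bound only.

```python
# First-order model: per-thread performance ~ IPC x clock.
# Inputs are the thread's estimates, not measurements.

def required_clock_ghz(target_ipc: float, target_clock_ghz: float, own_ipc: float) -> float:
    """Clock a core with `own_ipc` needs to match target_ipc * target_clock_ghz."""
    return target_ipc * target_clock_ghz / own_ipc


ZEN4C_IPC = 1.4          # relative to Skylake = 1.0
ZEN4C_CLOCK_GHZ = 2.8    # Bergamo all-core clock in V-Ray
CRESTMONT_IPC = 1.0      # worst case: no uplift over Gracemont
ECORE_CEILING_GHZ = 3.0  # guessed max sustained all-core clock

needed = required_clock_ghz(ZEN4C_IPC, ZEN4C_CLOCK_GHZ, CRESTMONT_IPC)
print(f"Crestmont needs ~{needed:.2f} GHz to match Zen 4C per thread")
print(f"That is {needed / ECORE_CEILING_GHZ:.2f}x the {ECORE_CEILING_GHZ} GHz ceiling")

# Alternatively: the relative IPC Crestmont would need at the ceiling clock
needed_ipc = ZEN4C_IPC * ZEN4C_CLOCK_GHZ / ECORE_CEILING_GHZ
print(f"...or a relative IPC of ~{needed_ipc:.2f} at {ECORE_CEILING_GHZ} GHz")
```

With these inputs the required clock is about 3.9 GHz, roughly 1.3x the assumed ceiling; equivalently, Crestmont would need about 1.31x Skylake IPC at 3 GHz. This only compares per-thread throughput and says nothing about SMT throughput or perf/W.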
 

Geddagod

Golden Member
Dec 28, 2021
nicalandia said:
[Attached image]

Due to power constraints, those E-cores will not be as fast as the desktop parts; 3 GHz should be the max all-core sustained speed on the larger SKU. Bergamo was clocked at 2.8 GHz in a V-Ray all-core benchmark.
I'm also using Locuza lmao
[Image: https://bucketeer-e05bbc84-baa3-437e-9518-adb32be77984.s3.amazonaws.com/public/images/6088a70e-2300-4741-ae91-37abb0ee2d8d_1022x185.png]

But idk where Locuza is getting 6.5 mm^2 for the 4MB L2 cluster. Maybe he included the power gate stripe in his estimate, which I didn't.
You can try scaling it yourself like I did; it should not be that large. And again, mine was an overestimate. The core die shot is in Dylan's MTL die shot on SemiAnalysis.
I was not aware the power gate stripe was not included on Zen 4C, good catch. So the Zen 4C core without the power stripe is 2.37 mm^2.
[Image: https://substack-post-media.s3.amazonaws.com/public/images/741b5e8f-787f-430d-813f-77882cc48724_673x382.png]

That makes the IPC/mm^2 0.591 for Zen 4C, compared to 0.625 for Crestmont.
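The effect of excluding the power gate stripe can be checked the same way. This sketch also solves for the relative IPC Zen 4C would need at 2.37 mm^2 to tie Crestmont's estimated ratio; all inputs are the estimates from this thread (Skylake = 1.0), and the break-even framing is an illustration added here, not part of the original post.

```python
def ipc_per_mm2(relative_ipc: float, area_mm2: float) -> float:
    """Relative IPC divided by per-core area (a scaling factor, not real IPC)."""
    return relative_ipc / area_mm2


crestmont = ipc_per_mm2(1.0, 1.6)           # worst case, no IPC gain
zen4c_with_stripe = ipc_per_mm2(1.4, 2.48)  # area including power gate stripe
zen4c_no_stripe = ipc_per_mm2(1.4, 2.37)    # area excluding power gate stripe

print(f"Crestmont:               {crestmont:.3f}")
print(f"Zen 4C incl. power gate: {zen4c_with_stripe:.3f}")
print(f"Zen 4C excl. power gate: {zen4c_no_stripe:.3f}")

# Relative IPC at which Zen 4C (2.37 mm^2) would match Crestmont's ratio
breakeven_ipc = crestmont * 2.37
print(f"Break-even Zen 4C relative IPC: {breakeven_ipc:.2f}")
```

The break-even lands at about 1.48, i.e. Zen 4C would need roughly 48% more IPC than Skylake (vs. the ~40% assumed) to tie. Put the other way, Crestmont could fall about 5% short of Skylake IPC (1.4 / 2.37 × 1.6 ≈ 0.945) and still tie on this metric.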
 

Abwx

Lifer
Apr 2, 2011
Saylick said:
While Zen 4c should have higher IPC/mm^2 than Gracemont, Zen 4c doesn't clock as high. Overall performance per core might be closer than you think if you factor in clock differences. We'll have a better idea when the nodes are closer together, too, via Meteor Lake.

With 40% better IPC, perf won't be close; besides, high clocks mean high power consumption, which defeats the efficiency purpose.
 

Abwx

Lifer
Apr 2, 2011
Zen 4C has 40% higher IPC than Crestmont? Where did you get this figure?

Alder Lake's P-core has 43% better IPC than its E-core, and Zen 4 has IPC comparable to a P-core.



 

nicalandia

Diamond Member
Jan 10, 2019
Abwx said:
Alder Lake's P-core has 43% better IPC than its E-core, and Zen 4 has IPC comparable to a P-core.
It depends on the workload: in FPU benchmarks the difference can be up to 66% (at iso-clock, and even greater at stock). In INT workloads the difference is about 25%.

From Raichu.

[Attached image: Raichu's IPC comparison]


With this data we can conclude that Bergamo, with its cores being about the same size as Raptormont, has an advantage in the performance/area metric.
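Because the gap depends so heavily on the workload, a single "IPC advantage" number depends on how INT and FP results are combined. A small sketch, assuming equal 50/50 weights (like the SPEC FP/INT weighting mentioned earlier in the thread) and treating the ~25% INT and "up to 66%" FP figures above as representative, which likely overstates the FP side. SPEC-style composite scores use a geometric mean; the arithmetic mean is shown alongside for comparison.

```python
import math


def blended_uplift(ratios: list[float], weights: list[float]) -> tuple[float, float]:
    """Return (weighted arithmetic mean, weighted geometric mean) of speedup ratios."""
    total = sum(weights)
    arith = sum(r * w for r, w in zip(ratios, weights)) / total
    geo = math.exp(sum(w * math.log(r) for r, w in zip(ratios, weights)) / total)
    return arith, geo


# P-core vs E-core at iso-clock (figures quoted above)
INT_RATIO = 1.25
FP_RATIO = 1.66

arith, geo = blended_uplift([INT_RATIO, FP_RATIO], [0.5, 0.5])
print(f"Arithmetic mean: +{(arith - 1) * 100:.1f}%")
print(f"Geometric mean:  +{(geo - 1) * 100:.1f}%")
```

With these inputs the blend is roughly +44% (geometric) to +45.5% (arithmetic), in the same range as the 43% P-core vs E-core figure quoted above; an INT-heavy cloud mix would pull the number toward the 25% end.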
 

Abwx

Lifer
Apr 2, 2011
nicalandia said:
It depends on the workload: in FPU benchmarks the difference can be up to 66% (at iso-clock, and even greater at stock). In INT workloads the difference is about 25%.

From Raichu.

[Attached image: Raichu's IPC comparison]

Even in apps that yield only 26%, getting the same ST perf would require 26% higher frequency and roughly 65% higher power at iso-process, and in our case the E-cores are at a process disadvantage, which would push the power ratio above 2x.
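The 26% clock → ~65% power step implies a particular power-vs-frequency model. Dynamic power is roughly C·V²·f: at fixed voltage that is linear in f, and if voltage has to rise in proportion to frequency it becomes cubic. The sketch assumes a simple P ∝ f^k model (a simplification; real V/f curves are not a single power law), prints the power ratio for a few exponents, and back-solves the exponent implied by this estimate and by the later "20% → roughly 50% more power" estimate in the thread.

```python
import math


def power_ratio(freq_ratio: float, exponent: float) -> float:
    """Power multiplier for a frequency multiplier, assuming P ~ f**exponent."""
    return freq_ratio ** exponent


def implied_exponent(freq_ratio: float, power_ratio_: float) -> float:
    """Exponent k such that freq_ratio**k == power_ratio_."""
    return math.log(power_ratio_) / math.log(freq_ratio)


# Clock increase needed to offset a 26% IPC deficit at equal ST perf
FREQ_RATIO = 1.26

for k in (1.0, 2.0, 3.0):
    print(f"k={k:.0f}: {power_ratio(FREQ_RATIO, k):.2f}x power")

# Exponents implied by the estimates in this thread
print(f"26% clock -> 65% power: k ~ {implied_exponent(1.26, 1.65):.2f}")
print(f"20% clock -> 50% power: k ~ {implied_exponent(1.20, 1.50):.2f}")
```

Both estimates correspond to k of roughly 2.2, between the fixed-voltage case (k = 1) and the voltage-proportional-to-frequency case (k = 3). Where on the V/f curve a core is operating shifts the effective exponent, which is why iso-power measurements matter more than any single rule of thumb.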

Edit: There's also SMT to consider; throughput-wise that's simply not comparable. Zen 4c still has some grunt left when throughput reaches its limit at 128T.
 

Exist50

Platinum Member
Aug 18, 2016
Abwx said:
With 40% better IPC, perf won't be close; besides, high clocks mean high power consumption, which defeats the efficiency purpose.
It is incorrect to assume that Crestmont has no improvement over Gracemont. And the relationship between clocks and power is more complicated. You'll see what I mean when Zen 4c reviews drop.
Abwx said:
Edit: There's also SMT to consider; throughput-wise
You're picking odd extremes for comparison. Neither purely ST nor embarrassingly parallel workloads represent the target market for these products well. They're meant for cheap/efficient cloud processing. In practice, this means there'll be minimums for per-thread performance that might set a floor for Atom clocks and force SMT to be disabled on Bergamo. ST-sensitive workloads, meanwhile, will continue to target the big-core products.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
Exist50 said:
It is incorrect to assume that Crestmont has no improvement over Gracemont. And the relationship between clocks and power is more complicated. You'll see what I mean when Zen 4c reviews drop.

You're picking odd extremes for comparison. Neither purely ST nor embarrassingly parallel workloads represent the target market for these products well. They're meant for cheap/efficient cloud processing. In practice, this means there'll be minimums for per-thread performance that might set a floor for Atom clocks and force SMT to be disabled on Bergamo. ST-sensitive workloads, meanwhile, will continue to target the big-core products.
There is no product from Intel out there with nothing but E-cores, so how can we compare???? You would need something like a 100-120 E-core chip to make any real comparison. Even between my 9654 and my 9554 there appear to be big differences in clock speed and performance, as well as heat. I am sure the heat is due to the 9554 running much faster.

Or have I missed that chip??
 

Abwx

Lifer
Apr 2, 2011
Exist50 said:
It is incorrect to assume that Crestmont has no improvement over Gracemont. And the relationship between clocks and power is more complicated.

That's a hollow statement, because you don't know the physics at work.

Anyone who wants to understand how frequency and power relate should first understand what is at play here:


Exist50 said:
You're picking odd extremes for comparison. Neither purely ST nor embarrassingly parallel workloads represent the target market for these products well. They're meant for cheap/efficient cloud processing. In practice, this means there'll be minimums for per-thread performance that might set a floor for Atom clocks and force SMT to be disabled on Bergamo. ST-sensitive workloads, meanwhile, will continue to target the big-core products.

I'm taking a best-case figure for the E-cores: even at only a 20% IPC difference, it takes roughly 50% more power to get the same ST perf, and SMT changes nothing in the equation, because power rises proportionally with the added throughput if frequency is kept constant.

Also, it's almost certain that Zen 4c uses more power-constrained libraries, so it should be a little more efficient than Zen 4 at the same low frequencies.
 

nicalandia

Diamond Member
Jan 10, 2019
Markfw said:
There is no product from Intel out there with nothing but E-cores, so how can we compare???? You would need something like a 100-120 E-core chip to make any real comparison. Even between my 9654 and my 9554 there appear to be big differences in clock speed and performance, as well as heat. I am sure the heat is due to the 9554 running much faster.

Or have I missed that chip??
We are making educated guesses with the available data. So far Zen4c has proven to have the same IPC as Zen4; Gracemont needs a pretty hefty IPC gain to even come close.

The fact that a single 9754 is as fast as two 9554s in V-Ray and other benchmarks is proof that Zen4c is the king of performance/area.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
nicalandia said:
We are making educated guesses with the available data. So far Zen4c has proven to have the same IPC as Zen4; Gracemont needs a pretty hefty IPC gain to even come close.

The fact that a single 9754 is as fast as two 9554s in V-Ray, Blender and other benchmarks is proof that Zen4c is the king of performance/area.
I was more arguing that Exist50 appeared to be saying that the Intel E-cores are more powerful than we think. Having two different Genoa systems (soon to be three, tomorrow), I can tell you that I am amazed at the performance, and since Zen 4c just has less cache, I can believe anything. In my case, I need the cache for quite a few of my workloads.

Edit: Not to mention that E-cores cannot do AVX-512, whereas Genoa cores CAN. There are places where I use it, and I am sure it can come in handy even in the cloud. Take the DC crowd: they rent cloud resources all the time, and instances that support AVX-512 are way more desirable.
 

Geddagod

Golden Member
Dec 28, 2021
nicalandia said:
We are making educated guesses with the available data. So far Zen4c has proven to have the same IPC as Zen4; Gracemont needs a pretty hefty IPC gain to even come close.
I think you mean 'Crestmont' needs a pretty hefty IPC gain to even come close.
But IPC isn't everything, and for overall perf I don't think Crestmont needs to 'come close' to Zen 4 IPC to be competitive with Bergamo, at the very least.
My last couple of messages have shown how Crestmont has a marginal IPC per mm^2 advantage over Zen 4C... assuming that Crestmont has no IPC gains whatsoever over Gracemont.
Obviously clock speed is the other significant factor here, and it depends on the architecture and how well Intel 4 can clock at low power, but I don't think Crestmont needs a hefty IPC gain to come close in overall perf/area. If it clocks similarly to Zen 4C at the same power, it should be competitive in perf/area, simply because of how area-efficient it is.
nicalandia said:
The fact that a single 9754 is as fast as two 9554s in V-Ray, Blender and other benchmarks is proof that Zen4c is the king of performance/area.
Also, the only Bergamo benchmark I have seen so far is V-Ray.
 

Abwx

Lifer
Apr 2, 2011
Markfw said:
I was more arguing that Exist50 appeared to be saying that the Intel E-cores are more powerful than we think. Having two different Genoa systems (soon to be three, tomorrow), I can tell you that I am amazed at the performance, and since Zen 4c just has less cache, I can believe anything. In my case, I need the cache for quite a few of my workloads.

Edit: Not to mention that E-cores cannot do AVX-512, whereas Genoa cores CAN. There are places where I use it, and I am sure it can come in handy even in the cloud. Take the DC crowd: they rent cloud resources all the time, and instances that support AVX-512 are way more desirable.

E-cores can be as powerful as they want; this won't bridge the gap, given Zen 4c's IPC and process advantage.

In the V-Ray benches, one 128C Zen 4C has a TDP of 360W or so, while the two Intel 60-core 8490Hs have a TDP of 350W each; that's 25% better perf for Bergamo at half the TDP.
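Spelled out, "better perf at half the TDP" is a perf-per-watt ratio. A sketch using the numbers in the post (25% more V-Ray performance, ~360 W for one Bergamo vs 2 × 350 W for two 8490Hs). Note that TDP is a rating, not measured draw, so this compares rated power envelopes only.

```python
def perf_per_watt_ratio(perf_ratio: float, watts_a: float, watts_b: float) -> float:
    """Perf/W of system A relative to system B, given A's perf relative to B."""
    return perf_ratio * watts_b / watts_a


BERGAMO_TDP_W = 360          # one 128-core Zen 4C part (approx., per the post)
XEON_8490H_TDP_W = 350       # per socket
XEON_SOCKETS = 2
BERGAMO_PERF_RATIO = 1.25    # V-Ray, Bergamo vs the dual-8490H system

xeon_total_w = XEON_SOCKETS * XEON_8490H_TDP_W
ratio = perf_per_watt_ratio(BERGAMO_PERF_RATIO, BERGAMO_TDP_W, xeon_total_w)
print(f"Power ratio (Bergamo / 2x 8490H): {BERGAMO_TDP_W / xeon_total_w:.2f}")
print(f"Perf/W advantage (by TDP): {ratio:.2f}x")
```

That works out to about 0.51x the rated power and roughly 2.4x perf/W by TDP; actual measured power under V-Ray could differ from TDP on either platform.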

E-cores won't be more efficient than the P-cores of the 8490H at the same throughput and thread count; they should even be less efficient, because the P-cores run at a quite a bit lower frequency, and this more than compensates for the core size. This time, what was convenient for ADL and RPL won't work for high-core-count servers.
 

repoman27

Senior member
Dec 17, 2018
Exist50 said:
I don't believe we've had leaks about Crestmont IPC. And as Saylick mentioned, you need to factor in clock speeds (iso-process, at that), not just IPC. That's a weakness for Atom vs Core and full-sized Zen, but that's also Zen 4c's major penalty vs Zen 4. We need more data there, but hopefully with Meteor Lake and Bergamo out soon-ish, we'll have something to work with. Though N4P is still the better node.
Isn't Bergamo on AMD's flavor of N5HPC, though, not N4P?