Solved! Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Poll: What do you expect with Zen 4? (Total voters: 330; poll closed.)

eek2121

@eek2121
Sorry, but most of your post is bollocks. Servers and mobile are much more important to AMD than DIY. Everything else is just wishful thinking on your side.
Furthermore it is a FACT that the IFOP interconnect needs much more energy compared to monolithic solutions. This is why everyone and their dog is talking about silicon interconnects.
Sorry, I was in the hospital for surgery and am behind on posts, but I can prove my point pretty easily:

On my motherboard, with fixed DDR4 3600, no changes to CPU frequencies:

FCLK 1800 MHz, package power: 48W
FCLK 933 MHz, package power: 28.5W

That is roughly 40% less package power (48W down to 28.5W) just from reducing FCLK, to say nothing of other optimizations that could be done. I may play around with it more when I am feeling a bit better, but note that mobile Ryzen already implements this.

Honestly, if you needed 16 cores in a 45W envelope, it is doable now. You just have to drop the fclk.
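A quick check of the arithmetic behind the two readings above, as a minimal Python sketch. The watts-per-MHz figure is an illustrative assumption only: it treats package power as a straight line between the two measured points, while real fabric power also depends on SoC voltage and load.

```python
# Package-power readings reported above (fixed DDR4-3600, CPU clocks unchanged).
readings_w = {1800: 48.0, 933: 28.5}  # FCLK in MHz -> package power in W

hi_f, lo_f = 1800, 933
saved_w = readings_w[hi_f] - readings_w[lo_f]   # absolute saving in watts
saved_pct = 100 * saved_w / readings_w[hi_f]    # saving relative to the 1800 MHz reading

# Assumption: power scales linearly with FCLK between the two points (a simplification).
w_per_mhz = saved_w / (hi_f - lo_f)

print(f"Saved {saved_w:.1f} W ({saved_pct:.1f}% of package power)")
print(f"~{w_per_mhz * 1000:.1f} mW per MHz of FCLK under the linear assumption")
```

With these two readings the drop is 19.5 W, about 40% of the 48 W package power.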
 

wlee15

The whole Zen family so far hasn't seen a change in its L1$ quantity of 64KiB. The only change so far was how it is partitioned in Zen 2. So far we've heard Zen 4 will increase L2$. But an L1$ change is more delicate, so I'd deem it very unlikely that an optimization step like Zen 4c will contain a big change there.
Zen 1 has 64KB of L1 instruction cache instead of the 32KB that the rest of the line has.
 

Joe NYC

While the ability is laudable, I'm personally not sure Intel really deserves kudos for their approach. PVC especially appears to be overengineered, and SPR doesn't strike me as particularly elegant either.
SPR is most like the Zen 1 MCM in its approach to partitioning, so it lags quite a bit behind the leading edge. And EMIB is described as a generation behind EFB and may have some yield losses.

And PVC, completely agree. It's as if the engineers got the problem and came back with a solution, and instead of management telling them:
"Good first attempt. Now go back and redo it like we are doing this for real, like we are going to manufacture this thing."

The management said:
"Yeah, let's go with this byzantine design, let's see what happens when we connect all the pieces together"
 

Joe NYC

I hit up a couple of the usual leak sites and didn't see this. Twitter leaker?
Just by looking at AMD presentations, Zen 4 desktop seems way below Genoa in priority. There was one slide showing the server roadmap with Zen 4 Genoa on it, while the same presentation's desktop roadmap ended in 2021 with Zen 3 (it did not show Zen 4 at all). So desktop is a low priority, which IMO is a mistake on AMD's part.

One tidbit that gave people hope for a mid-year introduction of Zen 4 was the target date for AM5 motherboards.

But now that we know that Rembrandt (Zen 3+) will be on AM5, it increasingly looks like Zen 4 will be late 2023.
 

Manabu

Did AMD mention using an SLC?
Sorry, I don't know what SLC means in this context. Single Level Cache?

But an L1$ change is more delicate, so I'd deem it very unlikely that an optimization step like Zen 4c will contain a big change there.
That is why I prefaced it with the reduction in FP compute power compared with the vanilla Zen4 core, speculated here. If that change is on the table, then an L1 size change might also be, as both mean changing the floor plan of the core, unlike changes to the L2 and L3 caches, which are adjacent structures that can be expanded or contracted with relatively little design effort.

The Zen4c cores are rumored to be used as "little" cores in Zen5, and maybe beyond, especially on lower power and cost applications, like the Zen2 core is still used now. Given this longevity, it might make sense for AMD to expend the extra engineering effort optimizing it.

Doubling or tripling the L1 cache while keeping the same latency measured in cycles would provide a pretty sizeable IPC gain, like the 3D cache will do for L3, but unlike a boost in L3 cache it would probably affect the vast majority of applications.
 

DrMrLordX

Honestly, if you needed 16 cores in a 45W envelope, it is doable now. You just have to drop the fclk.
Adding more cores and reducing IF clockspeed would be disastrous.

Sorry, I don't know what SLC means in this context. Single Level Cache?
SLC is usually System Level Cache - an alternative Last Level Cache that can be utilized by more than one logic group. For example, the SLC on Apple SoCs can be used by the CPU cores, iGPU, and um other stuff too.
 

BorisTheBlade82

On my motherboard, with fixed DDR4 3600, no changes to CPU frequencies:

FCLK 1800 MHz, package power: 48W
FCLK 933 MHz, package power: 28.5W
What is the point you are trying to sell? That IFoP needs less energy when slowed down? Well, that is why energy consumption of interconnects is measured in pJ/bit. The point is: At the same speed IFoP will consume at least 10x more energy than the same interconnect on die. This is due to physics. Silicon bridges will bring down that gap by a lot.
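Since interconnect energy is quoted in pJ/bit, link power is just energy per bit times bits per second. The sketch below makes that conversion explicit. The 32-bytes-per-clock width and the 2.0 and 0.2 pJ/bit figures are illustrative assumptions chosen to match the "10x" ratio in the post, not AMD specifications.

```python
def link_power_w(pj_per_bit: float, bytes_per_clk: int, clk_mhz: float) -> float:
    """Power of a link moving `bytes_per_clk` every clock: energy per bit x bits per second."""
    bits_per_s = bytes_per_clk * 8 * clk_mhz * 1e6
    return pj_per_bit * 1e-12 * bits_per_s

# Illustrative assumptions (not measured figures): a 32-byte/clock link at FCLK 1800 MHz,
# an off-die package link at 2.0 pJ/bit and an on-die wire 10x cheaper at 0.2 pJ/bit.
ifop = link_power_w(2.0, 32, 1800)
on_die = link_power_w(0.2, 32, 1800)

print(f"package link: {ifop:.3f} W, on-die: {on_die:.3f} W, ratio {ifop / on_die:.0f}x")
```

The ratio between the two links stays 10x at any clock: lowering FCLK reduces the absolute watts linearly, but it does nothing to close the per-bit gap, which is the argument for silicon bridges.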
 

eek2121

Adding more cores and reducing IF clockspeed would be disastrous.



SLC is usually System Level Cache - an alternative Last Level Cache that can be utilized by more than one logic group. For example, the SLC on Apple SoCs can be used by the CPU cores, iGPU, and um other stuff too.
What is the point you are trying to sell? That IFoP needs less energy when slowed down? Well, that is why energy consumption of interconnects is measured in pJ/bit. The point is: At the same speed IFoP will consume at least 10x more energy than the same interconnect on die. This is due to physics. Silicon bridges will bring down that gap by a lot.
It was claimed that chiplet-based CPUs in general couldn’t possibly be a thing because a chiplet-based solution uses too much power. I pointed out that a dynamic FCLK and other optimizations could very much allow for a true mobile version of the 5950, or any other chiplet-based CPU. Just because AMD hasn’t done it doesn’t mean they can’t. Could you target the ultralight 15W segment with a chiplet approach? Not as currently implemented. 35-54W, however? Absolutely.

The whole reason Renoir/Cezanne/etc. exist is that AMD wanted to build a smaller 4-6-core chip for laptops, and they found they could actually squeeze in 8 cores with no real increase in power budget.

We can expect monolithic designs to go away completely (even for 15W…in time) as AMD addresses the pitfalls, unless Intel forces a use case that requires AMD to adapt in some way.

EDIT: A low fclk is not disastrous. The key is only ramping fclk when it is needed, and idling it when not.
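A minimal sketch of the "ramp when needed, idle when not" idea is a two-state policy with hysteresis. Everything here is hypothetical: the class, the utilization thresholds, and the 933/1800 MHz states (borrowed from the measurements earlier in the thread) are illustrative and are not how AMD's firmware actually manages FCLK.

```python
from dataclasses import dataclass, field

@dataclass
class FclkGovernor:
    """Toy two-state fabric-clock policy with hysteresis (hypothetical, not AMD firmware)."""
    low_mhz: int = 933
    high_mhz: int = 1800
    up_threshold: float = 0.60    # ramp up when fabric utilization exceeds this
    down_threshold: float = 0.20  # idle down only once utilization falls below this
    current_mhz: int = field(init=False)

    def __post_init__(self) -> None:
        self.current_mhz = self.low_mhz  # start in the idle state

    def step(self, utilization: float) -> int:
        """Pick the FCLK for the next interval from a utilization sample in [0, 1]."""
        if utilization > self.up_threshold:
            self.current_mhz = self.high_mhz
        elif utilization < self.down_threshold:
            self.current_mhz = self.low_mhz
        # Between the two thresholds the clock is left alone, so it doesn't flap.
        return self.current_mhz

gov = FclkGovernor()
trace = [gov.step(u) for u in (0.05, 0.70, 0.40, 0.10)]
print(trace)
```

Under sustained load this policy simply sits at `high_mhz`, so it saves power at idle and light load but not while the fabric is busy.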
 

eek2121

Hoping for same socket as AM4 for Rembrandt!
In case there is confusion:

  • Rembrandt is AM5.
  • Rembrandt desktop launches Q3 of next year or later.
  • Rembrandt mobile is rumored for a CES launch. Given AMD cadence, that seems likely.
  • AM4 will receive Zen3D along with a new Zen 3 stepping, which depending on who you believe (AMD or the leakers) may or may not have clock speed improvements
  • That is it for AM4.
With the release of ADL-S I suspect some of the info surrounding the B2 stepping, Zen3D, etc. may be stale.
 

moinmoin

Zen 1 has 64KB of L1 instruction cache instead of 32KB that the rest of the line has.
Right, as I wrote, with Zen 2 AMD re-partitioned the whole of L1$, halving the L1 instruction cache and doubling the µOP cache.

That is why I prefaced it with the reduction in FP compute power compared with the vanilla Zen4 core, speculated here. If that change is on the table, then an L1 size change might also be, as both mean changing the floor plan of the core, unlike changes to the L2 and L3 caches, which are adjacent structures that can be expanded or contracted with relatively little design effort.
Hm, for the PS5 Sony did have Zen 2 customized with heavily cut FP capability, so that's indeed a possibility. L1$ is yet another level lower though, imo. Changes there are usually planned and simulated long in advance for new core designs, since changes in balance change everything about a core.

Doubling or tripling the L1 cache while keeping the same latency measured in cycles would provide a pretty sizeable IPC gain, like the 3D cache will do for L3, but unlike a boost in L3 cache it would probably affect the vast majority of applications.
If that could be done as easily as suggested, it would have happened already. Every increase in cache size is bound to increase latency. It's all a huge balancing act.
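The balancing act can be illustrated with the textbook average memory access time formula, AMAT = hit time + miss rate × miss penalty. This is a minimal single-level sketch; the hit latencies, miss rates and miss penalty below are hypothetical numbers chosen only to show the trade-off, not measured Zen figures.

```python
def amat(hit_cycles: float, miss_rate: float, miss_penalty_cycles: float) -> float:
    """Average memory access time for one cache level: hit time + miss rate x miss penalty."""
    return hit_cycles + miss_rate * miss_penalty_cycles

# Hypothetical numbers: doubling L1 is assumed to cut the miss rate from 5% to 3.5%,
# and every L1 miss is assumed to cost 14 cycles.
baseline = amat(4, 0.05, 14)              # current-size L1, 4-cycle hits
bigger_same_latency = amat(4, 0.035, 14)  # larger L1 that somehow keeps 4-cycle hits
bigger_plus_one = amat(5, 0.035, 14)      # larger L1 that pays one extra cycle per hit

for name, value in [("baseline", baseline), ("bigger, same latency", bigger_same_latency),
                    ("bigger, +1 cycle", bigger_plus_one)]:
    print(f"{name}: {value:.2f} cycles")
```

With these assumed numbers the larger L1 only wins if it keeps the same hit latency; one extra cycle on every hit outweighs the misses it saves.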

Doubling or tripling the L1$ while keeping the same latency measured in cycles is very very unlikely to happen, what you could argue for is doubling or tripling while keeping the same latency measured in nanoseconds, allowing overall speed to salvage some of the increased latency.
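Latency in nanoseconds is just cycles divided by clock frequency in GHz, so a cache that needs more cycles can only match the old nanosecond latency if the core clocks proportionally higher. A minimal sketch with hypothetical numbers (a 4-cycle L1 at 5.0 GHz versus a larger 5-cycle L1):

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency in core clock cycles to nanoseconds."""
    return cycles / clock_ghz

def clock_for_same_ns(old_cycles: int, new_cycles: int, old_clock_ghz: float) -> float:
    """Clock needed for a cache that takes `new_cycles` to match the old latency in ns."""
    return old_clock_ghz * new_cycles / old_cycles

# Hypothetical example: 4-cycle L1 at 5.0 GHz vs. a larger 5-cycle L1.
old = latency_ns(4, 5.0)               # 0.8 ns
needed = clock_for_same_ns(4, 5, 5.0)  # 6.25 GHz
print(f"old latency {old:.2f} ns; a 5-cycle L1 needs {needed:.2f} GHz to match it")
```

A 25% clock increase just to break even shows why keeping the same nanosecond latency only salvages part of the cost of a slower-in-cycles cache.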
 

DrMrLordX

EDIT: A low fclk is not disastrous. The key is only ramping fclk when it is needed, and idling it when not.
That only helps you when you're not doing anything. Once you put load on the CPU and bring the IF clock back up to where it needs to be, your power consumption jumps. And when you've got a limited power budget, suddenly bumping up the share of PPT you commit to interconnect potentially strangles performance elsewhere in the CPU. Depending on your limits, of course.

Yes, the atrociously high idle power consumption of existing IF implementations is something AMD would need to tackle to bring the I/O die (or similar) strategy to mobile. And maybe they'll fix that with Raphael-H. But if your power constraints are 65W or lower . . . do you really want to burn an extra 10W (or more) on interconnect? That's going to choke the rest of the chip, even in bursty loads.
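To put a rough number on "choking the rest of the chip", here is a minimal worked sketch under a crude, labeled assumption: core power is taken to scale with the cube of frequency (P ∝ C·V²·f with voltage rising roughly linearly with f), and leakage and all non-fabric uncore power are ignored. The function name is hypothetical.

```python
def relative_core_clock(package_budget_w: float, fabric_w: float) -> float:
    """Core clock with `fabric_w` spent on interconnect, relative to spending nothing on it.

    Assumes core power ~ f**3 and ignores leakage and every other uncore consumer,
    so this is an order-of-magnitude illustration, not a model of a real chip.
    """
    core_budget_w = package_budget_w - fabric_w
    if core_budget_w <= 0:
        raise ValueError("fabric power exceeds the package budget")
    return (core_budget_w / package_budget_w) ** (1 / 3)

for budget in (45, 65):
    ratio = relative_core_clock(budget, 10)
    print(f"{budget} W package, 10 W fabric: cores at {ratio:.1%} of the clock")
```

Under that assumption, 10 W of fabric power costs roughly 8% of core clock in a 45 W package and roughly 5% in a 65 W one; the cube-root relationship is why the penalty is smaller than the share of budget lost.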
 

eek2121

That only helps you when you're not doing anything. Once you put load on the CPU and bring the IF clock back up to where it needs to be, your power consumption jumps. And when you've got a limited power budget, suddenly bumping up the share of ppt you commit to interconnect potentially strangles performance elsewhere in the CPU. Depending on your limits, of course.

Yes, the atrociously high idle power consumption of existing IF implementations is something AMD would need to tackle to bring the I/O die (or similar) strategy to mobile. And maybe they'll fix that with Raphael-H. But if your power constraints are 65W or lower . . . do you really want to burn an extra 10W (or more) on interconnect? That's going to choke the rest of the chip, even in bursty loads.
The question is, “can you afford to burn 10W?” The answer is: yes. We are stuck thinking in terms of Zen 3; Raphael will have a GPU on the IO die. I postulate that the power improvements ported from Cezanne, combined with a 6nm IO die shrink and (LP)DDR5 support, will significantly reduce power consumption. 45W Raphael will not only exist, but it will thrive.

Also, 45W Raphael does exist, whether anyone wants it to or not.
 

Harry_Wild

In case there is confusion:

  • Rembrandt is AM5.
  • Rembrandt desktop launches Q3 of next year or later.
  • Rembrandt mobile is rumored for a CES launch. Given AMD cadence, that seems likely.
  • AM4 will receive Zen3D along with a new Zen 3 stepping, which depending on who you believe (AMD or the leakers) may or may not have clock speed improvements
  • That is it for AM4.
With the release of ADL-S I suspect some of the info surrounding the B2 stepping, Zen3D, etc. may be stale.
Yeah, I messed up here; Zen+ 3D is what I was thinking! Coming out in January 2022!
 

Mopetar

It's not like AMD gets anywhere near that $300/$500 from retail customers. Retailers get a big chunk, distributors take a chunk...
I realize this, and it only helps my argument, but for the sake of simplicity I chose to ignore it. Even without considering that, it takes a significant discount to make desktop parts favorable, even at their list price.
 

jamescox

Sorry, I don't know what SLC means in this context. Single Level Cache?


That is why I prefaced it with the reduction in FP compute power compared with the vanilla Zen4 core, speculated here. If that change is on the table, then an L1 size change might also be, as both mean changing the floor plan of the core, unlike changes to the L2 and L3 caches, which are adjacent structures that can be expanded or contracted with relatively little design effort.

The Zen4c cores are rumored to be used as "little" cores in Zen5, and maybe beyond, especially on lower power and cost applications, like the Zen2 core is still used now. Given this longevity, it might make sense for AMD to expend the extra engineering effort optimizing it.

Doubling or tripling the L1 cache while keeping the same latency measured in cycles would provide a pretty sizeable IPC gain, like the 3D cache will do for L3, but unlike a boost in L3 cache it would probably affect the vast majority of applications.
The Zen 4c core is almost certainly a completely new floor plan. The cores will be smaller. The only way they could keep the floor plan would be if the cores were exactly half the size and shared the L2. They are likely making use of the denser design libraries, so it seems like it needs to be a completely different floor plan.

The question is, “can you afford to burn 10W?” The answer is: yes. We are stuck thinking in terms of Zen 3; Raphael will have a GPU on the IO die. I postulate that the power improvements ported from Cezanne, combined with a 6nm IO die shrink and (LP)DDR5 support, will significantly reduce power consumption. 45W Raphael will not only exist, but it will thrive.

Also, 45W Raphael does exist, whether anyone wants it to or not.
The 6 nm IO die and updated interfaces could reduce power usage significantly, so I think a chiplet-based mobile part will actually do quite well. It is just one link, compared to the massive number on the Epyc IO die. I assume that a major goal of the new Genoa IO die was to reduce power consumption. The really low-power design will be a stacked solution, but that comes later. They will still probably use monolithic dies and stacked solutions for lower-power parts. Different types of stacking have been used in mobile for a while. Some of the new chip-stacking tech can get close to a monolithic die for power consumption, so all mobile parts will probably be in a stacked package of some kind going forward. It would be great if we could get something like a 16-core processor (possibly with Zen 4c cores), a reasonable GPU, and a stack of HBM2E cache. A single stack is 16 GB now. If you had some LPDDR5 or even a ridiculously fast SSD to back it up, then that is plenty.
 

PJVol

I assume that a major goal of the new Genoa IO die was to reduce power consumption.
Looking at its block diagram, that seems unlikely, considering the increased speed of the LCLK and SHUB clock domains, the added MP-based IO controller, two IO hubs and so on, unless some revolutionary power-saving techniques were implemented.
 

moinmoin

The Zen 4c core is almost certainly a completely new floor plan. The cores will be smaller. The only way they could keep the floor plan would be if the cores are half the size. They are likely making use of the denser design libraries.
My thoughts: a new floor plan, obviously. A new balance of existing elements (like the changes between Zen and Zen 2), maybe. A new core design, less likely. A denser design library, rather unlikely, since Zen 2 already used the density-focused library and there is no indication AMD approached Zen 3 and Zen 4 differently. What I expect is that the longer development time is being used to simulate and optimize the existing design in a denser layout, with all ingredients staying the same (known quantities, which is important when you want to optimize the hell out of them). The existing designs use the dense library to get a fine grid on which to space out the transistors. The APUs were efforts to reduce that spacing again afterward; Zen 4c should be in line with that, with a high-margin market added to justify the effort.

and a stack of HBM2E cache.
As the name says, HBM is high bandwidth memory. It lacks the low latency needed to be really useful as a cache.

Looking at its block diagram, that seems unlikely, considering the increased speed of the LCLK and SHUB clock domains, the added MP-based IO controller, two IO hubs and so on, unless some revolutionary power-saving techniques were implemented.
The current IODs essentially are always full on. In the APUs AMD tweaked the uncore to use lower power modes where and whenever it makes sense. The Genoa/Raphael IOD will be the first new IOD where such IO power saving techniques will be implemented for server/desktop packages as well. The Raphael-H rumors/leaks point to that step happening at least for the Raphael cIOD, and there is no reason it wouldn't be implemented in the Genoa sIOD as well then.
 

PJVol

The current IODs essentially are always full on. In the APUs AMD tweaked the uncore to use lower power modes where and whenever it makes sense. The Genoa/Raphael IOD will be the first new IOD where such IO power saving techniques will be implemented for server/desktop packages as well. The Raphael-H rumors/leaks point to that step happening at least for the Raphael cIOD, and there is no reason it wouldn't be implemented in the Genoa sIOD as well then.
If all they have (I hope not) is what's already implemented in Cezanne, then it's not much, to say the least (having both Cezanne and Vermeer, I can share my thoughts on fabric power efficiency if you want). Rather, I hope they use a specifically optimized 6nm process for the IODs and similar-purpose circuitry. Besides, some very innovative power-saving techniques have been patented in the last 3 or more years, and hopefully they were already applied at the design stage.
 

jamescox

My thoughts: New floor plan obviously. New balance of existing elements (like the changes between Zen and Zen 2) maybe. New core design less likely. Denser design library rather unlikely since Zen 2 already used the library focused on density and there is no indication AMD approached Zen 3 and Zen 4 differently. What I expect to change is that the longer time is being used to simulate and optimize the existing design in a denser layout, but with all ingredients being the same (so known quantities, important to optimize the hell out of them). The existing designs use the dense library to have a fine grid pattern on which to space out the transistors. The APUs were efforts to reduce the spacing again afterward, Zen 4c should be in line with that, with a high margin market added to the effort.


As the name says HBM is high bandwidth memory. It lacks the low latency to be really useful as cache.


The current IODs essentially are always full on. In the APUs AMD tweaked the uncore to use lower power modes where and whenever it makes sense. The Genoa/Raphael IOD will be the first new IOD where such IO power saving techniques will be implemented for server/desktop packages as well. The Raphael-H rumors/leaks point to that step happening at least for the Raphael cIOD, and there is no reason it wouldn't be implemented in the Genoa sIOD as well then.
I suspect Zen 4c, and possibly other derivatives of it, are going to be around for a while, so it may have more radical changes. It might be mostly process tech optimizations, but it seems like they are also going to cut some stuff out. Giant vector FP units are almost entirely unused in a wide range of servers. There is a big difference between a regular server and an HPC machine. I have wondered if it might be radically different with some number of cores sharing L2 and possibly FP units.

As for HBM cache, I mentioned a few posts ago that it is not great as a CPU cache due to DRAM latency. The HBM is mostly for the integrated GPU. AMD supports virtual memory on their GPUs, which allows system memory to essentially act as swap space. Doing the same thing with an APU-like device, with HBM swapped out to DDR5 system memory, would be great. We probably will not get something like that unless it is a device made for many chiplets, like separate CPU, IO, GPU, and HBM chiplets. They could possibly put an HBM interface on an IO die with an integrated GPU, but that doesn’t seem that likely.
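The "HBM swapped out to DDR5" idea amounts to a small fast memory tier acting as a page cache in front of a large slow one. The minimal sketch below models that with least-recently-used eviction. The class, the page granularity and the LRU policy are all hypothetical illustrations, not how AMD's GPU virtual memory actually manages pages.

```python
from collections import OrderedDict

class TwoTierMemory:
    """Toy model: a small fast tier (think HBM) caching pages of a large slow tier (think DDR5).

    A page is promoted into the fast tier on access; when the fast tier is full, the least
    recently used page is evicted back to the slow tier. Hypothetical sketch only.
    """

    def __init__(self, fast_capacity_pages: int):
        self.capacity = fast_capacity_pages
        self.fast = OrderedDict()  # resident page numbers, oldest (LRU) first
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> bool:
        """Touch `page`; return True if it was already resident in the fast tier."""
        if page in self.fast:
            self.fast.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.fast) >= self.capacity:
            self.fast.popitem(last=False)  # evict the LRU page to the slow tier
        self.fast[page] = None
        return False

mem = TwoTierMemory(fast_capacity_pages=2)
for p in (1, 2, 1, 3, 2):
    mem.access(p)
print(mem.hits, mem.misses, list(mem.fast))
```

LRU is used here only because it is the simplest common replacement policy; a real implementation could track access frequency or migrate pages in larger batches.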
 

DrMrLordX

The question is, “can you afford to burn 10W”. The answer is: yes.
Okey dokey. We'll see how that works out.

We are stuck thinking in terms of Zen 3; Raphael will have a GPU on the IO die.
That's not particularly relevant to IF link power consumption.

I postulate that the power improvements ported from Cezanne, combined with a 6nm IO die shrink and (LP)DDR5 support, will significantly reduce power consumption. 45W Raphael will not only exist, but it will thrive.
Phoenix will probably do it all better though. Or at least more efficiently.
 

eek2121

My thoughts: New floor plan obviously. New balance of existing elements (like the changes between Zen and Zen 2) maybe. New core design less likely. Denser design library rather unlikely since Zen 2 already used the library focused on density and there is no indication AMD approached Zen 3 and Zen 4 differently. What I expect to change is that the longer time is being used to simulate and optimize the existing design in a denser layout, but with all ingredients being the same (so known quantities, important to optimize the hell out of them). The existing designs use the dense library to have a fine grid pattern on which to space out the transistors. The APUs were efforts to reduce the spacing again afterward, Zen 4c should be in line with that, with a high margin market added to the effort.


As the name says HBM is high bandwidth memory. It lacks the low latency to be really useful as cache.


The current IODs essentially are always full on. In the APUs AMD tweaked the uncore to use lower power modes where and whenever it makes sense. The Genoa/Raphael IOD will be the first new IOD where such IO power saving techniques will be implemented for server/desktop packages as well. The Raphael-H rumors/leaks point to that step happening at least for the Raphael cIOD, and there is no reason it wouldn't be implemented in the Genoa sIOD as well then.
Given AMD is still prefixing the cores with “Zen4”, I am going to assume that they are simply using/taking advantage of high density libraries and possibly cutting cache. Maybe I am reading too much into it, however.

If all they have (I hope not) is what's already implemented in Cezanne, then it's not much, to say the least (having both Cezanne and Vermeer, I can share my thoughts on fabric power efficiency if you want). Rather, I hope they use a specifically optimized 6nm process for the IODs and similar-purpose circuitry. Besides, some very innovative power-saving techniques have been patented in the last 3 or more years, and hopefully they were already applied at the design stage.
Rembrandt, Genoa, and Raphael all have some new power-saving tech. AMD actually confirmed this quite recently, I believe.

I suspect Zen 4c, and possibly other derivatives of it, are going to be around for a while, so it may have more radical changes. It might be mostly process tech optimizations, but it seems like they are also going to cut some stuff out. Giant vector FP units are almost entirely unused in a wide range of servers. There is a big difference between a regular server and an HPC machine. I have wondered if it might be radically different with some number of cores sharing L2 and possibly FP units.

As for HBM cache, I mentioned a few posts ago that it is not great as a CPU cache due to DRAM latency. The HBM is mostly for the integrated GPU. AMD supports virtual memory on their GPUs, which allows system memory to essentially act as swap space. Doing the same thing with an APU-like device, with HBM swapped out to DDR5 system memory, would be great. We probably will not get something like that unless it is a device made for many chiplets, like separate CPU, IO, GPU, and HBM chiplets. They could possibly put an HBM interface on an IO die with an integrated GPU, but that doesn’t seem that likely.
AMD, like any company in this field, likely has dozens, if not hundreds of variations in flight at any given time. This allows them to be more agile and respond to marketplace changes.
Okey dokey. We'll see how that works out.



That's not particularly relevant to IF link power consumption.



Phoenix will probably do it all better though. Or at least more efficiently.
I think that moving forward, AMD will use monolithic dies for <40W chips and a mix of monolithic and chiplet-based approaches for 45W chips.

I do not think AMD originally planned on Raphael H (though they definitely are testing it), but if AMD hadn't considered this route, Intel would likely have complete performance leadership until AMD is able to get Phoenix out the door.

Raphael “H” will likely compete against the top ADL-P SKUs, with Rembrandt covering the rest. AMD is likely changing plans internally pretty quickly going forward. I do not understand why they ever thought “Zen3+” was a good idea unless used alongside Zen4. Alder Lake mobile is probably going to give them quite the headache, and I definitely wouldn’t write off Sapphire Rapids. About the only places AMD is in good shape right now are desktop and HEDT.
 

LightningZ71

Zen3d/3+ seems more of a hedge against limited N5(x) capacity than anything else. It allows them to produce more processors with competitive performance on a node that's no longer leading edge. AMD's biggest issue right now is, oddly, an enviable one: they can't build enough parts to keep the market satisfied. They are literally leveraging 5 different nodes right now: GF12/14 (Dali, etc.), N7, N6, preproduction on N5, and likely GF12+ for Monet, while also looking at various Samsung nodes. Not making Zen3d would mean deliberately passing up an opportunity to produce products for profit and to support their partners in the rest of the industry by making products that create demand for theirs as well. Intel is going to be supply constrained themselves across their entire lineup, with the noted yield issues on 10+, limited capacity for 10SF for Tiger Lake, limited capacity for 10ESF for Alder Lake, 14nm running flat out to meet OEM volume, etc. It stands to reason that AMD should shovel everything they can into the market while they can, to keep revenues up, free more money for R&D, and build better relationships with their suppliers. Also, as you gain critical mass in the market, you coax software makers into optimizing more for your products, making your job easier in R&D.
 
