Speculation: Ryzen 4000 series/Zen 3


jamescox

Senior member
Nov 11, 2009
637
1,103
136
My expectation for Zen3:
- IPC uplift ~20% (based on Norrod's comments this will be a true generational leap, as the core is based on a new uarchitecture; there will be radical changes in the pipeline structure, reduced latencies for instructions, increased ALU/FPU/AGU counts, other structures increased to accommodate the changes)
- clocks 5-7% higher
- core count 25% higher (10 chiplets x 8 cores; CCX is 8 cores sharing huge L3 cache)
- power draw roughly the same or slightly higher than Rome
- *possibility* of SMT4, not excluding it yet

This will put a stop to any gain Tigerlake/Icelake will make over SKL/X core. I think that the only advantage intel will have is going to be AVX512 and that's it. Everything else will be roflstomped by Milan.

IPC varies by a large amount across applications. Some applications may get a big boost from the completely reworked cache hierarchy while others may not. A 20% boost is a bit over optimistic, I think. There could be some applications that get a big boost from certain architectural improvements though. They could expand the AVX again which would also require increasing bandwidth again. I don’t know if they will support AVX512. I have always considered widening the width to 512 to be a bit of a kludge. Things that can take advantage of such wide vectors would be better off on a gpu. Intel didn’t have a gpu at the time, so AVX512 was their way of trying to get a cpu to behave like a gpu.

They are focusing on efficiency, which actually allows for higher clocks in power constrained packages. With a new design on a new process variant, even AMD may not know what clocks they will be able to hit. I don’t really expect more cores with this generation. I think they are just going to have very large cache chips for HPC. The die would get a bit larger with 48 or 64 MB L3 and that will take up the available space on the Epyc package. The 32 MB die may not be any larger though. The current Zen 2 die is 2 core clusters with 4 cores each and 16 MB per core cluster. That is 8 cores total and 32 MB L3 total. Zen 3 will be 8 cores with 32 MB per die, so it is actually the same, just arranged differently. They will get a shrink with 7nm+, so it may be a similar size, even with new features. If the latency has gone up for L3, they may want larger L2 though. Cache shrinks well though, so I am still expecting relatively small die.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Zen3 adds AI instructions; we just have to wait and see whether that is matrix or just things like bfloat
It's inevitable, they are just behind the swing of the industry, at least compared to ARM who have been going that way since A75.

That could be the meat of what Forrest was referring to about new architecture - BFloat16 and all that.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Things that can take advantage of such wide vectors would be better off on a gpu. Intel didn’t have a gpu at the time, so AVX512 was their way of trying to get a cpu to behave like a gpu.
The question is how will Intel act going forward once Xe gets into full swing?

Will AVX512 become the hard limit for CPU vectors/SIMD, or will they carry on increasing over time?
 

DrMrLordX

Lifer
Apr 27, 2000
21,571
10,764
136
- clocks 5-7% higher

TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.

- core count 25% higher (10 chiplets x 8 cores; CCX is 8 cores sharing huge L3 cache)

TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.

This will put a stop to any gain Tigerlake/Icelake will make over SKL/X core.

Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.
 

Adonisds

Member
Oct 27, 2019
98
33
51
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.



TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.



Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.
I don't think those performance increase estimates are valid for high-performance desktop CPUs. If they were, the +35% performance from 14nm to 7nm would have resulted in a 5.4 GHz capable 3700X. I think expecting more than 5% is unreasonable.
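For anyone who wants to check the arithmetic behind that 5.4 GHz figure, here is a minimal sketch. It assumes the whole foundry "performance" gain goes straight into clock speed, and that the 14nm baseline is 4.0 GHz (that is simply what 5.4 / 1.35 works out to; it is not stated in the post). The 4.4 GHz Zen 2 baseline in the second part is likewise an assumption for illustration.

```python
def scaled_clock_ghz(base_ghz: float, perf_gain: float) -> float:
    """Clock implied if a process 'performance' gain went entirely into frequency."""
    return base_ghz * (1.0 + perf_gain)


# Assumption: 4.0 GHz 14nm baseline, +35% quoted for 14nm -> 7nm.
naive_7nm = scaled_clock_ghz(4.0, 0.35)
print(f"naive 7nm clock: {naive_7nm:.1f} GHz")

# Assumption: 4.4 GHz Zen 2 baseline, comparing the +10% TSMC figure
# for 7nm+ with the more conservative +5% guess from the thread.
for gain in (0.10, 0.05):
    print(f"+{gain:.0%} on 4.4 GHz -> {scaled_clock_ghz(4.4, gain):.2f} GHz")
```

The point of the sketch is only that foundry iso-power numbers translate into implausible desktop clocks when applied naively; it says nothing about what Zen 3 will actually hit.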
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
IPC varies by a large amount across applications. Some applications may get a big boost from the completely reworked cache hierarchy while others may not. A 20% boost is a bit over optimistic, I think. There could be some applications that get a big boost from certain architectural improvements though. They could expand the AVX again which would also require increasing bandwidth again. I don’t know if they will support AVX512. I have always considered widening the width to 512 to be a bit of a kludge. Things that can take advantage of such wide vectors would be better off on a gpu. Intel didn’t have a gpu at the time, so AVX512 was their way of trying to get a cpu to behave like a gpu.

They are focusing on efficiency, which actually allows for higher clocks in power constrained packages. With a new design on a new process variant, even AMD may not know what clocks they will be able to hit. I don’t really expect more cores with this generation. I think they are just going to have very large cache chips for HPC. The die would get a bit larger with 48 or 64 MB L3 and that will take up the available space on the Epyc package. The 32 MB die may not be any larger though. The current Zen 2 die is 2 core clusters with 4 cores each and 16 MB per core cluster. That is 8 cores total and 32 MB L3 total. Zen 3 will be 8 cores with 32 MB per die, so it is actually the same, just arranged differently. They will get a shrink with 7nm+, so it may be a similar size, even with new features. If the latency has gone up for L3, they may want larger L2 though. Cache shrinks well though, so I am still expecting relatively small die.
A 48MB L3 cache would mean a die of approx. 84mm² on 7nm EUV; I did a simple calculation a few pages back. 64MB would probably be too much of an area increase (assuming AMD will use 64MB L3 for Zen4, with an enhanced Zen3 uarch on 5nm, together with a core count rise). Anyway 48MB L3 cache is 3x more cache for ST load than Zen2.

Regarding AI acceleration, there is no space for AI on the Zen3 chiplet. Right now it doesn't make sense either. AI is not useful in servers, which are the primary target for Zen3. AMD said they will use GPUs for AI.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.
I always got the impression that that metric implies higher performance at iso design - ie if you simply ported Zen2 to 7nm+ with minimal changes.

If Zen3 is significantly different (which Forrest basically just expressed) then that higher performance will probably depend at least as much on the clocking efficiency of the Zen3 uArch as the 7nm+ process it is being fabbed on.

I'm still wondering just how much change we will see in the IOD - what process it is on, how much area it will take up, and perhaps even room for GFX CCD IO on the consumer variants.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
Anyway 48MB L3 cache is 3x more cache for ST load than Zen2.
Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.
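The ratios being thrown around here can be sketched quickly. The Zen 2 figures (16 MB of L3 reachable by a single core, 32 MB per die) come from earlier in the thread; the candidate Zen 3 sizes are just the ones people have floated, not anything confirmed.

```python
ZEN2_L3_PER_CCX_MB = 16  # L3 a single Zen 2 core can reach (one 4-core CCX)
ZEN2_L3_PER_DIE_MB = 32  # total L3 on one Zen 2 die (two CCXs)


def l3_ratios(new_l3_mb: float) -> tuple[float, float]:
    """Return (ratio vs. L3 seen by one Zen 2 core, ratio vs. total Zen 2 die L3)."""
    return new_l3_mb / ZEN2_L3_PER_CCX_MB, new_l3_mb / ZEN2_L3_PER_DIE_MB


# Speculative unified-L3 sizes mentioned in the thread.
for size_mb in (32, 40, 48):
    single_thread, per_die = l3_ratios(size_mb)
    print(f"{size_mb} MB: {single_thread:.2f}x for one core, {per_die:.2f}x per die")
```

So 40 MB is 2.5x what one core sees today but only 1.25x the total per die, which is the 25% being compared against the idealised 20% density gain.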
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.
I wouldn't even count on a change. As it is, AMD is miles ahead in cache size, ahead by a third of that in cores, and by twice that last one in perf/power. Point being that an efficiency increase in die space might be a good opportunity for AMD to recoup used die space or to lower hotspotting by increasing dead die space (like 12nm). Though we don't know what AMD is going to do with the new uarch. They could use that space just to widen the core design.
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
What if they increase L2 cache to compensate the latency increase of the L3?
It won't be first time for AMD to count one cache as the total cache
Also possible. Just saying that an L3 increase at this point probably isn't as large a concern, because they have already given customers an undreamed-of amount. I mean insane compared to everyone. Very few tools out there are going to be well designed to take advantage of that. It's a little early for them to be chasing a bigger cache count in one cycle and not letting their customers catch up. Reorganizing how the CPU utilizes it for better performance? Sure. But I am guessing there are 9 billion other things AMD could spend the space on, even using the space as dead space, that will help CPU performance more than upping L3 beyond what they already have. I mean we are talking about 0.5GB of L3 cache on a 2S system.
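A quick sanity check on that 0.5GB figure, as a rough sketch. It assumes a top-end Rome configuration of 8 CCDs per socket with 32 MB of L3 per CCD; the 48 MB case is purely hypothetical, taken from the speculation earlier in the thread.

```python
def total_l3_mb(sockets: int, ccds_per_socket: int, l3_per_ccd_mb: int) -> int:
    """Total L3 across a multi-socket system, in MB."""
    return sockets * ccds_per_socket * l3_per_ccd_mb


# Assumption: 2 sockets x 8 CCDs x 32 MB (current Zen 2 Epyc).
rome_2s = total_l3_mb(sockets=2, ccds_per_socket=8, l3_per_ccd_mb=32)
print(f"Zen 2, 2S: {rome_2s} MB = {rome_2s / 1024:.2f} GB")

# Hypothetical: same package layout with 48 MB per CCD.
big_2s = total_l3_mb(sockets=2, ccds_per_socket=8, l3_per_ccd_mb=48)
print(f"48 MB CCDs, 2S: {big_2s} MB = {big_2s / 1024:.2f} GB")
```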
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.



TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.



Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.
Agree with this entirely.

AMD have been boosting frequency, working on latency, and to an extent increasing cores for iso price (3900X vs 1800X at $499 where iso-price gets you +4C/+8T).

I think AMD will use this "tick" of Zen3 to explore cache improvements, using higher density to improve heat dissipation, boost clocks, etc.

I think they can taste the blood. If they can get a 12C or 16C to heavily (multi-core, but perhaps not all-core) boost to 5GHz, and work out little kinks here and there with latencies and cache, I think they have a very realistic chance to take the crown from whatever Intel has in the pipeline.
 

amd6502

Senior member
Apr 21, 2017
971
360
136
I really doubt a 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3

But low-mid teens single-thread IPC bump with freq bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10 chiplet. There is no issue with core counts either, so that's not an area that needs improvement.

L3 is already oversized. So improvements on L3 will not be in dimension of capacity.
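To put rough numbers on the "IPC bump with freq bump" point: the two gains multiply rather than add. A minimal sketch, where the specific IPC and clock percentages are only illustrative picks from the ranges mentioned in this thread:

```python
def combined_uplift(ipc_gain: float, clock_gain: float) -> float:
    """Single-thread performance gain when IPC and clock gains compound."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# Illustrative values only: low-to-mid teens IPC, 5-7% clocks.
for ipc in (0.12, 0.13, 0.15):
    for clock in (0.05, 0.07):
        total = combined_uplift(ipc, clock)
        print(f"IPC +{ipc:.0%}, clock +{clock:.0%} -> total +{total:.1%}")
```

With these inputs the combined figure lands somewhere between the high teens and the low twenties, which is where "might slightly exceed 20%" comes from.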
 

inf64

Diamond Member
Mar 11, 2011
3,680
3,943
136
I really doubt a 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3

But low-mid teens single-thread IPC bump with freq bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10 chiplet. There is no issue with core counts either, so that's not an area that needs improvement.

L3 is already oversized. So improvements on L3 will not be in dimension of capacity.

If you read what Forrest Norrod stated in an interview you will see he is hinting at a bigger than Zen2 uplift for Zen3 core. He said that Zen2 was a bit of an outlier as the core itself was an evolution of original Zen - they managed to achieve an outstanding 15% IPC uplift with what essentially is NOT a brand new uarchitecture.

Zen3 is said to be a brand new uarchitecture (stated by Norrod) and it will bring improvements aligned with that. So 20% is perfectly reasonable to expect; it might even be on the lower end of the spectrum.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.

The 32 MB variant would be the same number of cores and cache per die as current Zen 2 die. It will just be one 8 core, 32 MB CCX instead of two 4 core, 16 MB CCX. Desktop applications probably don’t need more than 16 MB accessible from one core for most things, but it can help multi-threaded applications to have a monolithic last level cache. AMD already has the mainstream at 8 core and 32 MB total, but it is split into 2 CCX. Zen 3 will probably provide access to all 32 MB from any core on the die. A single 8 core CCX with 32 MB L3 should be incredibly powerful. Intel will have 24.75 MB with the i9-10980XE and close to 20 MB on their other cpus in the high end 10000 series line-up, so it is necessary really. The market has been stagnant for so long with just intel 4 core chips for years, that they aren’t used to the speed of improvements AMD is driving.

The place where the larger cache is really needed is HPC and certain server applications. There are still some server applications where Epyc loses to Intel parts by a significant margin. It is mostly database applications that take advantage of the monolithic 38.5 MB cache on intel high end Xeon parts. Intel also has a specialized Xeon for that market, the e5-2699v4 with a 55 MB cache. I suspect AMD will make some high cache variants for that portion of the server market and for HPC applications. Some of the supercomputer design wins may use that part. With the 32 MB variant probably being similar in size to current Zen 2 die, increasing the cache significantly for some parts should be doable. They are making larger die on 7 nm for gpus. Even a 64 MB variant should be less than twice the size of current Zen 2 die (8 cores and 32 MB total) which is only around 75 to 80 square mm.
 

dnavas

Senior member
Feb 25, 2017
355
190
116
If you read what Forrest Norrod stated in an interview you will see he is hinting at a bigger than Zen2 uplift for Zen3 core.

Let me try this on the other side. As no comparison was made between Zen2 uplift and the "normal/expected" uplift of a new architecture, it's impossible to determine. He doesn't say it's less, he doesn't say it's more. He said Zen3 is producing what one would expect from a new architecture. I'm not sure what that is. *I* would think between 10-20%, but I'm not a CPU architect. I certainly would not, for example, expect Bulldozer->Zen1 increases. Am I remembering correctly that AMD's former guideline was something like double Intel's 5%, and that each release would be a tock? It would seem far safer to assume that Zen3 is about 10% faster.

You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|
 

inf64

Diamond Member
Mar 11, 2011
3,680
3,943
136
Let me try this on the other side. As no comparison was made between Zen2 uplift and the "normal/expected" uplift of a new architecture, it's impossible to determine. He doesn't say it's less, he doesn't say it's more. He said Zen3 is producing what one would expect from a new architecture. I'm not sure what that is. *I* would think between 10-20%, but I'm not a CPU architect. I certainly would not, for example, expect Bulldozer->Zen1 increases. Am I remembering correctly that AMD's former guideline was something like double Intel's 5%, and that each release would be a tock? It would seem far safer to assume that Zen3 is about 10% faster.

You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|
To cut a long story short, in the Conroe/K10 era a normal IPC uplift was ~15-20% (Yonah -> Merom (Conroe), K8 -> K10). The same goes for the K7 -> K8 jump.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
They are making larger die on 7 nm for gpus. Even a 64 MB variant should be less than twice the size of current Zen 2 die (8 cores and 32 MB total) which is only around 75 to 80 square mm.
If they make the CCD's significantly bigger it would have to be for more cores, otherwise they would run into problems fitting them all together at 8+ CCD + IOD while maintaining same or greater core counts for Epyc and Threadripper.
 

soresu

Platinum Member
Dec 19, 2014
2,582
1,778
136
You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|
Something about features originally planned for Zen1/Zen+ being in Zen2 is I think part of the greater than expected uplift in it - ie it was originally intended to be more incremental at the core level with the CCD/IOD change being the main focus of the Zen2 generation.
It would seem far safer to assume that Zen3 is about 10% faster.
I think it may be that we get a modest IPC jump, but a nice significant jump in mhz/watt efficiency - taken together it would provide a very nice boost in perf/watt.

By which I don't necessarily mean the absolute clocking ceiling before power use skyrockets, I mean more like the sweet spot where perf is still great but you have very low power consumption.

This would be great for mobile (and potentially standalone VR) parts, not to mention a solid foundation for a switch to vertical integration as is rumoured for Zen4.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
To cut a long story short, in the Conroe/K10 era a normal IPC uplift was ~15-20% (Yonah -> Merom (Conroe), K8 -> K10). The same goes for the K7 -> K8 jump.
Lots of low hanging fruit back then (gains from IMCs for example) and more significant gains being delivered by new process nodes.
 

inf64

Diamond Member
Mar 11, 2011
3,680
3,943
136
Lots of low hanging fruit back then (gains from IMCs for example) and more significant gains being delivered by new process nodes.

Well, Skylake -> Icelake is an 18% IPC jump, very close to the "oldschool" uarchitectural jumps, while both are essentially very similar cores. The same goes for Zen1+ -> Zen2, and I expect similar will go for Zen2 -> Zen3. There is always low hanging fruit if *you* (the designer) know the limitations/bottlenecks in the core. I'm pretty sure both intel and AMD engineers know what the limitations are and what/how much they can do with the tools/nodes at their disposal. Getting 15-20% IPC is obviously not that hard; the evidence is all around us these days.
 

.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
Yup, taking Cannonlake into account brings that 18% back to the usual 5-10% Intel had us accustomed to for the past decade.

That leaked Tigerlake GB5 result that was done with the thing running at 400MHz, if scaled linearly to Icelake's 3.9GHz is an ~8% increase. Willow Cove would be another step in line with the usual.

Golden Cove could be anything. I guess it won't be anything revolutionary either. MLID's sources claim it's enough to beat Zen3.

I think that if there's ever going to be a big >15% IPC jump coming from Intel it's going to be from whatever Keller & other geniuses that were hired recently are working on for release on 7nm in ~2022 (Ocean Cove). Vulnerabilities fixed and everything.



If AMD can keep up their pace with Zen3 and Zen4 then they will not have any issues competing or keeping the crown in the future...

Unless TSMC fails somewhere and delays further nodes, or Intel gets 10nm's clock speed potential up to 14+++'s > 5GHz and Golden Cove benefits from that.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,640
3,697
136
Yup, taking Cannonlake into account brings that 18% back to the usual 5-10% Intel had us accustomed to for the past decade.

That leaked Tigerlake GB5 result that was done with the thing running at 400MHz, if scaled linearly to Icelake's 3.9GHz is an ~8% increase. Willow Cove would be another step in line with the usual.

Golden Cove could be anything. I guess it won't be anything revolutionary either. MLID's sources claim it's enough to beat Zen3.

I think that if there's ever going to be a big >15% IPC jump coming from Intel it's going to be from whatever Keller & other geniuses that were hired recently are working on for release on 7nm in ~2022 (Ocean Cove). Vulnerabilities fixed and everything.



If AMD can keep up their pace with Zen3 and Zen4 then they will not have any issues competing or keeping the crown in the future...

Unless TSMC fails somewhere and delays further nodes, or Intel gets 10nm's clock speed potential up to 14+++'s > 5GHz and Golden Cove benefits from that.

It might just be me, but I'm not going to take anyone seriously when they call x86 "times 86". If Intel starts executing well, then it may be possible.