Speculation: Ryzen 4000 series/Zen 3

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
I really doubt 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3

But a low-to-mid-teens single-thread IPC bump combined with a frequency bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10-chiplet setup. There is no issue with core counts either, so that's not an area that needs improvement.

The L3 is already oversized, so improvements to the L3 will not come in the form of more capacity.

No to everything but the very first and the last. Especially the comment on SMT4. That's cursed.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
I don't think those performance increase estimates are valid for high-performance desktop CPUs. If they were, the +35% performance from 14nm to 7nm would have resulted in a 5.4 GHz-capable 3700X. I think expecting more than 5% is unreasonable.

I always got the impression that that metric implies higher performance at iso-design - i.e. if you simply ported Zen2 to 7nm+ with minimal changes.

Wanted to address both of these at once:

First, let me emphasize again that I do not simply assume the max boost frequency for Zen3 will see a 10% improvement just because of the new node. If you examine 7nm's behavior in Matisse in particular, you will see that boost clocks struggle as more cores are loaded, often due to hotspots and the effects of an (admittedly often conservative) boost map. A 16c Vermeer (for example) isn't going to hit 5170 MHz boost just because of 7nm+. If it did, great, but I do not expect that; boost in the range of 4.8 GHz or so seems more realistic. But if you look at a 3950X running Prime95 Small FFTs or some other heavy AVX2 workload, the default behavior of the chip will probably put it at around 3700 MHz, which is also roughly where a bone-stock 3900X lands (mine does 3770 MHz, but it has excellent cooling). If someone told me that a 16c Vermeer could do 4070 MHz (or so) in the same workload, I would probably believe it.

As for TSMC's metric, I'm pretty sure the performance figure is isopower AND isodesign. So yes, this assumes an optical shrink of the old design. But! You also have to consider what the design limitations on clockspeed are. If I knew more about the fmax of Zen2, I could comment more saliently on how much of Zen2's clockspeed limitation is related to design versus process versus heat. It certainly appears as though Zen2 is more limited by concentrated heat than anything else. The process loves low temperatures, and if you can keep the die near room temps, all-core clocks of 4.6 GHz and higher are certainly possible. Hotspots make that insanely difficult. So I don't think it's entirely far-fetched that Zen3 will have an Fmax at least in the same ballpark as Zen2, especially when AMD is sidestepping to a new 7nm node whose major selling point is the option of higher clocks at isopower. It's not like they're going to pull a Kaveri or Carrizo on us, where cache design changes limit Fmax versus the previous generation (Piledriver).

In conclusion, I would project max boost clocks for Vermeer in the 4.8 GHz range, with the possibility of maybe 2-4 cores hitting those clocks at once. MT clocks in heavy workloads are the ones that are most likely to see the full 10% improvement.
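To put rough numbers on that (a quick sketch; the baselines are the 3950X figures I just mentioned, and the 10% is TSMC's isopower claim taken at face value, which as I said I don't expect to show up at max boost):

```python
# Naive clock projection: apply TSMC's ~10% isopower speed claim for 7nm+
# directly to Matisse (3950X) baseline clocks. Purely illustrative.
node_speed_gain = 0.10            # TSMC's quoted 7nm -> 7nm+ improvement

max_boost_mhz = 4700              # 3950X rated single-core boost
avx2_allcore_mhz = 3700           # rough all-core clock in Prime95 Small FFTs

print(round(max_boost_mhz * (1 + node_speed_gain)))     # 5170 MHz - not expected
print(round(avx2_allcore_mhz * (1 + node_speed_gain)))  # 4070 MHz - plausible
```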

I really doubt 20% IPC uplift

I'm a bit skeptical there, but I'm willing to let them take a shot at it. 15% would still be exceptional, especially when you consider that clocks under most applications will go up thanks to the new process.
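And just to show how IPC and clocks compound (hypothetical inputs from this discussion, not anything AMD has claimed):

```python
# How per-clock (IPC) and frequency gains compound multiplicatively.
# Both inputs are speculation from this thread, not AMD statements.
ipc_gain = 0.15      # "15% would still be exceptional"
clock_gain = 0.05    # a modest clock bump under typical workloads

combined = (1 + ipc_gain) * (1 + clock_gain) - 1
print(f"combined uplift: {combined:.1%}")   # ~20.7%
```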
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136

inf64

Diamond Member
Mar 11, 2011
3,685
3,957
136
Cannon Lake brought zero IPC uplift (thanks tamz_msc), so my original point stands: it's possible for both Intel and AMD to make significant IPC gains with "new" microarchitecture generations.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
TSMC's metrics are for optimum xtor specifications, probably derived from a 'test sled' (small chip with SRAM and logic - sometimes a simple ARM core). Actual results will totally depend on design implementation choices (AKA, actual on silicon circuit layout).
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
Cannon Lake brought zero IPC uplift (thanks tamz_msc), so my original point stands: it's possible for both Intel and AMD to make significant IPC gains with "new" microarchitecture generations.
@inf64 , @tamz_msc

How do you people come to this conclusion? You have an SoC with nearly 100 ns longer memory latency compared to Skylake; over 200 ns to memory is atrocious (DDR4-2400!). The 2-core Cannon Lake doesn't have anything special on the cache front (exact same config as Skylake), yet clock for clock it has the same performance as Skylake. So unless your assertion is that the performance impact of nearly 100 ns of extra access latency is zero, the Cannon Lake core increased IPC and offset the loss of memory-system performance.
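To make the logic explicit (a toy model; the ~100 ns and ~200 ns figures are the latencies above, while the clock, baseline CPI and miss rate are made-up illustrative numbers):

```python
# Toy model: time per instruction = core-side time + (misses/instr * memory latency).
# If memory latency roughly doubles yet clock-for-clock performance is unchanged,
# the core-side term must have shrunk, i.e. the core's own IPC went up.
clock_ghz = 3.0
misses_per_instr = 0.001          # assumed last-level miss rate, illustrative

def ns_per_instr(core_cpi, mem_latency_ns):
    return core_cpi / clock_ghz + misses_per_instr * mem_latency_ns

skylake = ns_per_instr(core_cpi=1.0, mem_latency_ns=100)        # ~0.43 ns
# Core CPI Cannon Lake would need at ~200 ns to match Skylake's ns/instr:
needed_cpi = (skylake - misses_per_instr * 200) * clock_ghz     # ~0.70
print(skylake, needed_cpi)
```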
 
  • Like
Reactions: .vodka

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I think it's not wrong to say that under the circumstances Intel dreamed of (10nm working without a hitch), Cannon Lake would have been a normal step up IPC-wise. That it turned into a minor "look, 10nm does work (shows it and quickly hides it again)" stop-gap with a wholly deactivated Gen10 iGPU doesn't change the fact that R&D had been invested in the Cannon Lake core and the Gen10 EUs (the latter we'll never get to test at all; now everybody writes that it was "skipped"...).
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
No to everything but the very first and the last. Especially the comment on SMT4. That's cursed.
Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 Family 19h will be the base uarch for at least Zen 4, and probably also Zen 5. IMHO the probability of SMT4 in Zen 3 is 80% now.
 
  • Haha
Reactions: CHADBOGA

itsmydamnation

Platinum Member
Feb 6, 2011
2,743
3,075
136
It will be interesting to see how big the L1i cache will be in Zen3. AMD stated that they shrunk the instruction cache because they didn't have the space in the floorplan for both the increased uop cache and a 64K L1i. If you consider Norrod's comments, then it makes sense that in Zen3 we could see a bigger change to the floorplan, and maybe the return of the 64K L1i while keeping the big uop cache?
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
If they make the CCDs significantly bigger, it would have to be for more cores; otherwise they would run into problems fitting them all together at 8+ CCDs + IOD while maintaining the same or greater core counts for Epyc and Threadripper.

If they make it bigger, why would it matter whether it was for more cores or more cache (besides the power consumption)? There is a big area in the middle on both sides of the Epyc package with nothing there. It looks like the empty area is larger than the current Zen 2 die, which means they should be able to increase die size by at least 25% without pushing any further towards the outside of the package. They get a shrink with the 7nm+ node, and cache shrinks well, so the base 32 MB die could be smaller than current Zen 2. It is the same number of cores and amount of cache as Zen 2, just connected differently. Interconnect doesn't add to die size directly since it is in the upper metal layers. I think that gives them a bit of space to lengthen the die with extra cache in the middle. Increasing the number of cores in a cluster is a much bigger undertaking than just increasing the cache size, so we aren't going to see variants with different numbers of cores on each die.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
and cache shrinks well
That's not what I have heard, rather the opposite in fact - as evidenced by the sheer amount of area a mere 32MB L3 takes up, even at 7nm.
so the base 32 MB die could be smaller than current Zen 2.
I would take the "32+ MB" point in AMD's presentation to mean that the Zen3 CCD will have more, just that they don't want to say how much yet this early on - in fact, that and the unified L3 cache are about the only things I'm taking away from that presentation after Norrod's recent words about a "completely new uArch".
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
That's not what I have heard, rather the opposite in fact - as evidenced by the sheer amount of area a mere 32MB L3 takes up, even at 7nm.

SRAM scaling going from GloFo 14nm to TSMC's 7nm is stupid. It's like 2.5-3x. Look at Radeon VII vs Vega 64.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 Family 19h will be the base uarch for at least Zen 4, and probably also Zen 5. IMHO the probability of SMT4 in Zen 3 is 80% now.

Your chances of Milan having SMT4 are just as high as they were before his statement.

Exactly 0.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 Family 19h will be the base uarch for at least Zen 4, and probably also Zen 5. IMHO the probability of SMT4 in Zen 3 is 80% now.

No, it really isn't. :rolleyes: Too bad we have to keep hearing you and others repeat this over and over until it actually comes out.

Here it is, one more time. I wish I could print out 1000 copies and tape them all over your home until you get it.

amd3.jpg
 
Last edited:

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
SRAM scaling going from GloFo 14nm to TSMC's 7nm is stupid. It's like 2.5-3x. Look at Radeon VII vs Vega 64.
Where did you get that from?

I can only find values for GF/Samsung High Performance SRAM at 14nm - there is no equivalent figure for TSMC 7nm on WikiChip:

(attached: two WikiChip screenshots of 14nm SRAM cell sizes)

From what I remember, 14nm didn't scale well from 20nm because it was only a partial shrink or something, whereas 7nm was kind of a catch-up in that respect - that may account for the greater-than-expected SRAM scaling.
 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
Where did you get that from?

I can only find values for GF/Samsung High Performance SRAM at 14nm - there is no equivalent figure for TSMC 7nm on WikiChip:

View attachment 13533

View attachment 13534
Ugh, just as I'm about to go sleep.

I know there are no specific numbers; there's a reason I specifically said to look at the Vega 64 and Radeon VII dies and compare the two. It's also why I gave such a large margin of error of 2.5x to 3x.

That should be enough to tell you if your original claim is accurate or not.

I mean, you could just take the simpler route of realising that the L3 cache for Zen and Zen 2 is almost identical in size (both between 16 and 17 mm^2, yet the latter holding 2x the SRAM), but I mean, that's more boring. Granted, I'm overestimating the L3 cache area on Zen 2 a bit because the area covers more than the L3 cache alone, but your original statement never had any ground to it even if we consider the worst-case scenario for my own.

The jump from GloFo's 14nm to TSMC's 7nm is actually some of the best SRAM scaling we've had in a long while.
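Putting rough numbers on that comparison (a quick sketch with the approximate areas from my post above; the Zen 2 figure includes some non-SRAM logic, so if anything this understates the cell-level scaling):

```python
# Roughly the same L3 footprint holding twice the SRAM => ~2x density,
# and that's a lower bound given the Zen 2 area includes extra logic.
zen_area_mm2 = 16.5      # approx. Zen L3 block (GloFo 14nm)
zen2_area_mm2 = 17.0     # approx. Zen 2 L3 block (TSMC 7nm), slight overestimate
sram_ratio = 2.0         # Zen 2 block holds twice the bits

density_gain = sram_ratio * zen_area_mm2 / zen2_area_mm2
print(f"~{density_gain:.2f}x bits per mm^2")   # ~1.94x by these rough numbers
```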

Sent from my SM-G960F using Tapatalk
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
Your original statement never had any ground to it even if we consider the worst-case scenario for my own.
It wasn't a statement so much as a repetition of something I had seen discussed before - probably on SemiAccurate forums.

Seemingly they didn't know what they were talking about given what you have said.

Still though, SRAM has an insanely big F² cell size - it makes me wonder just how much smaller CPUs will get once they switch to a variant of MRAM, or some other more area-efficient memory type for cache.
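For a sense of how much area that translates to, a back-of-the-envelope sketch (the 0.027 µm² figure is the publicly reported TSMC 7nm high-density 6T bit cell; the 2x macro overhead for tags, decoders and sense amps is just a rough assumption):

```python
# Rough area of a 32 MB SRAM array on TSMC 7nm.
bitcell_um2 = 0.027          # reported high-density 6T cell, TSMC 7nm
bits = 32 * 2**20 * 8        # 32 MB of data bits
overhead = 2.0               # assumed blow-up for tags, decoders, sense amps, ECC

raw_mm2 = bits * bitcell_um2 / 1e6
print(raw_mm2, raw_mm2 * overhead)   # ~7.2 mm^2 raw cells, ~14.5 mm^2 as a macro
```

Which is in the same ballpark as the ~16-17 mm² L3 blocks being discussed above.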
 

tamz_msc

Diamond Member
Jan 5, 2017
3,710
3,554
136
@inf64 , @tamz_msc

How do you people come to this conclusion? You have an SoC with nearly 100 ns longer memory latency compared to Skylake; over 200 ns to memory is atrocious (DDR4-2400!). The 2-core Cannon Lake doesn't have anything special on the cache front (exact same config as Skylake), yet clock for clock it has the same performance as Skylake. So unless your assertion is that the performance impact of nearly 100 ns of extra access latency is zero, the Cannon Lake core increased IPC and offset the loss of memory-system performance.
It isn't the same; there is actually a minor regression in SPECint.
 

.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
@inf64 , @tamz_msc

How do you people come to this conclusion? You have an SoC with nearly 100 ns longer memory latency compared to Skylake; over 200 ns to memory is atrocious (DDR4-2400!). The 2-core Cannon Lake doesn't have anything special on the cache front (exact same config as Skylake), yet clock for clock it has the same performance as Skylake. So unless your assertion is that the performance impact of nearly 100 ns of extra access latency is zero, the Cannon Lake core increased IPC and offset the loss of memory-system performance.


Someone over at Reddit wrote this on Oct 16, 2018, in posts four hours apart. I commented over there, which is why I still have the links. It was a new user with no other content posted. A week or so later, he removed all of it.

There was a ~5% IPC increase as measured by this guy, and as still seen in the comments that remain.

It was a *VERY, VERY* thorough writeup, with a nice variety of valid benchmarks for arriving at that number. I'm still kicking myself for not saving it... let me check if I posted something from these posts at that time on other forums. Maybe there's something that remains.

edit: Here it is, at least some of it.


There's another site where I posted some of this, but it suffered a rollback that wiped all content back to ~2017. Oh well... Gotta thank past me!

edit2: something more here:


and here:


----------------------------


Anyway. Back to Zen3. I'm expecting a +10% IPC increase on average over Zen2 from what Forrest stated. Not much more, unless they have some radically crazy ideas like Bulldozer that actually result in a performance increase, lol
 
Last edited:

jamescox

Senior member
Nov 11, 2009
637
1,103
136
It will be interesting to see how big the L1i cache will be in Zen3. AMD stated that they shrunk the instruction cache because they didn't have the space in the floorplan for both the increased uop cache and a 64K L1i. If you consider Norrod's comments, then it makes sense that in Zen3 we could see a bigger change to the floorplan, and maybe the return of the 64K L1i while keeping the big uop cache?

If it is a new architecture, then a lot of stuff can change, although "new architecture" can have a lot of different meanings. There is a possibility that the 32 KB L1 is better for performance than a 64 KB L1. When you are down to nanoseconds being important, the speed difference between the different cache sizes could be significant. A smaller cache can be made faster. They have cycle-accurate simulators at the register transfer level to explore different design options. The problem comes when certain design choices make one application faster and another application slower. There are always trade-offs.
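A quick illustration of the "nanoseconds" point (hypothetical numbers; 4 cycles is in line with Zen 2's documented L1 load-to-use latency, while the 5-cycle case is an assumed penalty for a hypothetically larger L1):

```python
# Wall-clock cost of an L1 access at a given core clock.
clock_ghz = 4.5

def access_ns(cycles):
    return cycles / clock_ghz

print(access_ns(4), access_ns(5))   # ~0.89 ns vs ~1.11 ns per access
```

One extra cycle on every hit adds up quickly in latency-sensitive code, which is part of why a smaller, faster cache can come out ahead overall.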
 
  • Like
Reactions: Richie Rich