Speculation: Ryzen 4000 series/Zen 3

Topweasel · Nov 22, 2019

itsmydamnation said:
no, the core itself added IPC.

Yeah the result was a wash because of poor clocking and other aspects but CannonLake brought a majority of the IPC uplift we are seeing now.

uzzi38 · Nov 22, 2019

amd6502 said:
I really doubt 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3

But low-mid teens single-thread IPC bump with freq bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10 chiplet. There is no issue with core counts either, so that's not an area that needs improvement.

L3 is already oversized. So improvements on L3 will not be in dimension of capacity.

No to everything but the very first and the last. Especially the comment on SMT4. That's cursed.

tamz_msc · Nov 22, 2019

itsmydamnation said:
no, the core itself added IPC.

Topweasel said:
Yeah the result was a wash because of poor clocking and other aspects but CannonLake brought a majority of the IPC uplift we are seeing now.

That's simply wrong.


DrMrLordX · Nov 22, 2019

Adonisds said:
I don't think those performance increase estimates are valid for desktop high performance CPUs. If they were, the +35% performance from 14nm to 7nm would result in a 5.4 GHz capable 3700X. I think expecting more than 5% is unreasonable

soresu said:
I always got the impression that that metric implies higher performance at iso design - ie if you simply ported Zen2 to 7nm+ with minimal changes.

Wanted to address both of these at once:

First, let me emphasize again that I do not simply assume that the max boost frequency for Zen3 will see a 10% improvement just because of the new node. If you examine 7nm's behavior in Matisse in particular, you will see that boost clocks struggle as more cores are loaded, often due to hotspots and the effects of an (admittedly often conservative) boost map. A 16c Vermeer (for example) isn't just going to hit 5170 MHz boost just because of 7nm+. If it did, then great, but I do not expect that. A boost in the range of 4.8 GHz or so seems more realistic. But if you look at a 3950x running Prime95 SmallFFTs or some other heavy AVX2 workload, the default behavior of the chip will probably put it at around 3700 MHz or so, which is often the case with a bone stock 3900x at the very least (mine does 3770 MHz, but it has excellent cooling). If someone told me that a 16c Vermeer could do 4070 MHz (or so) in the same workload, I would probably believe it.

As for TSMC's metric, I'm pretty sure that the performance is isopower AND isodesign. So yes, this assumes an optical shrink of the old design. But! You also have to consider what the design limitations on clockspeed are. If I knew more about the fmax of Zen2, I could comment more saliently on how much of Zen2's clockspeed limitations are related to design versus process versus heat. It certainly appears as though Zen2 is more limited by concentrated heat than anything else. The process loves low temperatures, and if you can keep the die near room temps, all-core clocks of 4.6 GHz and higher are certainly possible. Hotspots make that insanely difficult. So I don't think it's entirely far-fetched that Zen3 will have an Fmax at least in the same ballpark as Zen2, especially when AMD is sidestepping to a new 7nm node that has as a major benefit the option of higher clocks at isopower. It's not like they're going to pull a Kaveri or Carrizo on us where cache design changes limit Fmax versus the previous generation (Piledriver).

In conclusion, I would project max boost clocks for Vermeer in the 4.8 GHz range, with the possibility of maybe 2-4 cores hitting those clocks at once. MT clocks in heavy workloads are the ones that are most likely to see the full 10% improvement.
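The clock figures in this post can be checked with a quick sketch. This is only illustrative arithmetic: the 4700 MHz max boost baseline is an assumption inferred from the 5170 MHz figure (it is not stated in the post), and `scale_clock_mhz` is a hypothetical helper name.

```python
def scale_clock_mhz(base_mhz: int, uplift_percent: int) -> int:
    """Scale a clock in MHz by a whole-number percentage (integer math)."""
    return base_mhz * (100 + uplift_percent) // 100


# Assumed baselines: 4700 MHz max boost (inferred from the 5170 MHz figure)
# and the ~3700 MHz heavy-AVX2 all-core clock mentioned in the post.
baselines = [("max boost", 4700), ("heavy AVX2 all-core", 3700)]

for label, base_mhz in baselines:
    scaled = scale_clock_mhz(base_mhz, 10)
    print(f"{label}: {base_mhz} MHz -> {scaled} MHz with a 10% uplift")
```

Under those assumptions the 10% uplift reproduces the 5170 MHz and 4070 MHz numbers, which is why the post treats the former as unlikely for boost and the latter as plausible for heavy MT loads.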

amd6502 said:
I really doubt 20% IPC uplift

I'm a bit skeptical there, but I'm willing to let them take a shot at it. 15% would still be exceptional, especially when you consider that clocks under most applications will go up thanks to the new process.
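As a rough sketch of how an IPC gain and a clock gain combine: the two multiply rather than add. The 15% IPC figure is the one from this post; the 5% clock gain is purely a placeholder assumption, and `combined_uplift` is a hypothetical helper.

```python
def combined_uplift(ipc_gain: float, clock_gain: float) -> float:
    """Return the overall single-thread uplift from compounding both gains."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


ipc_gain = 0.15    # figure discussed in the post
clock_gain = 0.05  # placeholder assumption, not a prediction

total = combined_uplift(ipc_gain, clock_gain)
print(f"IPC +{ipc_gain:.0%} and clock +{clock_gain:.0%} -> about +{total:.2%} overall")
```

With these inputs the compounded gain is slightly above 20%, which matches the earlier point that a low-to-mid-teens IPC bump plus a frequency bump could slightly exceed 20%.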

Topweasel · Nov 22, 2019

tamz_msc said:
That's simply wrong.


Eek, you are right. I admit that Cannon Lake was so far under my radar that most of what I knew about it came from second- or third-hand comments here and there. I could have sworn I had heard or read of an IPC uplift that couldn't make up for the sad clocks. But yeah, not a real uplift at all.

inf64 · Nov 22, 2019

Cannonlake brought zero IPC uplift (thanks tamz_msc) so my original point stands: it's possible for both intel and AMD to make significant IPC gains with "new" uarchitecture generations.

Ajay · Nov 22, 2019

TSMC's metrics are for optimum xtor specifications, probably derived from a 'test sled' (small chip with SRAM and logic - sometimes a simple ARM core). Actual results will totally depend on design implementation choices (AKA, actual on silicon circuit layout).

itsmydamnation · Nov 22, 2019

inf64 said:
Cannonlake brought zero IPC uplift (thanks tamz_msc) so my original point stands: it's possible for both intel and AMD to make significant IPC gains with "new" uarchitecture generations.

@int64 , @tamz-msc

How do you people come to this conclusion? You have a SOC that has near 100ms longer memory latency compared to skylake, over 200ms to memory is atrocious (ddr4 2400!) . 2 core Cannon lake doesn't have anything special on the cache front ( exact same config as skylake) yet clock for clock has the same performance as skylake. So unless your assertion is the performance impact of near 100ms of extra access latency is 0 then cannonlake core increased IPC and offset the loss of memory system performance.

moinmoin · Nov 22, 2019

I think it's not wrong to say that in circumstances Intel dreamed of (10nm working without a hitch) Cannon Lake would have been a normal step up IPC wise. That it turned into a minor "look 10nm does work (shows and quickly hides it again)" stop-gap with a wholly deactivated gen10 iGPU doesn't change that R&D had been invested in the Cannon Lake core and the gen10 EUs (the latter we'll never get to test at all, now everybody writes it was "skipped"...).

Richie Rich · Nov 22, 2019

uzzi38 said:
No to everything but the very first and the last. Especially the comment on SMT4. That's cursed.

Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 19h Family will be the base uarch for at least Zen 4, and probably also for Zen 5. IMHO the probability of Zen 3 SMT4 is 80% now.

itsmydamnation · Nov 22, 2019

It will be interesting to see how big the L1i cache will be in Zen3. AMD stated that they shrunk the instruction cache because they didn't have the space in the floor plan for both the increased uop cache and 64K L1i. If you consider Norrod's comments then it makes sense that in Zen3 we could see a bigger change in floor plan, maybe see the return of 64K L1i while keeping the big uop cache?

jamescox · Nov 22, 2019

soresu said:
If they make the CCDs significantly bigger it would have to be for more cores, otherwise they would run into problems fitting them all together at 8+ CCD + IOD while maintaining the same or greater core counts for Epyc and Threadripper.

If they make it bigger, why would it matter whether it was for more cores or more cache (besides the power consumption)? There is a big area in the middle on both sides of the Epyc package with nothing there. It looks like the empty area is larger than the current Zen 2 die, which means that they should be able to increase die size by at least 25% without pushing towards the outside of the package any further. They get a shrink with 7nm+ node, and cache shrinks well, so the base 32 MB die could be smaller than current Zen 2. It is the same amount of cores and cache as Zen 2, just connected differently. Interconnect doesn’t add to die size directly since it is in the upper metal layers. I think that gives them a bit of space to lengthen the die with extra cache in the middle. Increasing the number of cores in a cluster is a much bigger undertaking than just increasing the cache size, so we aren’t going to see variants with different numbers of cores on each die.

soresu · Nov 22, 2019

jamescox said:
and cache shrinks well

That's not what I have heard, rather the opposite in fact - as evidenced by the sheer amount of area a mere 32MB L3 takes up, even at 7nm.

jamescox said:
so the base 32 MB die could be smaller than current Zen 2.

I would take AMD's presentation 32+ MB point to mean that the Zen3 CCD will have more, just that they don't want to say how much yet so early on - in fact that and the unified L3 cache are about the only things I'm taking away from that presentation after Norrod's recent words about "completely new uArch".

uzzi38 · Nov 22, 2019

soresu said:
That's not what I have heard, rather the opposite in fact - as evidenced by the sheer amount of area a mere 32MB L3 takes up, even at 7nm.

SRAM scaling going from GloFo 14nm to TSMC's 7nm is stupid. It's like 2.5-3x. Look at Radeon VII vs Vega 64.

uzzi38 · Nov 22, 2019

Richie Rich said:
Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 19h Family will be the base uarch for at least Zen 4, and probably also for Zen 5. IMHO the probability of Zen 3 SMT4 is 80% now.

Your chances of Milan having SMT4 are just as high as they were before his statement.

Exactly 0.

Thunder 57 · Nov 22, 2019

Richie Rich said:
Now that Norrod has confirmed a completely new uarch for Zen 3, the SMT4 speculation is alive again. Especially considering the Zen 3 19h Family will be the base uarch for at least Zen 4, and probably also for Zen 5. IMHO the probability of Zen 3 SMT4 is 80% now.

No, it really isn't.

Too bad we have to keep hearing you and others repeat this over and over until it actually comes out.

Here it is, one more time. I wish I could print out 1000 copies and tape them all over your home until you get it.

itsmydamnation · Nov 22, 2019

Thunder 57 said:
No, it really isn't. Too bad we have to keep hearing you and others repeat this over and over until it actually comes out.

at which point SMT 4 will be in Zen4 because they both have 4's in them.... 100% confirmed!!!!

soresu · Nov 22, 2019

uzzi38 said:
SRAM scaling going from GloFo 14nm to TSMC's 7nm is stupid. It's like 2.5-3x. Look at Radeon VII vs Vega 64.

Where did you get that from?

I can only find values for GF/Samsung High Performance SRAM at 14nm - there is no equivalent figure for TSMC 7nm on WikiChip:

From what I remember 14nm didn't scale well from 20nm because it was only a partial shrink or something, whereas 7nm was kind of a catch-up in that respect. That may account for the greater than expected SRAM scaling.

uzzi38 · Nov 22, 2019

soresu said:
Where did you get that from?

I can only find values for GF/Samsung High Performance SRAM at 14nm - there is no equivalent figure for TSMC 7nm on WikiChip:


Ugh, just as I'm about to go sleep.

I know there are no specific numbers; there's a reason I specifically said to look at the Vega 64 and Radeon VII dies and compare the two. It's also the reason I gave such a large margin of error of 2.5x to 3x.

That should be enough to tell you if your original claim is accurate or not.

I mean, you could just take the simpler route of realising the L3 cache areas for Zen and Zen 2 are almost identical in size (both between 16 and 17mm^2, yet the latter has 2x the SRAM), but I mean, that's more boring. Granted, I'm overestimating the L3 cache area on Zen 2 a bit because the area is more than L3 cache alone, but your original statement never had any ground to it even if we consider the worst-case scenario for my own.

The jump from GloFo's 14nm to TSMC's 7nm is actually some of the best SRAM scaling we've had in a long while.
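The Zen vs Zen 2 comparison above can be turned into a rough density estimate. This is a sketch under stated assumptions: the 8 MB and 16 MB per-CCX L3 capacities are filled in from public specs rather than from this post, the 16 to 17 mm^2 range is the one quoted above, and `density_gain` is a hypothetical helper. It measures the whole L3 block (tags and control logic included), not the raw SRAM bitcell.

```python
def density_gain(old_mb: float, old_mm2: float, new_mb: float, new_mm2: float) -> float:
    """Ratio of new MB/mm^2 to old MB/mm^2."""
    return (new_mb / new_mm2) / (old_mb / old_mm2)


ZEN_L3_MB = 8    # assumed per-CCX L3 on Zen (14nm)
ZEN2_L3_MB = 16  # assumed per-CCX L3 on Zen 2 (7nm), i.e. "2x the SRAM"

# Both blocks are said to be between 16 and 17 mm^2, so bracket the estimate.
worst_case = density_gain(ZEN_L3_MB, 16.0, ZEN2_L3_MB, 17.0)
best_case = density_gain(ZEN_L3_MB, 17.0, ZEN2_L3_MB, 16.0)

print(f"L3 block density gain: {worst_case:.2f}x to {best_case:.2f}x")
```

Even the worst case of that bracket lands close to 2x for the full L3 block, which is the "simpler route" argument: cache area clearly did shrink well across this node jump.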


soresu · Nov 22, 2019

uzzi38 said:
your original statement never had any ground to it even if we consider the worst-case scenario for my own.

It wasn't a statement so much as a repetition of something I had seen discussed before - probably on SemiAccurate forums.

Seemingly they didn't know what they were talking about given what you have said.

Still though, SRAM has an insanely big F² cell size. It makes me wonder just how much smaller CPUs will get once they switch to a variant of MRAM, or some other more area-efficient memory type for cache.

tamz_msc · Nov 22, 2019

itsmydamnation said:
@int64 , @tamz-msc

How do you people come to this conclusion? You have a SOC that has near 100ms longer memory latency compared to skylake, over 200ms to memory is atrocious (ddr4 2400!) . 2 core Cannon lake doesn't have anything special on the cache front ( exact same config as skylake) yet clock for clock has the same performance as skylake. So unless your assertion is the performance impact of near 100ms of extra access latency is 0 then cannonlake core increased IPC and offset the loss of memory system performance.

It isn't the same, there is actually a minor regression in SPECint.

.vodka · Nov 22, 2019

itsmydamnation said:
@int64 , @tamz-msc

How do you people come to this conclusion? You have a SOC that has near 100ms longer memory latency compared to skylake, over 200ms to memory is atrocious (ddr4 2400!) . 2 core Cannon lake doesn't have anything special on the cache front ( exact same config as skylake) yet clock for clock has the same performance as skylake. So unless your assertion is the performance impact of near 100ms of extra access latency is 0 then cannonlake core increased IPC and offset the loss of memory system performance.

https://www.reddit.com/r/intel/comments/9o8o55

https://www.reddit.com/r/intel/comments/9ol9is

Someone over at Reddit wrote this on Oct 16 2018, four hours apart. I commented over there, that's why I still have the links. It was a new user, no other content posted. A week or so later, he removed all of it.

There was a ~5% IPC increase as measured by this guy, and as still seen in the comments that remain.

It was a *VERY, VERY* thorough writeup, with a nice variety of valid benchmarks for arriving at that number. I'm still kicking myself for not saving it... let me check if I posted something from these posts at that time on other forums. Maybe there's something that remains.

edit: Here it is, at least some of it.

Page 75 - Discussion - Intel current and future Lakes & Rapids thread (forums.anandtech.com)

There's another site where I posted some of this, but it suffered a rollback that wiped all content back to ~2017. Oh well... Gotta thank past me!

edit2: something more here:

Cannon Lake shown to have an IPC boost of 2-6% (old.reddit.com)

and here:

https://twitter.com/geofflangdale/status/1052150147906326528

----------------------------

Anyway. Back to Zen3. I'm expecting a +10% IPC increase on average over Zen2 from what Forrest stated. Not much more, unless they've some radically crazy ideas like Bulldozer that actually result in a performance increase, lol

jamescox · Nov 22, 2019

itsmydamnation said:
It will be interesting to see how big the L1i cache will be in Zen3. AMD stated that they shrunk the instruction cache because they didn't have the space in the floor plan for both the increased uop cache and 64K L1i. If you consider Norrod's comments then it makes sense that in Zen3 we could see a bigger change in floor plan, maybe see the return of 64K L1i while keeping the big uop cache?

If it is a new architecture, then a lot of stuff can change, although “new architecture” can have a lot of different meanings. There is a possibility that the 32 KB L1 is better for performance than a 64 KB L1. When you are down to nanoseconds being important, the speed difference between the different cache sizes could be significant. A smaller cache can be made faster. They have cycle-accurate simulators at the register transfer level to explore different design options. The problem comes when certain design choices make one application faster and another application slower. There are always trade-offs.

jamescox · Nov 22, 2019

itsmydamnation said:
@int64 , @tamz-msc

How do you people come to this conclusion? You have a SOC that has near 100ms longer memory latency compared to skylake, over 200ms to memory is atrocious (ddr4 2400!) . 2 core Cannon lake doesn't have anything special on the cache front ( exact same config as skylake) yet clock for clock has the same performance as skylake. So unless your assertion is the performance impact of near 100ms of extra access latency is 0 then cannonlake core increased IPC and offset the loss of memory system performance.

Note that “ms” is milliseconds, which you should not have as a latency figure unless you are talking about spinning rust random access latency. DRAM latencies are in nanoseconds. In the nanosecond range, even driving a signal through a wire on the chip can be significant. I believe the Pentium 4 actually had two pipeline stages just to drive signals long distances across the chip. This is also why I was surprised that the 4-core cluster went away so soon. I guess they can get low enough latency to do a monolithic cache with 8 cores at 7nm+. I am not sure what the enabler is for that. Perhaps wire length is reduced significantly due to the actual area of the cache being much smaller for its size vs. 14 nm.
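To put the unit mix-up in perspective, here is a minimal sketch converting a latency into core clock cycles. The 3000 MHz core clock is a hypothetical round number chosen only for illustration, and `latency_in_cycles` is a made-up helper name.

```python
def latency_in_cycles(latency_ns: int, clock_mhz: int) -> int:
    """Convert a latency in nanoseconds to core cycles (integer math).

    cycles = ns * (cycles per ns) = ns * MHz / 1000
    """
    return latency_ns * clock_mhz // 1000


CLOCK_MHZ = 3000          # hypothetical core clock, for illustration only
NS_PER_MS = 1_000_000     # nanoseconds in one millisecond

extra_ns = 100                    # "100" read as nanoseconds
extra_ms_as_ns = 100 * NS_PER_MS  # "100" read literally as milliseconds

print(f"100 ns -> {latency_in_cycles(extra_ns, CLOCK_MHZ):,} cycles")
print(f"100 ms -> {latency_in_cycles(extra_ms_as_ns, CLOCK_MHZ):,} cycles")
```

At that assumed clock, 100 ns is a few hundred cycles of stall, while 100 ms would be hundreds of millions of cycles, which is why the figure only makes sense as nanoseconds.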

DrMrLordX · Nov 23, 2019

itsmydamnation said:
at which point SMT 4 will be in Zen4 because they both have 4's in them.... 100% confirmed!!!!

That won't fly in China, no sir!

Tetraphobia - Wikipedia (en.wikipedia.org)
