Speculation: Ryzen 4000 series/Zen 3

soresu · Nov 21, 2019

itsmydamnation said:
Zen3 adds AI instructions; we just have to wait and see if that is matrix or just things like bfloat.

It's inevitable, they are just behind the swing of the industry, at least compared to ARM who have been going that way since A75.

That could be the meat of what Forrest was referring to about new architecture - BFloat16 and all that.

soresu · Nov 21, 2019

jamescox said:
Things that can take advantage of such wide vectors would be better off on a gpu. Intel didn’t have a gpu at the time, so AVX512 was their way of trying to get a cpu to behave like a gpu.

The question is how will Intel act going forward once Xe gets into full swing?

Will AVX512 become the hard limit to CPU vectors/SIMD, or will they carry on increasing over time?

DrMrLordX · Nov 21, 2019

inf64 said:
- clocks 5-7% higher

TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.

- core count 25% higher (10 chiplets x 8 cores; CCX is 8 cores sharing huge L3 cache)

TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.

This will put a stop to any gain Tigerlake/Icelake will make over SKL/X core.

Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.

Adonisds · Nov 21, 2019

DrMrLordX said:
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.

TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.

Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.

I don't think those performance increase estimates are valid for high performance desktop CPUs. If they were, the +35% performance from 14nm to 7nm would result in a 5.4 GHz capable 3700X. I think expecting more than 5% is unreasonable.
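The arithmetic behind that 5.4 GHz figure can be sketched roughly as below. This is only an illustration: the 4.0 GHz 14nm baseline (about the 1800X boost clock) and the 4.4 GHz 3700X boost clock are assumptions brought in for the example, and `scaled_clock` is a hypothetical helper name.

```python
def scaled_clock(base_ghz: float, perf_gain: float) -> float:
    """Clock that would result if a process 'performance' gain mapped 1:1 to frequency."""
    return base_ghz * (1.0 + perf_gain)


BASE_14NM_GHZ = 4.0   # assumption: approximate 1800X boost clock
CLAIMED_GAIN = 0.35   # the +35% figure quoted for 14nm -> 7nm
ACTUAL_7NM_GHZ = 4.4  # assumption: approximate 3700X boost clock

# What the foundry figure would imply if it translated directly into clocks.
projected = scaled_clock(BASE_14NM_GHZ, CLAIMED_GAIN)

# What the shipping part actually gained over the assumed baseline.
realised_gain = ACTUAL_7NM_GHZ / BASE_14NM_GHZ - 1.0

print(f"projected: {projected:.1f} GHz")
print(f"realised clock gain: {realised_gain:.0%}")
```

Under these assumptions the shipping part realised far less than the quoted process figure, which is the point being made about foundry numbers not mapping directly to desktop clocks.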

Richie Rich · Nov 21, 2019

jamescox said:
IPC varies by a large amount across applications. Some applications may get a big boost from the completely reworked cache hierarchy while others may not. A 20% boost is a bit over optimistic, I think. There could be some applications that get a big boost from certain architectural improvements though. They could expand the AVX again which would also require increasing bandwidth again. I don’t know if they will support AVX512. I have always considered widening the width to 512 to be a bit of a kludge. Things that can take advantage of such wide vectors would be better off on a gpu. Intel didn’t have a gpu at the time, so AVX512 was their way of trying to get a cpu to behave like a gpu.

They are focusing on efficiency, which actually allows for higher clocks in power constrained packages. With a new design on a new process variant, even AMD may not know what clocks they will be able to hit. I don’t really expect more cores with this generation. I think they are just going to have very large cache chips for HPC. The die would get a bit larger with 48 or 64 MB L3 and that will take up the available space on the Epyc package. The 32 MB die may not be any larger though. The current Zen 2 die is 2 core clusters with 4 cores each and 16 MB per core cluster. That is 8 cores total and 32 MB L3 total. Zen 3 will be 8 cores with 32 MB per die, so it is actually the same, just arranged differently. They will get a shrink with 7nm+, so it may be a similar size, even with new features. If the latency has gone up for L3, they may want larger L2 though. Cache shrinks well though, so I am still expecting relatively small die.

A 48MB L3 cache would mean an approx. 84mm2 die on 7nm EUV. I did a simple calculation a few pages back. 64MB would probably be too much of an area increase (assuming AMD will use 64MB L3 for Zen4 with an enhanced Zen3 uarch on 5nm together with a core count rise). Anyway, a 48MB L3 cache is 3x the cache available to an ST load on Zen2.

Regarding AI acceleration, there is no space for AI on the Zen3 chiplet. Right now it doesn't make sense either. AI is not useful in servers, which are the primary target for Zen3. AMD said they will use GPUs for AI.

soresu · Nov 21, 2019

DrMrLordX said:
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.

I always got the impression that that metric implies higher performance at iso design - ie if you simply ported Zen2 to 7nm+ with minimal changes.

If Zen3 is significantly different (which Forrest basically just expressed) then that higher performance will probably depend at least as much on the clocking efficiency of the Zen3 uArch as the 7nm+ process it is being fabbed on.

I'm still wondering just how much change we will see in the IOD - what process it is on, how much area it will take up, and perhaps even room for GFX CCD IO on the consumer variants.

soresu · Nov 21, 2019

Richie Rich said:
Anyway 48MB L3 cache is 3x more cache for ST load than Zen2.

Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.
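The ratios being thrown around here can be checked with a quick sketch. It assumes, as described earlier in the thread, that a single thread on Zen 2 sees one CCX's 16 MB of L3 and that a Zen 2 die carries 32 MB in total; the function names are hypothetical.

```python
ZEN2_L3_PER_CCX_MB = 16  # L3 visible to one thread on Zen 2 (one 4-core CCX)
ZEN2_L3_PER_DIE_MB = 32  # two CCXs per Zen 2 die


def st_ratio(l3_mb: float) -> float:
    """L3 visible to a single thread, relative to a Zen 2 CCX."""
    return l3_mb / ZEN2_L3_PER_CCX_MB


def die_increase(l3_mb: float) -> float:
    """Fractional growth in total L3 per die, relative to a Zen 2 die."""
    return l3_mb / ZEN2_L3_PER_DIE_MB - 1.0


for size_mb in (32, 40, 48):
    print(f"{size_mb} MB unified L3: {st_ratio(size_mb):.1f}x per thread, "
          f"{die_increase(size_mb):+.0%} total L3 per die")
```

The 40 MB case gives 2.5x per thread and +25% total L3, which is the figure compared against the idealised 20% density gain for 7nm+.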

Topweasel · Nov 21, 2019

soresu said:
Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.

I wouldn't even count on a change. As it is AMD is miles ahead on cache size and a third of that in cores, and twice that last one in perf/power. Point being that an increase in die space efficiency might be a good opportunity for AMD either to recoup used die space or to lower hotspotting by increasing dead die space (like 12nm). Though we don't know what AMD is going to do with the new uarch. They could use that space just to widen the core design.

Olikan · Nov 21, 2019

What if they increase L2 cache to compensate for the latency increase of the L3?
It wouldn't be the first time for AMD to count one cache as the total cache.

Topweasel · Nov 21, 2019

Olikan said:
What if they increase L2 cache to compensate for the latency increase of the L3?
It wouldn't be the first time for AMD to count one cache as the total cache.

Also possible. Just saying that an L3 increase at this point probably isn't as large a concern, because they have already given customers an undreamed-of amount. I mean insane compared to everyone. Very few tools out there are going to be well designed to take advantage of that. It's a little early for them to be chasing a bigger cache in one cycle without letting their customers catch up. Reorganizing how the CPU utilizes it for better performance? Sure. But I am guessing there are 9 billion other things AMD could spend the space on, even using the space as dead space, that will help CPU performance more than upping L3 more than they already have. I mean, we are talking about 0.5GB of L3 cache on a 2S system.
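For reference, the 0.5GB figure works out as below. The sketch assumes a top-end 64-core Epyc with 8 chiplets per socket and 32 MB of L3 per chiplet; `total_l3_mb` is a hypothetical helper name.

```python
def total_l3_mb(sockets: int, ccds_per_socket: int, l3_per_ccd_mb: int) -> int:
    """Total L3 across a multi-socket system, in MB."""
    return sockets * ccds_per_socket * l3_per_ccd_mb


# Assumption: 2 sockets, 8 CCDs per socket, 32 MB L3 per CCD.
total = total_l3_mb(sockets=2, ccds_per_socket=8, l3_per_ccd_mb=32)

print(f"{total} MB = {total / 1024:.1f} GB of L3 on a 2S system")
```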

amrnuke · Nov 21, 2019

DrMrLordX said:
TSMC's literature indicates 10% higher performance for 7nm+ at isopower, so I see no reason why AMD would be restricted to less than a 10% increase in core clocks (on average). We might see more-aggressive multicore turbos instead of higher turbo limits.

TSMC's literature also indicates a 20% increase in density for 7nm+ over 7nm. I don't see them necessarily increasing core count if they're going to spend more transistors increasing IPC.

Let's face reality: Intel isn't going to have any server parts other than Cooper Lake to ship en masse in 2020. IceLake-SP is late 2020 and may never ship in significant quantity. Intel's next real server upgrade is Sapphire Rapids, and that will be facing Zen4. It will also ship later than Zen4 EPYC products.

Agree with this entirely.

AMD have been boosting frequency, working on latency, and to an extent increasing cores for iso price (3900X vs 1800X at $499 where iso-price gets you +4C/+8T).

I think AMD will use this "tick" of Zen3 to explore cache improvements, using higher density to improve heat dissipation, boost clocks, etc.

I think they can taste the blood. If they can get a 12C or 16C to heavily (multi-core, but perhaps not all-core) boost to 5GHz, and work out little kinks here and there with latencies and cache, I think they have a very realistic chance to take the crown from whatever Intel has in the pipeline.

amd6502 · Nov 21, 2019

I really doubt 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3.

But low-mid teens single-thread IPC bump with freq bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10 chiplet. There is no issue with core counts either, so that's not an area that needs improvement.

L3 is already oversized. So improvements on L3 will not be in dimension of capacity.
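The point about an IPC bump plus a frequency bump slightly exceeding 20% relies on the two gains compounding. A minimal sketch, with the specific percentages picked only as examples of "low-mid teens" IPC and the 5-7% clock range mentioned earlier in the thread (`combined_uplift` is a hypothetical name):

```python
def combined_uplift(ipc_gain: float, clock_gain: float) -> float:
    """Single-thread uplift when IPC and clock gains multiply together."""
    return (1.0 + ipc_gain) * (1.0 + clock_gain) - 1.0


# Example values only: IPC gains in the low-mid teens, clocks 5-7% higher.
for ipc in (0.12, 0.13, 0.15):
    for clk in (0.05, 0.07):
        total = combined_uplift(ipc, clk)
        print(f"IPC +{ipc:.0%}, clock +{clk:.0%} -> single-thread +{total:.1%}")
```

With these example inputs, only the combinations toward the upper end of both ranges clear 20%.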

inf64 · Nov 21, 2019

amd6502 said:
I really doubt 20% IPC uplift unless we're talking MT and there is some form of 4-way MT in Zen3.

But low-mid teens single-thread IPC bump with freq bump might slightly exceed 20%.

The IO hub looks like it will be reused, so no 10 chiplet. There is no issue with core counts either, so that's not an area that needs improvement.

L3 is already oversized. So improvements on L3 will not be in dimension of capacity.

If you read what Forrest Norrod stated in an interview you will see he is hinting at a bigger than Zen2 uplift for Zen3 core. He said that Zen2 was a bit of an outlier as the core itself was an evolution of original Zen - they managed to achieve an outstanding 15% IPC uplift with what essentially is NOT a brand new uarchitecture.

Zen3 is said to be a brand new uarchitecture (stated by Norrod) and it will bring improvements aligned with that. So 20% is perfectly reasonable to expect; it might even be on the lower end of the spectrum.

jamescox · Nov 21, 2019

soresu said:
Even 32MB would be twice as much, and a more realistic increase figure of 40MB would be 2.5x still.

That's a 25% increase in total L3 vs the idealised 20% area efficiency increase for 7nm+.

After all they don't want to spend all their transistor budget on cache alone before they even touch the core - especially if they have any desire to fit more CCD's in the Epyc package.

The 32 MB variant would be the same number of cores and cache per die as current Zen 2 die. It will just be one 8 core, 32 MB CCX instead of two 4 core, 16 MB CCX. Desktop applications probably don’t need more than 16 MB accessible from one core for most things, but it can help multi-threaded applications to have a monolithic last level cache. AMD already has the mainstream at 8 core and 32 MB total, but it is split into 2 CCX. Zen 3 will probably provide access to all 32 MB from any core on the die. A single 8 core CCX with 32 MB L3 should be incredibly powerful. Intel will have 24.75 MB with the i9-10980XE and close to 20 MB on their other cpus in the high end 10000 series line-up, so it is necessary really. The market has been stagnant for so long with just intel 4 core chips for years, that they aren’t used to the speed of improvements AMD is driving.

The place where the larger cache is really needed is HPC and certain server applications. There are still some server applications where Epyc loses to Intel parts by a significant margin. It is mostly database applications that take advantage of the monolithic 38.5 MB cache on Intel high end Xeon parts. Intel also has a specialized Xeon for that market, the E5-2699 v4 with a 55 MB cache. I suspect AMD will make some high cache variants for that portion of the server market and for HPC applications. Some of the supercomputer design wins may use that part. With the 32 MB variant probably being similar in size to current Zen 2 die, increasing the cache significantly for some parts should be doable. They are making larger die on 7 nm for gpus. Even a 64 MB variant should be less than twice the size of current Zen 2 die (8 cores and 32 MB total) which is only around 75 to 80 square mm.

dnavas · Nov 21, 2019

inf64 said:
If you read what Forrest Norrod stated in an interview you will see he is hinting at a bigger than Zen2 uplift for Zen3 core.

Let me try this on the other side. As no comparison was made between Zen2 uplift and the "normal/expected" uplift of a new architecture, it's impossible to determine. He doesn't say it's less, he doesn't say it's more. He said Zen3 is producing what one would expect from a new architecture. I'm not sure what that is. *I* would think between 10-20%, but I'm not a CPU architect. I certainly would not, for example, expect Bulldozer->Zen1 increases. Am I remembering that AMD's former guideline was something like double Intel's 5%, and each release would be a tock? It would seem far safer to assume that Zen3 is about 10% faster.

You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|

inf64 · Nov 21, 2019

dnavas said:
Let me try this on the other side. As no comparison was made between Zen2 uplift and the "normal/expected" uplift of a new architecture, it's impossible to determine. He doesn't say it's less, he doesn't say it's more. He said Zen3 is producing what one would expect from a new architecture. I'm not sure what that is. *I* would think between 10-20%, but I'm not a CPU architect. I certainly would not, for example, expect Bulldozer->Zen1 increases. Am I remembering that AMD's former guideline was something like double Intel's 5%, and each release would be a tock? It would seem far safer to assume that Zen3 is about 10% faster.

You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|

To cut the story short, in the Conroe/K10 era normal IPC uplift was ~15-20% (Yonah -> Merom (Conroe), K8 -> K10). The same goes for the K7 -> K8 jump.

soresu · Nov 21, 2019

jamescox said:
They are making larger die on 7 nm for gpus. Even a 64 MB variant should be less than twice the size of current Zen 2 die (8 cores and 32 MB total) which is only around 75 to 80 square mm.

If they make the CCDs significantly bigger it would have to be for more cores, otherwise they would run into problems fitting them all together at 8+ CCD + IOD while maintaining the same or greater core counts for Epyc and Threadripper.

soresu · Nov 21, 2019

dnavas said:
You can't use Forrest's statement either way. It's not "Well, Zen2 had a much larger than expected increase and Zen3 is hitting at about where we expected it to" to argue that Zen3 is < Zen2 and it isn't "Zen2 had a larger than expected increase for a mere evolutionary upgrade while Zen3 is hitting where we'd expect a new revolutionary architecture massive upgrade to hit." Neither one. None of the above. The tea leaf, such as it is, is "normal architectural increase." I'm not sure what normal is for architectural changes :|

Something about features originally planned for Zen1/Zen+ being in Zen2 is I think part of the greater than expected uplift in it - ie it was originally intended to be more incremental at the core level with the CCD/IOD change being the main focus of the Zen2 generation.

dnavas said:
It would seem far safer to assume that Zen3 is about 10% faster.

I think it may be that we get a modest IPC jump, but a nice significant jump in MHz/watt efficiency - taken together it would provide a very nice boost in perf/watt.

By which I don't necessarily mean the absolute clocking ceiling before power use skyrockets, I mean more like the sweet spot where perf is still great but you have very low power consumption.

This would be great for mobile (and potentially standalone VR) parts, not to mention a solid foundation for a switch to vertical integration as is rumoured for Zen4.

Ajay · Nov 21, 2019

inf64 said:
To cut the story short, in the Conroe/K10 era normal IPC uplift was ~15-20% (Yonah -> Merom (Conroe), K8 -> K10). The same goes for the K7 -> K8 jump.

Lots of low hanging fruit back then (gains from IMCs for example) and more significant gains being delivered by new process nodes.

inf64 · Nov 21, 2019

Ajay said:
Lots of low hanging fruit back then (gains from IMCs for example) and more significant gains being delivered by new process nodes.

Well, Skylake -> Icelake is an 18% IPC jump, very close to the "oldschool" uarchitectural jumps, while both are essentially very similar cores. The same goes for Zen1 -> Zen2, and I expect similar for Zen2 -> Zen3. There is always low hanging fruit if *you* (the designer) know the limitations/bottlenecks in the core. I'm pretty sure both Intel and AMD engineers know what the limitations are and what/how much they can do with the tools/nodes at their disposal. Getting 15-20% IPC is obviously not that hard; evidence is all around us these days.

moinmoin · Nov 21, 2019

inf64 said:
Well, Skylake -> Icelake is an 18% IPC jump, very close to the "oldschool" uarchitectural jumps, while both are essentially very similar cores.

That wasn't a straight jump though, Cannon Lake was in between.

.vodka · Nov 21, 2019

Yup, taking Cannonlake into account brings that 18% back to the usual 5-10% Intel had us accustomed to for the past decade.

That leaked Tigerlake GB5 result, which was run with the chip at 400MHz, works out to an ~8% increase if scaled linearly to Icelake's 3.9GHz. Willow Cove would be another step in line with the usual.

Golden Cove could be anything. I guess it won't be anything revolutionary either. MLID's sources claim it's enough to beat Zen3.

I think that if there's ever going to be a big >15% IPC jump coming from Intel it's going to be from whatever Keller & other geniuses that were hired recently are working on for release on 7nm in ~2022 (Ocean Cove). Vulnerabilities fixed and everything.

If AMD can keep up their pace with Zen3 and Zen4 then they will not have any issues competing or keeping the crown in the future...

Unless TSMC fails somewhere and delays further nodes, or Intel gets 10nm's clock speed potential up to 14+++'s > 5GHz and Golden Cove benefits from that.
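The linear clock scaling applied to the leaked Tigerlake result above can be sketched as follows. The scores here are placeholders chosen purely for illustration, not the actual Geekbench numbers, and both function names are hypothetical; only the 400MHz and 3.9GHz clocks come from the post.

```python
def scale_score_linearly(score: float, measured_mhz: float, target_mhz: float) -> float:
    """Project a benchmark score to another clock, assuming perfect linear scaling."""
    return score * target_mhz / measured_mhz


def relative_gain(new_score: float, reference_score: float) -> float:
    """Fractional gain of the projected score over a reference score."""
    return new_score / reference_score - 1.0


# Placeholder scores (NOT real results), used only to show the method.
placeholder_low_clock_score = 108.0   # hypothetical score measured at 400 MHz
placeholder_reference_score = 975.0   # hypothetical score of the 3.9 GHz reference part

projected = scale_score_linearly(placeholder_low_clock_score, 400.0, 3900.0)
gain = relative_gain(projected, placeholder_reference_score)

print(f"projected score at 3.9 GHz: {projected:.0f}")
print(f"gain over reference: {gain:.1%}")
```

Linear scaling is a generous assumption, since memory latency does not shrink with core clock, so an estimate made this way is best read as rough.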

Thunder 57 · Nov 21, 2019

.vodka said:
Yup, taking Cannonlake into account brings that 18% back to the usual 5-10% Intel had us accustomed to for the past decade.

That leaked Tigerlake GB5 result, which was run with the chip at 400MHz, works out to an ~8% increase if scaled linearly to Icelake's 3.9GHz. Willow Cove would be another step in line with the usual.

Golden Cove could be anything. I guess it won't be anything revolutionary either. MLID's sources claim it's enough to beat Zen3.

I think that if there's ever going to be a big >15% IPC jump coming from Intel it's going to be from whatever Keller & other geniuses that were hired recently are working on for release on 7nm in ~2022 (Ocean Cove). Vulnerabilities fixed and everything.

If AMD can keep up their pace with Zen3 and Zen4 then they will not have any issues competing or keeping the crown in the future...

Unless TSMC fails somewhere and delays further nodes, or Intel gets 10nm's clock speed potential up to 14+++'s > 5GHz and Golden Cove benefits from that.

It might just be me, but I'm not going to take anyone seriously when they call x86 "times 86". If Intel starts executing well, then it may be possible.

tamz_msc · Nov 22, 2019

moinmoin said:
That wasn't a straight jump though, Cannon Lake was in between.

.vodka said:
Yup, taking Cannonlake into account brings that 18% back to the usual 5-10% Intel had us accustomed to for the past decade.

Eh, Cannon Lake in practice brought no IPC gains whatsoever, it even regressed due to worse memory latency.

itsmydamnation · Nov 22, 2019

tamz_msc said:
Eh, Cannon Lake in practice brought no IPC gains whatsoever, it even regressed due to worse memory latency.

No, the core itself added IPC.
