Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

Vattila · Oct 6, 2019

Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).

What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts!

Gideon · Mar 20, 2020

Yes, I really hope AMD supports at least a subset of it in Zen3. Considering even VIA managed to support it in Centaur, it shouldn't be that hard.

DrMrLordX · Mar 20, 2020

moinmoin said:
What AVX-512 feature subsets should AMD support?

Probably whatever is supported by Cascade Lake, Cooper Lake, and IceLake/IceLake-SP.

moinmoin · Mar 20, 2020

DrMrLordX said:
Probably whatever is supported by Cascade Lake, Cooper Lake, and IceLake/IceLake-SP.

Did you look at the table I linked in your quote? Each of them supports a different selection of subsets, none of them supports them all (not even Tiger Lake).

DrMrLordX · Mar 20, 2020

moinmoin said:
Did you look at the table I linked in your quote? Each of them supports a different selection of subsets, none of them supports them all (not even Tiger Lake).

Yes. Cascade Lake and Cooper Lake are identical except for bfloat16. That's a reasonable target. IceLake supports a lot more except bfloat16. That's a more-ambitious target.

TigerLake is new enough that copying its entire instruction set may not be feasible.

I think the main point is to forget ER, PF, 4FMAPS, and 4VNNIW.
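To make the two targets concrete, here is a minimal sketch that treats each CPU family's AVX-512 support as a set of subset names. The Ice Lake list is the one quoted from the dav1d issue later in this thread; the Cascade Lake and Cooper Lake lists are an assumption based on public feature tables, and the constant and function names are made up for illustration.

```python
# AVX-512 feature subsets per CPU family, written as sets of short names.
# Ice Lake: the list quoted from the dav1d issue in this thread.
# Cascade Lake / Cooper Lake: assumption based on public feature tables.
CASCADE_LAKE = {"F", "CD", "VL", "DQ", "BW", "VNNI"}
COOPER_LAKE = CASCADE_LAKE | {"BF16"}
ICE_LAKE = {
    "F", "CD", "VL", "DQ", "BW", "IFMA", "VBMI", "VBMI2", "VPOPCNTDQ",
    "BITALG", "VNNI", "VPCLMULQDQ", "GFNI", "VAES",
}
# Subsets the post suggests ignoring.
SKIPPED = {"ER", "PF", "4FMAPS", "4VNNIW"}


def common_baseline(*families):
    """Return the subsets that every listed family supports."""
    return set.intersection(*families)


# The "reasonable target": what all three families have in common.
baseline = common_baseline(CASCADE_LAKE, COOPER_LAKE, ICE_LAKE)
# The "more-ambitious target": what Ice Lake adds on top of that.
ice_lake_extras = ICE_LAKE - baseline

print(sorted(baseline))
print(sorted(ice_lake_extras))
```

Under these assumed lists, the common baseline is simply the Cascade Lake set, bfloat16 is the only Cooper Lake addition, and none of the skipped subsets appear in any of the three families.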

extide · Mar 21, 2020

soresu said:
There's plenty of it in x264, x265, SVT-AV1 and now some in dav1d (therefore by extension rav1e too).

I think some emulators have it too, and I wouldn't be surprised to find it in Intel's Embree which is used in quite a few ray tracing code projects (oddly including one of AMD's).

Not AVX512, though.

soresu · Mar 21, 2020

extide said:
Not AVX512, though.

I was talking about AVX512. Look up those things and you'll find the code commits.

RPCS3 - Blog

RPCS3 is a multi-platform open-source Sony PlayStation 3 emulator and debugger written in C++ for Windows, Linux and BSD.

rpcs3.net

AVX512 SIMD (#316) · Issues · VideoLAN / dav1d · GitLab

See Wikipedia. The target will be Ice Lake (F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES)....

code.videolan.org

And finally, avx512 in x265! - x265

By Pradeep Ramachandran Finally, the acceleration that we’ve all been waiting for is here! We’ve been working extensively with Intel for the last few months to use Intel Advanced Vector Extensions 512 (AVX-512) to accelerate x265. After much effort, we’re delighted to share that we’ve been able...

x265.org

Accelerating x265 with Intel® Advanced Vector Extensions 512 (Intel®...

Vector units in CPUs have become the de facto standard for acceleration of media, and other kernels that exhibit parallelism according to the single instruction, multiple data (SIMD) paradigm. SIMD on Intel® architecture processors have evolved to enable 512-bit register files in Intel® Advanced...

software.intel.com

[x264-devel] x86: AVX-512 pixel_var2_8x8 and 8x16

[x264-devel] x86: AVX-512 pixel_sad_x3 and pixel_sad_x4

Probably more open source uses it than closed.

DrMrLordX · Mar 21, 2020

Targeting IceLake seems a bit foolish since the installed base is so small.

mopardude87 · Mar 21, 2020

I can barely wrap my head around how good the 4000 series will be, let alone worry about the 5000 yet. I did select the new socket, because hasn't it already been confirmed that the 4000 series is the last to support X570?

extide · Mar 22, 2020

soresu said:
I was talking about AVX512. Look up those things and you'll find the code commits.

RPCS3 - Blog

RPCS3 is a multi-platform open-source Sony PlayStation 3 emulator and debugger written in C++ for Windows, Linux and BSD.

rpcs3.net

AVX512 SIMD (#316) · Issues · VideoLAN / dav1d · GitLab

See Wikipedia. The target will be Ice Lake (F, CD, VL, DQ, BW, IFMA, VBMI, VBMI2, VPOPCNTDQ, BITALG, VNNI, VPCLMULQDQ, GFNI, VAES)....

code.videolan.org

And finally, avx512 in x265! - x265

By Pradeep Ramachandran Finally, the acceleration that we’ve all been waiting for is here! We’ve been working extensively with Intel for the last few months to use Intel Advanced Vector Extensions 512 (AVX-512) to accelerate x265. After much effort, we’re delighted to share that we’ve been able...

x265.org

Accelerating x265 with Intel® Advanced Vector Extensions 512 (Intel®...

Vector units in CPUs have become the de facto standard for acceleration of media, and other kernels that exhibit parallelism according to the single instruction, multiple data (SIMD) paradigm. SIMD on Intel® architecture processors have evolved to enable 512-bit register files in Intel® Advanced...

software.intel.com

[x264-devel] x86: AVX-512 pixel_var2_8x8 and 8x16

[x264-devel] x86: AVX-512 pixel_sad_x3 and pixel_sad_x4

Probably more open source uses it than closed.

Nice, that's happening sooner than I'd hoped.

Caveman · Mar 22, 2020

So.... When is this due out? Any estimates on performance above the Zen 3 release this fall?

Hans Gruber · Mar 22, 2020

I will give an opinion based on everything I have read in the media. Zen 3 will be at least a 20% improvement over Zen 2. Zen 4 will have DDR5 and be built on 5nm. It's too far out to predict. DDR5 could be a big improvement over DDR4.

soresu · Mar 22, 2020

extide said:
Nice, that's happening sooner than I'd hoped.

Later than it would have had Intel not been acting the fool and fragmenting it in the first place, and of course the throttling too probably has something to do with it.

Caveman · Mar 22, 2020

Hans Gruber said:
I will give an opinion based on everything I have read in the media. Zen 3 will be at least a 20% improvement over Zen 2. Zen 4 will have DDR5 and be built on 5nm. It's too far out to predict. DDR5 could be a big improvement over DDR4.

Just a WAG... Do you think another 20% over Zen 3 is possible?

uzzi38 · Mar 22, 2020

Caveman said:
Just a WAG... Do you think another 20% over Zen 3 is possible?

I look over to the side a bit and I see AMD claiming a 50% gen on gen (and within 12-18 months of one another) with their GPU stack in terms of power efficiency even without node shrinks.

I think we're looking at an AMD that's dedicated to pushing performance as much as possible. With Zen 4 comes the switch to N5. That alone should allow for more than just a 20% boost in overall performance, and as things stand, I have absolutely no reason to believe they can't do it.

Hans Gruber · Mar 22, 2020

Caveman said:
Just a WAG... Do you think another 20% over Zen 3 is possible?

AMD already said they were very happy with what Zen3 has done, meaning customers will be happy with the results. In an earlier post (here) it was said 20-25% improvement for Zen3 over Zen 2. I think that is probably very accurate. Consider that Zen3 is a new processor, whereas Zen 2 was largely an evolution of Zen with the 12nm and 7nm die shrinks. Whether 7nm+ is much better is an open question. They said AMD was hoping for a gain from the new 7nm+ similar to the shrink from 12nm to 7nm, but the architecture improvements offset that. The boost from the silicon was not as significant as they had hoped.

I should clarify my speculation is based on interviews that I read about from tech websites. Insider information. Sources from Taiwan are probably the most accurate.

Zen4 is too far out. But the 5nm and DDR5 should be significant.

inf64 · Mar 22, 2020

uzzi38 said:
I look over to the side a bit and I see AMD claiming a 50% gen on gen (and within 12-18 months of one another) with their GPU stack in terms of power efficiency even without node shrinks.

I think we're looking at an AMD that's dedicated to pushing performance as much as possible. With Zen 4 comes the switch to N5. That alone should allow for more than just a 20% boost in overall performance, and as things stand, I have absolutely no reason to believe they can't do it.

We talked about this before. AMD stated they had a 40% IPC design goal with Zen1 over previous core (last iteration of Bulldozer). They achieved ~52%.

I expect the same goes for Zen3 vs Zen1 and Zen5 vs Zen3. If they were to achieve the goal for Zen3 (40% higher IPC vs Summit Ridge Zen1), then Zen3 just needs a 17.5% average IPC improvement over Zen2 to accomplish that.

Zen5 is a bit trickier since we know nothing about the improvements in Zen4, but if they were to be more in line with Zen1->Zen2, then Zen5 would need north of 20% to achieve that goal. It's very much possible IMO. We are talking here about strict IPC gains without counting the process node (power/clock). Clock and IPC together would give the final performance, of course.
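The arithmetic behind these figures is just compounding: generation-over-generation IPC gains multiply, so the gain still needed is the target ratio divided by what has already been achieved. A minimal sketch, assuming the 40% two-generation goal from the post; the function name and the hypothetical Zen4 gains in the loop are illustrative, not figures from AMD.

```python
def required_gain(target_gain, achieved_gain):
    """IPC gain still needed to hit a compound target.

    Gains are fractions (0.40 means +40%). They compound multiplicatively:
    (1 + achieved) * (1 + required) = (1 + target).
    """
    return (1 + target_gain) / (1 + achieved_gain) - 1


# Back-solve the Zen1->Zen2 gain implied by the post's own numbers:
# a 40% goal over Zen1 with 17.5% left for Zen3 to deliver.
implied_zen2_gain = required_gain(0.40, 0.175)
print(f"implied Zen1->Zen2 IPC gain: {implied_zen2_gain:.1%}")

# What Zen5 would need for a 40% goal over Zen3, for a few
# hypothetical Zen4 gains (assumptions for illustration only).
for zen4_gain in (0.10, 0.15, 0.20):
    needed = required_gain(0.40, zen4_gain)
    print(f"Zen4 +{zen4_gain:.0%} -> Zen5 needs +{needed:.1%}")
```

Read this way, the "north of 20%" estimate for Zen5 corresponds to a Zen4 gain somewhere below roughly 17%.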

uzzi38 · Mar 22, 2020

inf64 said:
We talked about this before. AMD stated they had a 40% IPC design goal with Zen1 over previous core (last iteration of Bulldozer). They achieved ~52%.

I expect the same goes for Zen3 vs Zen1 and Zen5 vs Zen3. If they were to achieve the goal for Zen3 (40% higher IPC vs Summit Ridge Zen1), then Zen3 just needs a 17.5% average IPC improvement over Zen2 to accomplish that.

Zen5 is a bit trickier since we know nothing about the improvements in Zen4, but if they were to be more in line with Zen1->Zen2, then Zen5 would need north of 20% to achieve that goal. It's very much possible IMO. We are talking here about strict IPC gains without counting the process node (power/clock). Clock and IPC together would give the final performance, of course.

While I agree, the question was about performance, not IPC

Ajay · Mar 22, 2020

uzzi38 said:
I look over to the side a bit and I see AMD claiming a 50% gen on gen (and within 12-18 months of one another) with their GPU stack in terms of power efficiency even without node shrinks.

OT:
That's because AMD GPUs have such terrible perf/watt compared to NV. Clearly, they have a big delta to cover, just as with the construction cores compared to Zen. Also, all we know is that RDNA2 is being manufactured on '7nm', not which precise node.

Ajay · Mar 22, 2020

amrnuke said:
What's crazy is as recently as 2016 it was thought that transistors smaller than 7nm would be highly susceptible to quantum tunneling. That didn't turn out to be an issue as processes move from N7 to N7P/N7+ to 6/5 for TSMC, and even Samsung targeting 3nm GAAFET production in 2021.

The main future issue is that 1 silicon atom is 0.2nm wide. It seems the industry has said they are unsure if nodes beyond 3nm would be viable, though TSMC is researching 2nm, and Intel thinks they can do 1.4nm by 2029.

Circling back to Zen4... if on 5nm, and if it doesn't drop til 2022, it may be very interesting market-wise. If Intel can get 7nm out by then (I know, but bear with me), they *might* regain the process lead. I think this path is the only way I could see them doing so in the next 5 years. 5nm TSMC is projected to have 171.3 MTr/mm2 and 7nm Intel is projected to have 237.18 MTr/mm2.

In any case, it's remarkable to see that we are going to see roughly a doubling of # of transistors from 7nm to 5nm so quickly and if TSMC keeps up the cadence, it'll happen still at a speed nearly in accordance with Moore's observation, even though we are starting to approach a literal atomic limit.

Forgot to get back to this. I meant the 'marketing' 1nm. Quantum effects were already an issue with Intel's 22nm FinFETs, forget about dealing with actual feature sizes in the 1nm range; I just don’t see that happening in the next 20 years.
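For reference, the density projections in the quoted post can be turned into ratios with a few lines. The TSMC 5nm and Intel 7nm figures are the ones quoted above; the TSMC 7nm value of 91.2 MTr/mm2 is not from this thread and is an assumption taken from commonly cited estimates, included only to check the "roughly a doubling" remark.

```python
# Projected logic densities in MTr/mm2.
TSMC_N5 = 171.3     # quoted in the post above
INTEL_7NM = 237.18  # quoted in the post above
TSMC_N7 = 91.2      # assumption: commonly cited estimate, not from this thread

# How much denser Intel's projected 7nm is than TSMC's projected 5nm.
intel_vs_tsmc = INTEL_7NM / TSMC_N5
# How close the N7 -> N5 step is to a doubling, under the assumed N7 value.
n7_to_n5 = TSMC_N5 / TSMC_N7

print(f"Intel 7nm vs TSMC 5nm: {intel_vs_tsmc:.2f}x")
print(f"TSMC 7nm -> 5nm:       {n7_to_n5:.2f}x")
```

These are marketing-node projections, so the ratios say something about planned density rather than actual feature sizes.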

uzzi38 · Mar 22, 2020

Ajay said:
OT:
That's because AMD GPUs have such terrible perf/watt compared to NV. Clearly, they have a big delta to cover, just as with the construction cores compared to Zen. Also, all we know is that RDNA2 is being manufactured on '7nm', not which precise node.

1. Low hanging fruit is nice and all, but the same was claimed for RDNA3.

2. RDNA1 is on N7P. It doesn't matter which flavour of N7 RDNA2 is on, because best case is N7+ and N7+ is only 3% better in efficiency than N7P.

It's almost all uArch boyo.

Ajay · Mar 22, 2020

uzzi38 said:
1. Low hanging fruit is nice and all, but the same was claimed for RDNA3.

2. RDNA1 is on N7P. It doesn't matter which flavour of N7 RDNA2 is on, because best case is N7+ and N7+ is only 3% better in efficiency than N7P.

It's almost all uArch boyo.

I don't know what you are talking about. Don't call me boyo, K?

DisEnchantment · Mar 23, 2020

It seems to me that the X3D packaging that AMD is talking about could be TSMC's CoWoS with SoIC.
2.5D HBM and 3D SoC?

Also AMD registered some new patent applications for chiplet IVR.
20200066677
A data processor is implemented as an integrated circuit. The data processor includes a processor die. The processor die is connected to an integrated voltage regulator die using die-to-die bonding. The integrated voltage regulator die provides a regulated voltage to the processor die, and the processor die operates in response to the regulated voltage.

lobz · Mar 23, 2020

Ajay said:
I don't know what you are talking about. Don't call me boyo, K?

He did explain what he meant though. RDNA2 efficiency improvements are all from uarch, none from process node.

amrnuke · Mar 23, 2020

DisEnchantment said:
It seems to me that the X3D packaging that AMD is talking about could be TSMC's CoWoS with SoIC.
2.5D HBM and 3D SoC?

Also AMD registered some new patent applications for chiplet IVR.
20200066677
A data processor is implemented as an integrated circuit. The data processor includes a processor die. The processor die is connected to an integrated voltage regulator die using die-to-die bonding. The integrated voltage regulator die provides a regulated voltage to the processor die, and the processor die operates in response to the regulated voltage.

This is really cool, honestly. I'm sure there's a math equation that tells us how beneficial this could be, but by adding 3D packaging, you can nearly double the number of points a given distance away. That could be really huge.
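A toy version of that math: on a single flat die, the points within distance r of a given spot fill a disc of area pi*r^2. Stack a second layer at a small height h above it and you add another disc of area pi*(r^2 - h^2), which is almost as large when h is much smaller than r. The sketch below counts grid points to show this. The grid pitch, layer gap, and function name are assumptions for illustration; this is pure geometry, not a model of real interconnect.

```python
def points_within(radius, layer_heights, pitch=1.0):
    """Count grid points within `radius` of the origin.

    Points sit on a square grid with spacing `pitch`, repeated on each
    layer at the given heights above the origin's own plane.
    """
    limit = radius * radius
    steps = int(radius // pitch)
    count = 0
    for z in layer_heights:
        for ix in range(-steps, steps + 1):
            for iy in range(-steps, steps + 1):
                x, y = ix * pitch, iy * pitch
                # Straight-line (Euclidean) distance check, squared.
                if x * x + y * y + z * z <= limit:
                    count += 1
    return count


# One flat die versus the same die with a second layer stacked one
# pitch above it (hypothetical numbers).
one_layer = points_within(10, [0.0])
two_layers = points_within(10, [0.0, 1.0])
ratio = two_layers / one_layer

print(one_layer, two_layers, round(ratio, 3))
```

With a layer gap that is small relative to the reach, the ratio lands just under 2, which matches the "nearly double" intuition.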

Hans Gruber · Mar 23, 2020

There is speculation that Nvidia is going to wait for Big Navi to release, so they can see what they are up against, instead of releasing the next-generation Nvidia GPUs.
