Discussion Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)


Vattila

Senior member
Oct 22, 2004
799
1,351
136
Except for details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also showed that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) and new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 

biostud

Lifer
Feb 27, 2003
18,173
4,649
136
If someone is on a budget and can't afford to upgrade, even a 5% overclock is better than no overclock. With the 5800X3D, you need an expensive mobo for BCLK OC. It doesn't matter now, but what about 5 years from now when this CPU starts getting long in the tooth? What if the user wants to OC it to eke out just a little more performance while they save up for their next big upgrade?
But outside of benchmarks, would anyone ever notice a performance difference of <10%, and often <5%? Personally I think it is good that they are sold at their max speed. Not all CPUs need stacking, but in the premium segment I see no problem.
 

MadRat

Lifer
Oct 14, 1999
11,908
228
106
If you kept the hottest section of a layer - the area housing the CPU cores - all in one corner of the die, each layer could be rotated so CPU sections never stack above one another.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
If you kept the hottest section of a layer - the area housing the CPU cores - all in one corner of the die, each layer could be rotated so CPU sections never stack above one another.
I was thinking about this last night and made a few posts that probably don't reflect my current conclusions. Moore's Law Is Dead had a video about next-gen AMD GPUs. They describe these as having a base die with a compute die on top. Two of these together with two HBM stacks seem to comprise a single unit with around 150 W of power consumption. The base die was described as 6 nm with a 5 nm graphics die on top. They showed up to 4 of these combined for 8 compute die and 8 stacks of HBM.

I was thinking of bridge die acting as cache die, but with 512 MB of infinity cache rumored for a desktop GPU, that is a lot of die area, in fact likely larger than the graphics die. At 150 W for the whole 2 base die + 2 HBM unit, the graphics die itself must be rather low power, so perhaps not an issue to get the power up the stack. If you take 64 MB to be around 40 mm² at 7 nm, then the base die, possibly with 256 MB each, seems like it must be over 150 mm²; the cache will probably not shrink much going from 7 nm to 6 nm. The base die would probably have EFB links to the other base die and possibly EFB links to HBM. The base die may have a couple of infinity fabric links for PCI-Express and other memory controllers to support DDR5. HBM interfaces take up very little space. The DDR5 and PCI-Express might take up a lot of space, if required. The CDNA GPUs have many IF links that are unneeded on the desktop. The DDR5 memory controller would also not be needed on purely GPU products. Perhaps that is an opportunity to have some separate IO chips in there somewhere for different products. The two base die units look like they would be connected with EFB; it is unclear how multiple units are connected though. For more flexibility of placement, they might be IFOP-style infinity fabric links instead of EFB. They could use very wide links.
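As a rough sanity check of that area estimate (just a sketch: the 40 mm² per 64 MB figure is the one quoted above, and the no-shrink-at-6-nm assumption is mine):

```python
# Back-of-the-envelope SRAM area estimate, assuming ~40 mm^2 per 64 MB at 7 nm
# (the figure quoted above) and essentially no SRAM shrink from 7 nm to 6 nm.

MM2_PER_64MB = 40.0  # assumed 7 nm SRAM density

def cache_area_mm2(cache_mb: float, shrink_factor: float = 1.0) -> float:
    """Estimated die area for cache_mb of SRAM; shrink_factor < 1 models a node shrink."""
    return (cache_mb / 64.0) * MM2_PER_64MB * shrink_factor

print(cache_area_mm2(256))  # 160.0 mm^2 per base die, i.e. "over 150 mm^2"
print(cache_area_mm2(512))  # 320.0 mm^2 for the rumored 512 MB of infinity cache
```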

This could be the new AMD modular architecture. The base die would likely be the same across all products. They could stack RDNA graphics die, CDNA compute die, CPU die, FPGA die, etc. It is unclear whether this would be SoIC or some micro-bump BEOL tech. SoIC allows much greater connectivity and better thermals, but everything must be designed together and probably all made at TSMC. SoIC would allow it to act as a massive L3 cache. I would lean towards it being an SoIC-based stack. The HBM would be micro-bump, probably EFB-connected rather than on large interposers. There may be some other, perhaps GlobalFoundries-made, silicon in there to support different IO for different products.

Bergamo might be the first CPU product to use this; it may be the default for Zen 5, although getting power up the stack may still be an issue. Zen 4c is specifically very low-power cores; Genoa, without stacking, covers the high-power devices. If this is the architecture, it seems like it would be 2 CPU die per base die given the small size, so only 4 base die to get 128 cores. That would be 1 GB of infinity cache. This seems like it would allow the possibility of HBM on a CPU product, although I don't know how necessary that would be with such large SRAM caches. It would also allow a combined CPU / GPU product in an Epyc socket.
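Summing up that hypothetical configuration (all of the per-die numbers here are my speculation, not confirmed specs):

```python
# Hypothetical Bergamo-style stack, using the speculative numbers above.
CPU_DIE_PER_BASE_DIE = 2      # assumed
CORES_PER_CPU_DIE = 16        # assumed Zen 4c chiplet
CACHE_MB_PER_BASE_DIE = 256   # assumed infinity cache per base die
BASE_DIE = 4

total_cores = BASE_DIE * CPU_DIE_PER_BASE_DIE * CORES_PER_CPU_DIE  # 128 cores
total_cache_gb = BASE_DIE * CACHE_MB_PER_BASE_DIE / 1024           # 1.0 GB of infinity cache
print(total_cores, total_cache_gb)
```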

That is all very interesting, if true, but I don't know if they would use it for desktop parts other than a possible Threadripper replacement. The minimum configuration would be something like 16 cores with 256 MB of cache. It would be great if they introduced an in-between socket for workstation and Threadripper that is half of Epyc. Given the modularity this would have, it doesn't seem like it would be difficult. These would all be rather high-end products. I suspect most of the mainstream market will be APUs. They can fit a lot on a single die at 5 nm. I hope they make a product with many channels of DDR5 mounted on the package, or just an APU with an HBM stack for high-end mobile devices.

And I wrote a too long post again.
 

MadRat

Lifer
Oct 14, 1999
11,908
228
106
Maybe AMD could develop a new bus rate for PCI-e that nudges past 100 MHz. This should dramatically raise bandwidth with only a 33 MHz bump. Perhaps a signal could be used to set it in increments of 4-16.6 MHz, to offer future-proofing and backwards compatibility at the same time. It's not like they don't hold a large share of the market that relies on the PCI-e standard.

The point of this? To keep memory clocks as closely matched to the PCIe clock as possible to suppress latency. The HBM as a cache might act as a buffer for some of the latency expected from AMD's memory controller coping with early DDR5.
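For reference, a rough sketch of how raw x16 link bandwidth would scale with that reference-clock bump, assuming the link rate scales linearly with the 100 MHz refclk (my assumption; the spec doesn't define overclocked refclks):

```python
# Rough PCIe 4.0 x16 bandwidth vs. reference clock, assuming the per-lane
# rate scales linearly with the refclk (an assumption, not spec behaviour).

GT_PER_S_GEN4 = 16.0   # PCIe 4.0 per-lane rate at the standard 100 MHz refclk
LANES = 16
ENCODING = 128 / 130   # 128b/130b line encoding overhead

def x16_bandwidth_gbs(refclk_mhz: float) -> float:
    """Peak one-direction bandwidth in GB/s for an x16 link at a given refclk."""
    return GT_PER_S_GEN4 * (refclk_mhz / 100.0) * LANES * ENCODING / 8

print(x16_bandwidth_gbs(100))  # ~31.5 GB/s at stock
print(x16_bandwidth_gbs(133))  # ~41.9 GB/s with the 33 MHz bump
```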
 

eek2121

Platinum Member
Aug 2, 2005
2,883
3,860
136
I am surprised AMD and Intel have shown no public desire to compete with Apple's memory solution.

Hopefully future chips scale beyond dual channel. Quad, hex, or octa channel memory would be a great way to improve performance, especially on chips with an integrated GPU.
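To put rough numbers on that (a sketch only; DDR5-5600 and 64-bit channels are assumed here purely for illustration, not tied to any announced platform):

```python
# Peak theoretical DRAM bandwidth vs. channel count.
# DDR5-5600 and 64-bit channels are assumed purely for illustration.

def dram_bandwidth_gbs(mt_per_s: int, channels: int, bus_width_bits: int = 64) -> float:
    """Peak bandwidth in GB/s for the given number of memory channels."""
    return mt_per_s * (bus_width_bits / 8) * channels / 1000

for ch in (2, 4, 6, 8):
    print(f"{ch}-channel DDR5-5600: {dram_bandwidth_gbs(5600, ch):.1f} GB/s")
# 2 -> 89.6, 4 -> 179.2, 6 -> 268.8, 8 -> 358.4 GB/s
```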
 

moinmoin

Diamond Member
Jun 1, 2017
4,926
7,609
136
I am surprised AMD and Intel have shown no public desire to compete with Apple's memory solution.
Different markets. Apple sells directly to customers. AMD and Intel mostly sell to OEMs, which aren't really interested in premium tech; they are interested in cheap tech that they can sell at a premium.
See all the low-tech dGPUs that barely beat the iGPU already included. OEMs want customers to think that's worth a premium.
 
Jul 27, 2020
15,446
9,571
106
Apple's memory solution is kinda useless at the moment. No one is able to saturate the bandwidth. Maybe future applications can, but by then Apple will have something even better, and the M1 family may just get neglected and ignored by developers as far as realizing the full potential of the hardware is concerned.
 

Doug S

Platinum Member
Feb 8, 2020
2,191
3,380
136
Apple provided that crazy level of bandwidth for the GPU, not the CPU. Intel and AMD are uninterested in building integrated GPUs that scale to high levels of performance, because that's always been the domain of discrete GPUs - which both AMD and now Intel also sell.

Those discrete GPUs AMD and Intel sell include GDDR or HBM - both of which cost more per GB than Apple's LPDDR, but need less of it since it is exclusive to the GPU.
 

lightmanek

Senior member
Feb 19, 2017
387
754
136
I've seen a lot of cards, just never an Intel one. Figured they were part of the CPU on laptops because I never really looked.

I had one briefly; it was a bit of a mess with compatibility and drivers, but it was cheap.
Performance was lower than nVidia cards of that era too.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
Anand reviewed it back in 1998!
Any guesses what screen resolution was most common back then, for the photos in there not to look this incredibly tiny? :oops:
I had a 1600x1200-capable CRT back in '98, a 19-inch Viewsonic PS790. Strangely, an Amazon listing still seems to exist. I remember one of my friends saying how big and beautiful it looked for Ultima Online at some point. Everything is relative. I don't remember if they had 800x600 at the time, or maybe 1024x768; that was the common max res, I think. At the moment, I would want 4K or even 5K in a QD-OLED, if possible. The only QD-OLED available so far is 3440x1440.
 

MadRat

Lifer
Oct 14, 1999
11,908
228
106
Memories indeed. First build with a motherboard having an i740 + a Voodoo 3 card for dual-monitor use.
I had the 3dfx 1000, 2000, and 3500 in machines at the same time. (Must have been 2000 or 2001, iirc.) The first two were AGP and the last was PCI. So my SAMBA box had the most powerful graphics at the time. Someone gave me a great deal off the forums here for the 3500 because it was quickly outpaced after its introduction. Funny thing was Linux support was not very good for it, so I never played with its huge video I/O dongle.