
Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 6000)

What do you expect with Zen 4?


  • Total voters
    246

Vattila

Senior member
Oct 22, 2004
604
710
136
Except for the details of the microarchitectural improvements, we now know pretty well what to expect from Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging look to be the same, with the same 9-die chiplet design and the same maximum core and thread count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least) and will come with a new platform (SP5) and new memory support (likely DDR5).

[Attached image: Untitled2.png]


What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,264
802
136
That is the embedded roadmap though.

EPYC:

Embedded EPYC:

It isn't saying Milan should have been 2020Q3-2021Q2;

it is saying Embedded Milan should have been 2020Q3-2021Q2.

Also, the effective Google date for the Embedded website is Aug 4, 2020;
it was only listed on Aug 5, 2020.

Meaning that the Embedded EPYC 7001/7002 only launched then, in August 2020.

The upload date is Jan 19, 2021.
Googling it with "V3000"/"Zen4"/"7004" points to it as having the slide.

Identifying the asterisk:
*AMD roadmaps are subject to change without notice or obligations to notify of changes.
Placement of boxes is not intended to represent first year of product shipment.
 
Last edited:

coercitiv

Diamond Member
Jan 24, 2014
4,378
5,746
136
if Nosta is to be believed at all
Name one process & architecture combo "leak" that Nosta talked about in the past and turned out to be true. The amount of fantasy nodes and architectures is staggering, and yet people still eat this crap with a spoon.

You would have better chances at predicting the future of AMD products by tossing a coin. My cat would have better chances of predicting AMD product & node mix, and I don't have a cat. It's still better than what Nosta predicts because me getting a cat and using it to make predictions is still within the realm of possibility in this universe.
 

Hans de Vries

Senior member
May 2, 2008
259
580
136
www.chip-architect.com
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs if I'm honest, but the I/O die - new nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes etc etc), yet despite that, the I/O die is actually smaller than Rome's.
The physical I/O of 7nm Cezanne is quite small. The 128-bit bus is just 5% of the 180 mm² die (the top-right rectangle), or 9 mm².

[Attached image: Cezanne_die.jpg]

This makes me think that AM5 may jump over Alder Lake's 1700-pin package to 2000+ pins (from 1331 for AM4).
That would make possible desktop motherboards with 4 memory slots, each with its own channel.

AM5 needs to support 3nm CPUs and APUs. Dual-channel LPDDR4-4266 is already exhausted by 8 Vega compute units (Rembrandt has 12 Navi2 compute units on 6nm). The 5nm Raphael has an unknown number of Navi3 compute units on AM5, and they could be clocked north of 3GHz.
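For a sense of scale, here is a quick back-of-the-envelope sketch of the peak bandwidth a 128-bit LPDDR4X-4266 interface provides; the transfer rate and bus width match the figures above, the arithmetic itself is only illustrative:

```python
# Rough peak-bandwidth estimate for a 128-bit (dual-channel) LPDDR4X-4266 interface.
# Illustrative arithmetic only; real sustained bandwidth is lower than peak.
transfer_rate_mt_s = 4266      # mega-transfers per second per pin
bus_width_bits = 128           # total DRAM interface width on the APU

peak_gb_s = transfer_rate_mt_s * 1e6 * (bus_width_bits / 8) / 1e9
print(f"Peak bandwidth: {peak_gb_s:.1f} GB/s")   # ~68.3 GB/s, shared by CPU and iGPU
```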
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,264
802
136
[Attached image: raphael.png]

Feb 2021+ product bring-up.

So far I can only find bring-ups for GPUs and one other IP, with CPU dates being less exact:
Vega10 bring up: Jan 2017-April 2017 => August 2017 launch
MI50/MI60 bring up: Jan 2018(start month) => November 2018 launch
Navi10 bring up: July 2018+ => July 2019 launch
PCIe 4.0 bring up in client/server: September 2018+ => July/August 2019 launch
Fiji bring up: Sept 2014 - November 2014 => June 2015 launch
Ontario bring up: 2010(no exact month) => January 2011 launch
Mullins bring up: 2013(no exact month) => April 2014 launch
Kaveri bring up: August 2013 => January~June 2014 launch
Radeon Pro 560x bring up: 2017(no exact month) => July 2018 launch
Radeon Pro Vega bring up: 2018(no exact month) => November 2018 launch
MI100 bring up: July 2019+(no exact start) => November 2020 launch
Zen SoC/Server (Zeppelin) bring up: Before August 2016, After November 2015 => March 2017 launch
MI200 bring up award: December 2020 => not yet launched.
3 launched within the year of their first bring-up mention.
9 launched the year after their first bring-up mention. Of those, the majority of first mentions fall in the second half of the year.
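A quick sketch of that tally, using approximate years pulled from the list above (the dates are from the list; the grouping code is just illustrative):

```python
# Rough tally of the bring-up list above (years approximated from the post).
# (product, year of first bring-up mention, launch year)
bring_ups = [
    ("Vega10",      2017, 2017), ("MI50/MI60",   2018, 2018),
    ("Navi10",      2018, 2019), ("PCIe 4.0",    2018, 2019),
    ("Fiji",        2014, 2015), ("Ontario",     2010, 2011),
    ("Mullins",     2013, 2014), ("Kaveri",      2013, 2014),
    ("Pro 560x",    2017, 2018), ("Pro Vega",    2018, 2018),
    ("MI100",       2019, 2020), ("Zeppelin",    2016, 2017),
]

same_year = sum(1 for _, start, launch in bring_ups if launch == start)
next_year = sum(1 for _, start, launch in bring_ups if launch == start + 1)
print(f"launched same year: {same_year}, launched the year after: {next_year}")
```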

On the DDR side, LPDDR5/DDR5 has two bring-up windows: January 2020+ and July 2020+.

So, Raphael launching earlier than expected is more likely. ¯\_(ツ)_/¯
 
Last edited:

dnavas

Senior member
Feb 25, 2017
308
156
116
An 18-month release cycle would put Zen 4 in Q2 2022. However, Zen 4 brings a brand new platform with DDR5/PCIe5, so delays are to be expected.
I'm personally finding it increasingly difficult to justify investing in a PCIe 4 TR platform purchase, so I hate what this does for Zen4 TR, but it frankly doesn't make sense to ship a DDR5 platform before DDR5 is available in quantity. So yes, I'd expect later rather than sooner.

Unless [I add, somewhat self-servingly] you release on the platform that's already more expensive and might be more willing to absorb the cost. I don't expect AMD is doing this, but it seems like it would make some sense to put the TR chiplets after desktop, so that desktop can bake the design, but put the TR IO on the bleeding edge. You allow TR to benefit from the choice of the best chiplets, and allow the desktop to benefit from the better/cheaper supply chain of parts.
 

DisEnchantment

Senior member
Mar 3, 2017
818
2,031
136
I'm pretty sure that since Zen2/3, AMD has been using the densest libs available.

Ex:
[Attachment 44460]

The actual design reasons for AMD going for Hi-Freq rather than going full-on Hi-Inst will probably remain unknown. For all we know it could be marketing-driven rather than engineering-driven, as it is psychologically easier to sell a chip that has increased frequency over the last generation.

Above, extended w/ actual high-end on AMx:
Ryzen 7 1800X = 4.1 GHz boost
Ryzen 7 2700X = 4.3 GHz boost
Ryzen 7 3950X = 4.7 GHz boost
Ryzen 7 5950X = 4.9 GHz boost
Zen3 is indeed using N7 HD, while RDNA is using N7 HP.
For N5, the optimal range for Zen 4 would be around 3.8-4.2 GHz according to the Shmoo plot; beyond that it would need a considerable jump in voltage to reach the same frequency levels as, for example, the 5950X.
One of the advantages of designs which top out at ~3.2 GHz is that they are well below this range, resulting in big gains in efficiency.

[Attached image: 1621235308312.png]
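The efficiency argument above boils down to dynamic power scaling roughly with C·V²·f: pushing frequency past the knee of the voltage/frequency curve costs disproportionately more power. A rough illustration with made-up voltage points (not TSMC Shmoo data):

```python
# Dynamic CPU power scales roughly with C * V^2 * f.
# The voltage/frequency pairs below are invented placeholders to show the shape
# of the curve, not measured N5 Shmoo data.
def relative_dynamic_power(freq_ghz, volts, base_freq=3.2, base_volts=0.75):
    return (freq_ghz / base_freq) * (volts / base_volts) ** 2

for freq, volts in [(3.2, 0.75), (4.0, 0.90), (4.9, 1.20)]:
    factor = relative_dynamic_power(freq, volts)
    print(f"{freq} GHz @ {volts} V -> ~{factor:.1f}x the power of the 3.2 GHz point")
```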

Why doesn't AMD use the high density process? Wouldn't the much higher IPC made possible by many more transistors make up for the lost frequency? Plus, it would be much more energy efficient
My thinking is that when they originally designed Zen1-Zen3 around the CCX concept, they intended it to be very small and easily manufacturable. It was supposed to be cheap to produce, with high clocks helping to get more performance.
The Zen3 core plus everything up to L2 is quite small in comparison to most contemporary designs; the Zen3 core (without L3) is less than half the MTr of M1.
However, when Zen2 and later Zen3 landed, they needed to tack on the big L3 to handle the weakness of the memory hierarchy with the multiple CCXs.
In the end Zen3 got big anyway. Also, making the MTr of the active core too high while operating at high frequency would have increased the TDP by a bit.

For Zen4, with this learning, it will be interesting to see whether AMD raises the transistor count drastically or again sticks with a smaller core.
 

DisEnchantment

Senior member
Mar 3, 2017
818
2,031
136
PCIe 5 is pretty much irrelevant for consumers right now and will remain that way for many years. All it does is increase motherboard costs. The only thing it possibly could affect is storage, because GPUs aren't going to come close to saturating a PCIe 4.0 x16 link any time soon.
PCIe 5.0 is used as the physical layer for CXL and is supposed to bring in a new era of cache-coherent accelerators.

I hope this is not the case.
Otherwise it means buying some Zen4-based TR Pro to try out CXL-based accelerators.
The only option on AM5 would be to go A+A, which is kind of against AMD's open and inclusive philosophy. The issue with this is that AMD only has GPUs at the moment, so if you want to try out FPGA-based CXL-capable accelerators like the ones from Xilinx, you would be out of luck on AM5.
It makes no sense not to have it if they already have the PCIe 5.0 IP on Genoa. They could just have the support in the IOD and the chipset and let the board OEMs take the cost on the high-end boards.
Either that, or this generation has no CXL support, which is going to be an issue for developers wanting to try out CXL-based accelerators.

Not happy with this.
 
  • Like
Reactions: Tlh97 and Kepler_L2

Gideon

Golden Member
Nov 27, 2007
1,357
2,594
136
I don't know, and to be frank, I don't entirely care. He's said enough bollocks for me to know that he's more than happy to either make stuff up or trust things from absolutely anyone.
100% this. Usually plain informed speculation leads to much more accurate predictions than what these leakers claim.

So many of these leaks go blatantly against common industry facts that it hurts (and this goes for all of MLID, Adored and Coreteks). Things like:
  • Claiming there is no working silicon in the labs less than a year before release
  • Claiming things that would require changes to silicon (other than respins) less than a year before release.
  • Claiming something has been designed but might not be released - this happens, but very rarely, as the R&D money has already been spent; it would literally have to be unsellable to get canned. (Things such as designing a 24 core Genoa and not announcing it, while releasing a 16 core one, make 0 sense)
  • And the big one: knowing SKUs and pricing 6+ months pre-release (when these are the last things that get decided, especially the pricing, as it's the only thing that can be changed easily, even hours before release)

But what really grinds my gears is that when they get something wrong, they almost never admit that it was (someone's) poor speculation. Nearly always there is the excuse of "oh, it must have been canned/postponed/changed last minute".

I still remember Adored being hell-bent that Navi would release in January (Q1) 2019, right up to late December 2018. And when it didn't happen it was just casually "postponed due to yields". It ended up "being postponed" for 7 months. I'm sure AMD had no idea of the state of their yields a month before release :p
 
Last edited:

DisEnchantment

Senior member
Mar 3, 2017
818
2,031
136
AMD HSA is here


AMD is building a system architecture for the Frontier supercomputer with
a coherent interconnect between CPUs and GPUs. This hardware architecture
allows the CPUs to coherently access GPU device memory. We have hardware
in our labs and we are working with our partner HPE on the BIOS, firmware
and software for delivery to the DOE.
The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver
looks it up with lookup_resource and registers it with devmap as
MEMORY_DEVICE_GENERIC using devm_memremap_pages.
Now we're trying to migrate data to and from that memory using the
migrate_vma_* helpers so we can support page-based migration in our
unified memory allocations, while also supporting CPU access to those
pages.
 

Gideon

Golden Member
Nov 27, 2007
1,357
2,594
136
Having something like 250 GB/s of IO bandwidth with 128 PCI-Express 4.0 lanes seems like it would have been the deciding factor.
That wasn't really true for this case.

Aurora was supposed to be ready earlier, was won by Intel, and is being built with Sapphire Rapids chiplets that have PCIe 5.0, "Rambo cache" chiplets, HBM2 on package if needed (and, it looks like, similar unified-memory-space software). The problem is it's using micro-bumps for stacking (well, it's also very late, but that wasn't certain when Frontier was announced). So if anything Intel had the I/O advantage.

There had to be some secret sauce in AMD's offerings to win Frontier like they did. This is certainly one key differentiator. Bear in mind the V-cache solution most likely has two layers (as it sits on top of the 32MB L3 and is exactly as big, on the same process). There is nothing stopping AMD from adding more layers for some server CPUs, and I'm convinced now that CDNA2 has this stacking as well.

And while all of this is only possible because of AMD's engineering prowess, keep in mind that this is also TSMC's win as much as it is AMD's. They're the only foundry that has anything like that ready in this time-frame. The hoops TSMC had to go through to make this work (and be producible at scale) are also enormous.

All in all, ever since Zen 2 it looks like it's the trifecta of execution (Synopsys + AMD + TSMC) that is to be congratulated. AMD couldn't have done it alone.
 
Last edited:

Doug S

Senior member
Feb 8, 2020
604
813
96
Chatting on an internet forum doesn't need most of the instruction sets modern day CPUs provide. Why power all that silicon? Playing a game requires a number of instruction sets that aren't normally used. During that time, the small cores can be put to sleep, giving the big cores more headroom (by way of TDP) to run.

That's completely wrong. You think posting to Anandtech doesn't use SIMD instructions? Check out whatever is responsible in your OS kernel for zeroing pages when a new page is needed; it probably uses AVX2 in some circumstances - and that's the tip of the iceberg. You think floating point isn't needed? Sorry, all math in Javascript is done in floating point; there's no way to avoid it if you are running a browser.

I doubt there's anything you can do with a modern PC or smartphone that would allow any worthwhile reduction of instruction set coverage. Not even running an "idle loop" (which is a halt instruction these days), because there are always background/housekeeping processes running at times, so the scheduler, I/O dispatch, filesystem, and other parts of the kernel will remain active.

I don't think you can usefully cut out any instructions from a small core other than 1) AVX512 (and that's only true on x86 because Intel didn't provide for variable SIMD width capability like SVE2) and 2) virtualization. Anything else you cut out will mean almost every thread will be forced onto big cores before long.
 

Doug S

Senior member
Feb 8, 2020
604
813
96
…and if I am running with javascript disabled? what if i am writing code in vim? What if the machine is a simple file sharing machine? There are plenty of opportunities to use a small core over a big one. Even something as basic as tracking a mouse pointer doesn’t need to use a big core.
OK sure if you are one of the niche cases of people who disable Javascript or run CLI stuff in console mode, fine I'll grant you that. The overwhelming majority of PC/smartphone users don't do stuff like that.

Tracking a mouse pointer doesn't need the performance of a big core, but it will almost certainly exercise your whole instruction set. Do you have any idea of the size of the hot code footprint tracking a mouse pointer on an otherwise idle system these days? A modern GUI is multiple layers of libraries.

A typical person who will leave Javascript enabled will exercise floating point if that mouse cursor moves in any browser window. When the pointer moves between windows, window expose events will exercise stuff like bcopy/memset that uses AVX2, and so on.
 

soresu

Golden Member
Dec 19, 2014
1,655
844
136
I think for Ryzen the socket will stay the same and will remain on DDR4. For EPYC the socket will probably change and move to DDR5. That's the beauty of the IO die.
Could be; could be that TR4 will go DDR5, leaving the mainstream on DDR4 until prices come down from crazy town.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,065
665
136
You have to wonder what was in it for AMD in the first place
Survival.

Running fabs is ridiculously expensive, especially when you need to spend several billion every two years or so to upgrade. AMD was already heavily in debt from upgrading them the last time, and then Opteron revenue crashed and it looked like the options were bankruptcy or selling the fabs. The only interested buyer they could find was ATIC, but they wanted assurances that they would have customers in the future. Hence, the WSA.
 
  • Like
Reactions: spursindonesia

randomhero

Member
Apr 28, 2020
116
178
76
Several pages ago I speculated about the ST performance of Zen4.
I changed my mind after reading some of the posts. I am of the belief that with 5 nm, clocks will go down. What I think I got wrong was assuming that IPC would not compensate enough.
What I had forgotten is that new processes could cut down latencies CCD-wide, and what I had forgotten even more is advanced packaging. They can also get higher inter-CCD bandwidth, from 32 bits to 64 bits per lane per cycle. They could get rid of the SerDes links on the package and go wide with some sort of silicon bridge (full interposer or silicon interconnect bridge), further improving on-package bandwidth and possibly latencies as well.
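To put rough numbers on the link-width point: per-link bandwidth is simply width per cycle times the fabric clock, so doubling the per-cycle width doubles the bandwidth at the same clock. A sketch with an assumed 1.6 GHz fabric clock (a placeholder, not a confirmed Zen 4 figure):

```python
# Per-link bandwidth = (bits per cycle / 8) bytes * fabric clock in GHz = GB/s.
# The 1.6 GHz fabric clock is an assumed placeholder, not a known Zen 4 value.
def link_bandwidth_gb_s(bits_per_cycle, fclk_ghz=1.6):
    return bits_per_cycle / 8 * fclk_ghz

print(link_bandwidth_gb_s(32))   # 6.4 GB/s per link at 32 bits/cycle
print(link_bandwidth_gb_s(64))   # 12.8 GB/s per link: doubled at the same clock
```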
After seeing what they have accomplished with Zen3, how they optimised the design to extract as much as possible from limited resources (transistor performance, execution width, etc.), Zen4 on 5nm could come as quite a shock to the industry regarding gen-to-gen uplift in performance.
I have definitely missed a metric ton of things that could be done to improve the design from Zen3 to Zen4, since my knowledge on the subject is as shallow as a puddle. Thankfully, that's where you guys (and gals!) come in! 🙂
 

Gideon

Golden Member
Nov 27, 2007
1,357
2,594
136
FWIW, revenues for N7, and especially N5 are disproportionate to wafer volume due to their much higher cost.

I can't figure out what the heck Warhol would be on N7 since it's still Zen3, unless it's on some modified N7 process like EUV for a bit more performance. If Warhol is for real, then AMD is taking longer developing Zen4, or needs to wait longer till N5P is up and running in volume.
Yeah, I even mentioned the fact that this is disproportional.

Regarding Warhol, my guess is it's just the same Zen3 chiplets on AM5 with new packaging (possibly 2.5D) and a new I/O die (with DDR5 support), extracting more performance from the I/O and uncore side.

EDIT:
Spelling (sorry I tend to write type-messes on my phone)
 
Last edited:
  • Like
Reactions: scineram

dr1337

Member
May 25, 2020
143
219
76
That was because AMD increased the number of cores per CCD from 6 -> 8 for Zen 2.
This doesn't make any sense lol. Zen 2 was always going to be two CCXs of four cores each... It's functionally similar to Zen 1, just with all of the IO moved off-die. There was never a six-core chiplet design, outside of speculation/fake leaks/rumors.
 
  • Like
Reactions: moinmoin

Makaveli

Diamond Member
Feb 8, 2002
4,313
513
126
It really depends on what they manage to do with Zen3+. If it's like the 3000XT processors, where it's just a few percent more MHz, then it's not going to compete as well. If, instead, they can get both a few hundred MHz of all-core and boost clocks, and also improve IPC by a few percent, then it'll be a different story. Personally, I think that AMD needs to do what they can to improve the all-core clocks the most, as it appears that Zen3 was a nice single-thread boost but wasn't quite as much of an improvement in multi-thread scenarios. Given that N6 isn't much of a density improvement, leaning toward the clocks/power side of the curve may make the most sense.
I thought the boost in all-core clocks was pretty good for me going from a 3800X to a 5800X.

The Zen 2 chip would hit about 4.25 GHz under full load,

while the Zen 3 chip does about 4.55-4.6 GHz under full load.
 
Last edited:
