Solved! Speculation: Zen 4 (EPYC 4 "Genoa", Ryzen 7000, etc.)

What do you expect with Zen 4?


  • Total voters
    330
  • Poll closed.

Vattila

Senior member
Oct 22, 2004
766
1,223
136
Except for the details about the improvements in the microarchitecture, we now know pretty well what to expect with Zen 3.

The leaked presentation by AMD Senior Manager Martin Hilgeman shows that EPYC 3 "Milan" will, as promised and expected, reuse the current platform (SP3), and the system architecture and packaging looks to be the same, with the same 9-die chiplet design and the same maximum core and thread-count (no SMT-4, contrary to rumour). The biggest change revealed so far is the enlargement of the compute complex from 4 cores to 8 cores, all sharing a larger L3 cache ("32+ MB", likely to double to 64 MB, I think).

Hilgeman's slides did also show that EPYC 4 "Genoa" is in the definition phase (or was at the time of the presentation in September, at least), and will come with a new platform (SP5), with new memory support (likely DDR5).



What else do you think we will see with Zen 4? PCI-Express 5 support? Increased core-count? 4-way SMT? New packaging (interposer, 2.5D, 3D)? Integrated memory on package (HBM)?

Vote in the poll and share your thoughts! :)
 
Last edited:

Vattila

Senior member
Oct 22, 2004
766
1,223
136
This [MI300 server] APU is one engineering feat from AMD
The reveal of the Instinct MI300 server APU was certainly the star of the Financial Analyst Day as far as immediate roadmap goes. David Wang, senior vice president of engineering for the Radeon Technologies Group at AMD, was positively buzzing as he revealed the CDNA 3 Unified Memory APU Architecture behind the chip, although he left the reveal of the chip itself to Forrest Norrod, senior vice president and general manager of the Data Center Solutions Group. Interestingly, after Raja Koduri burned out and left for Intel in late 2017, David Wang rejoined AMD in early 2018, apparently to take on a three-generation roadmap to GPU leadership, put in place by AMD CEO Lisa Su to mimic their success with the "Zen" CPU roadmap. It seems he has done well!

Presumably, as other sources have hinted, Instinct MI300 will be used in the 2+ exaflop El Capitan supercomputer set to arrive next year. It is great to see the Exascale Heterogeneous Processor finally coming to fruition after a decade of research and development. I expect we will hear more about MI300 in additional supercomputer wins soon.

With this manifestation of "Zen 4" in high-end server APU form, all the potential features I listed in this thread's associated poll have become a reality, with the exception of 4-way SMT. Only 16% of the voters expected integrated memory on the package, but this feature has now indeed been confirmed, as MI300 comes with a "Unified Memory APU Architecture" with HBM memory integrated in the chip. Obviously, this thing will use the latest advanced packaging, and it will be very interesting to eventually see the details on how it is all put together — especially the Infinity Cache, I/O and interposer/bridges (a fan-out layer with Elevated Fanout Bridges will be used, I guess, similar to MI200).

I presume the Infinity Cache is an L4 cache, or perhaps a System Level Cache on the memory side of the memory controller (HBM, CXL.memory and CXL.cache protocols, but no DDR support, maybe), sitting below the GPU and CPU chiplets, and that the CPU chiplets will be ordinary "Zen 4" CCDs with the ability to stack L3 V-Cache chiplets on top. It will also be interesting to see whether AMD will be able to stack V-Cache higher than one layer in the "Zen 4" generation.

AMD: Combining CDNA 3 and Zen 4 for MI300 Data Center APU in 2023 (anandtech.com)



 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,492
2,182
106
You realise that this information is behind the paywall? That's not fair to Charlie.

If Saylick is a subscriber (especially if his company pays for it, not him personally) then yeah, I'd agree. I would assume Charlie requires subscribers to agree not to publicly repost information from his articles.

If however Saylick found that information repeated elsewhere by someone else, then it's fair game, IMHO.

I may be annoyed that all of Charlie's best info is behind a paywall, but he's got a right to make a living.
 

esquared

Forum Director & Omnipotent Overlord
Forum Director
Oct 8, 2000
22,826
4,194
136
I am going to say this once and once only.

Stop with the insults.
This is an AMD Zen 4 thread, not an Intel thread.
If you cannot keep your comments to the topic at hand, you will be infracted.

This thread is now locked for the next few hours to get you people to calm down. If you come back here to troll AMD, I will vacation you, so think very clearly before you post.



esquared
Anandtech Forum Director
 
Last edited:

NostaSeronx

Diamond Member
Sep 18, 2011
3,571
1,118
136
That is the embedded roadmap though.

EPYC:

Embedded EPYC:

It isn't saying Milan should have been 2020Q3-2021Q2;

It is saying Embedded Milan should have been 2020Q3-2021Q2;

Also, the effective Google crawl date for the Embedded website is Aug 4, 2020;
it was only listed on Aug 5, 2020.

Meaning that Embedded EPYC 7001/7002 only launched in August 2020.

Jan 19, 2021 upload date.
Googling it with "V3000"/"Zen4"/"7004" points to it containing the slide.

Identifying the asterisk:
*AMD roadmaps are subject to change without notice or obligations to notify of changes.
Placement of boxes is not intended to represent first year of product shipment.
 
Last edited:

uzzi38

Platinum Member
Oct 16, 2019
2,391
5,025
116
I'm kinda disappointed nobody seems to have really thought about what AMD talked about with regards to Zen 4C honestly.

Same ISA support, same IPC, and now also confirmed to use half of the core area (key word being core, mind you).

That puts Zen 4C in similar size regions to ARM cores such as V1. That's kind of a big deal.

Also entirely unrelated sidenote but:


I did warn you guys. For Raphael vs Raptor Lake the issue isn't power consumption, it's cooling. Thermal density is not a fun thing.
 

leoneazzurro

Senior member
Jul 26, 2016
787
1,180
136
I can't say I am impressed with Zen 4 Desktop SKUs.
Zen4 SKUs released by AMD are less energy efficient than their predecessors even though they are on 5nm. High clocks are simply killing any efficiency gains from the process.
Ryzen 5 5600X vs Ryzen 5 7600X -> 65W vs 105W (+62%)
Ryzen 7 5700X vs Ryzen 7 7700X -> 65W vs 105W (+62%)
Ryzen 9 5900X vs Ryzen 9 7900X -> 105W vs 170W (+62%)
Ryzen 9 5950X vs Ryzen 9 7950X -> 105W vs 170W (+62%)
I am more interested in the models without X in their name, back at 65W and 105W TDP.
If you consider "energy efficiency" simply as a measure of the total energy consumption of a CPU, sure. But the common understanding of "energy efficiency" is perf/W, and at the moment we have no data to say it got worse; instead we have AMD's claims (which turned out fairly accurate in the past) of the exact opposite.

Let's also not forget that a big part of this is due to Intel raising their power consumption a lot in order to be competitive with the top AMD models, because many users look only at pure performance without considering how that performance is obtained. Of course, it is also to Intel's credit to have created a hybrid architecture that is more competitive at the lower end (mainly due to more cores, in the form of Gracemont), forcing AMD to raise clocks on the lower SKUs.

It will be interesting to see what happens in two years, when Zen 5 lands, which according to current leaks is supposed to be something like a hybrid architecture too.
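To make the perf/W point concrete, here is a toy comparison (the scores and wattages below are made up purely to illustrate the arithmetic; they are not benchmark results):

```python
# Toy perf/W comparison: a higher TDP alone doesn't tell you efficiency.
# All numbers are hypothetical, chosen only to illustrate the arithmetic.

def perf_per_watt(score: float, watts: float) -> float:
    """Efficiency as performance points per watt."""
    return score / watts

zen3 = perf_per_watt(score=10_000, watts=65)   # made-up Zen 3 part
zen4 = perf_per_watt(score=19_000, watts=105)  # made-up Zen 4 part

# TDP rose ~62%, but if performance rises faster, perf/W still improves.
print(f"Zen3: {zen3:.0f} pts/W, Zen4: {zen4:.0f} pts/W")
print(f"efficiency change: {zen4 / zen3 - 1:+.1%}")
```

Whether Zen 4 actually lands on the right side of this arithmetic depends entirely on measured scores, which we don't have yet.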
 

amrnuke

Golden Member
Apr 24, 2019
1,175
1,767
106
It is time to push efficiency charts in the foreground. Problem solved.
Sure, then everyone will complain about the lack of progress in performance. Then you have to push performance charts to the foreground. Rinse. Repeat. (Kidding, of course...)

Zen is not marketed to the public for efficiency, it's marketed for performance. How do you think people would be reviewing the product if AMD chose to instead limit power draw and ended up being just equal to Alder Lake, in the name of efficiency? Seems to me it would be bad publicity, especially with Rocket Lake not even released yet.
 

coercitiv

Diamond Member
Jan 24, 2014
5,364
8,929
136
if Nosta is to be believed at all
Name one process & architecture combo "leak" that Nosta talked about in the past and turned out to be true. The amount of fantasy nodes and architectures is staggering, and yet people still eat this crap with a spoon.

You would have better chances at predicting the future of AMD products by tossing a coin. My cat would have better chances of predicting AMD product & node mix, and I don't have a cat. It's still better than what Nosta predicts because me getting a cat and using it to make predictions is still within the realm of possibility in this universe.
 

Hans de Vries

Senior member
May 2, 2008
321
1,017
136
www.chip-architect.com
What's most surprising to me is the I/O die. I'm really not at all surprised by the CCDs if I'm honest, but the I/O die - new nodes have little to no effect on analog circuit density, and Genoa has a very significant increase in I/O (12ch DDR5, 128 PCIe5 lanes, etc.), yet despite that, the I/O die is actually smaller than Rome's.
The physical I/O of 7nm Cezanne is quite small: the 128-bit bus is just 5% of the 180mm2 die (the top-right rectangle), or 9 mm2.


This makes me think that AM5 may jump over Alder Lake's 1700-pin package to 2000+ pins (from 1331 for AM4).
That would allow desktop motherboards with 4 memory slots, each with its own channel.

AM5 needs to support 3nm CPUs and APUs. Two-channel LPDDR4-4266 is already exhausted by 8 Vega compute units (Rembrandt has 12 Navi2 compute units on 6nm). The 5nm Raphael has an unknown number of Navi3 compute units on AM5, and they could be clocked north of 3GHz.
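The peak-bandwidth arithmetic behind the "exhausted" claim can be sketched as follows (assuming a combined 128-bit bus; the DDR5 speed used is a placeholder, since AM5 launch speeds aren't known):

```python
# Rough peak-bandwidth arithmetic for the memory configs mentioned above.
# Assumes a combined 128-bit bus (2 x 64-bit channels); the DDR5 data
# rate is a placeholder, not a confirmed AM5 launch speed.

def peak_bw_gbs(mt_per_s: int, bus_bits: int) -> float:
    """Peak transfer rate: megatransfers/s times bytes per transfer."""
    return mt_per_s * 1e6 * (bus_bits // 8) / 1e9

lpddr4 = peak_bw_gbs(4266, 128)  # two-channel LPDDR4-4266
ddr5 = peak_bw_gbs(5200, 128)    # hypothetical two-channel DDR5-5200

print(f"LPDDR4-4266: {lpddr4:.1f} GB/s, DDR5-5200: {ddr5:.1f} GB/s")
```

Roughly 68 GB/s today versus 80+ GB/s for even modest DDR5, which is why the platform jump matters for APU graphics.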
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,571
1,118
136

Feb 2021+ product bring up.

I can only find GPUs and one other IP bring ups so far with CPUs being less exact:
Vega10 bring up: Jan 2017-April 2017 => August 2017 launch
MI50/MI60 bring up: Jan 2018(start month) => November 2018 launch
Navi10 bring up: July 2018+ => July 2019 launch
PCIe 4.0 bring up in client/server: September 2018+ => July/August 2019 launch
Fiji bring up: Sept 2014 - November 2014 => June 2015 launch
Ontario bring up: 2010(no exact month) => January 2011 launch
Mullins bring up: 2013(no exact month) => April 2014 launch
Kaveri bring up: August 2013 => January~June 2014 launch
Radeon Pro 560x bring up: 2017(no exact month) => July 2018 launch
Radeon Pro Vega bring up: 2018(no exact month) => November 2018 launch
MI100 bring up: July 2019+(no exact start) => November 2020 launch
Zen SoC/Server (Zeppelin) bring up: Before August 2016, After November 2015 => March 2017 launch
MI200 bring up award: December 2020 => not yet launched.
3 launched within the year of first bring-up mention.
9 launched the year after first bring-up mention. Of those, the majority of the mentions are during the second half of the year.
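For what it's worth, the 3-vs-9 tally can be reproduced with a quick script (years transcribed from the list above, with approximate dates rounded to a year):

```python
# Tallying the bring-up -> launch gaps listed above, by year only
# (the months in the list are too inconsistent to use). Data is
# transcribed from the post; approximate dates rounded to a year.
bringup_launch_years = {
    "Vega10": (2017, 2017), "MI50/MI60": (2018, 2018),
    "Navi10": (2018, 2019), "PCIe 4.0": (2018, 2019),
    "Fiji": (2014, 2015), "Ontario": (2010, 2011),
    "Mullins": (2013, 2014), "Kaveri": (2013, 2014),
    "Radeon Pro 560x": (2017, 2018), "Radeon Pro Vega": (2018, 2018),
    "MI100": (2019, 2020), "Zeppelin": (2016, 2017),
}

same_year = sum(1 for b, l in bringup_launch_years.values() if l == b)
next_year = sum(1 for b, l in bringup_launch_years.values() if l == b + 1)
print(f"{same_year} same-year launches, {next_year} the year after")
```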

On DDR-side LPDDR5/DDR5 has two spots of bring up: January 2020+ and July 2020+.

So, Raphael launching earlier than expected is more likely. ¯\_(ツ)_/¯
 
Last edited:

dnavas

Senior member
Feb 25, 2017
355
190
116
An 18-month release cycle would put Zen 4 at Q2 2022. However, Zen 4 brings a brand-new platform with DDR5/PCIe5, so delays are to be expected.
I'm personally finding it increasingly difficult to justify investing in a pcie4 TR platform purchase, so I hate what this does for Zen4 TR, but it frankly doesn't make sense to ship a DDR5 platform prior to DDR5 being available in quantity. So yes, I'd expect later than sooner.

Unless [I add, somewhat self-servingly] you release on the platform that's already more expensive and might be more willing to absorb the cost. I don't expect AMD is doing this, but it seems like it would make some sense to put the TR chiplets after desktop, so that desktop can bake the design, but put the TR IO on the bleeding edge. You allow TR to benefit from the choice of the best chiplets, and allow the desktop to benefit from the better/cheaper supply chain of parts.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,418
4,780
136
I'm pretty sure since Zen2/3, AMD has been using the most dense libs available.


The actual design reasons for AMD going Hi-Freq rather than full-on Hi-Inst will probably remain unknown. For all we know, it could be marketing-driven rather than engineering-driven, as it is psychologically easier to sell a chip that has increased frequency over the last generation.

Above, extended w/ actual high-end on AMx:
Ryzen 7 1800X = 4.1 GHz boost
Ryzen 7 2700X = 4.3 GHz boost
Ryzen 9 3950X = 4.7 GHz boost
Ryzen 9 5950X = 4.9 GHz boost
Zen3 is indeed using N7 HD, while RDNA is using N7 HP.
For N5 the optimal range for Zen 4 would be around 3.8-4.2 GHz according to the Shmoo plot; beyond that, a considerable jump in voltage is needed to push frequency to the levels of, say, the 5950X.
One of the advantages of designs which top out at ~3.2 GHz is that they stay well below this range, resulting in big gains in efficiency.
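A rough sketch of why the knee of the Shmoo curve matters, using the textbook dynamic-power relation P ≈ C·V²·f (the capacitance and voltage figures below are illustrative, not taken from the actual plot):

```python
# Why "a considerable jump in voltage" hurts: dynamic CMOS power scales
# roughly as C * V^2 * f. If voltage must rise steeply with frequency
# past the knee of the Shmoo curve, power grows far faster than clocks.
# The figures below are illustrative, not measured N5 values.

def dynamic_power(cap_f: float, volts: float, freq_hz: float) -> float:
    """Classic dynamic switching power: C * V^2 * f (watts)."""
    return cap_f * volts**2 * freq_hz

base = dynamic_power(cap_f=1e-9, volts=0.75, freq_hz=4.0e9)  # near the knee
fast = dynamic_power(cap_f=1e-9, volts=1.05, freq_hz=5.0e9)  # pushed hard

print(f"power ratio for a +25% clock: {fast / base:.2f}x")  # 2.45x
```

That asymmetry is exactly why a ~3.2 GHz design sitting below the knee can be so much more efficient than one chasing 5 GHz.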


Why doesn't AMD use the high density process? Wouldn't the much higher IPC made possible by many more transistors make up for the lost frequency? Plus, it would be much more energy efficient
My thinking is that when they originally designed Zen1-Zen3 around the CCX concept, they intended it to be very small and easily manufacturable. It was supposed to be cheap to produce, with high clocks helping to extract more performance.
The Zen3 core including L2 is quite small in comparison to most contemporary designs; the Zen3 core (without L3) is less than half the transistor count of the M1.
However, when Zen2 and later Zen3 landed, they needed to tack on the big L3 to compensate for the weakness of a memory hierarchy with multiple CCXs.
In the end Zen3 got big anyway. Also, making the transistor count of the active core too high while operating at high frequency would have increased the TDP by a bit.

For Zen4, with this learning, it will be interesting to see whether AMD raises the transistor count drastically or again sticks with a smaller core.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,418
4,780
136
PCIe 5 is pretty much irrelevant for consumers right now and will remain that way for many years. All it does is increase motherboard costs. The only thing it possibly could affect is storage, because GPUs aren't going to come close to saturating a PCIe 4.0 x16 link any time soon.
PCIe 5.0 is used as the physical layer for CXL and is supposed to bring in a new era of cache coherent accelerators.
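For scale, the raw link-rate arithmetic (using the standard 16 and 32 GT/s per-lane rates, with 128b/130b encoding applying to both generations):

```python
# Peak-bandwidth arithmetic for the PCIe links discussed above.
# PCIe 4.0 signals at 16 GT/s per lane and PCIe 5.0 at 32 GT/s,
# both with 128b/130b encoding: usable bytes/s = GT/s * 128/130 / 8.

def pcie_gbs(gt_per_s: float, lanes: int) -> float:
    """Peak usable bandwidth of a PCIe 4.0/5.0 link in GB/s."""
    return gt_per_s * 1e9 * (128 / 130) / 8 * lanes / 1e9

gen4_x16 = pcie_gbs(16, 16)  # a GPU slot today, ~31.5 GB/s
gen5_x16 = pcie_gbs(32, 16)  # the same slot on PCIe 5.0, ~63 GB/s

print(f"PCIe 4.0 x16: {gen4_x16:.1f} GB/s")
print(f"PCIe 5.0 x16: {gen5_x16:.1f} GB/s")
```

The doubling matters less for raw GPU traffic than for CXL, where the coherent protocols ride on that same physical layer.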

I hope this is not the case.
Otherwise it means buying some Zen4-based TR Pro to try out CXL-based accelerators.
The only option on AM5 would be to go A+A, which is kind of against AMD's open and inclusive philosophy. The issue with this is that AMD only has GPUs at the moment, so if you want to try out FPGA-based CXL-capable accelerators like the ones from Xilinx, you would be out of luck on AM5.
It makes no sense not to have it if they already have the PCIe 5.0 IP on Genoa. They could just have the support in the IOD and the chipset, and let the board OEMs take the cost on the high-end boards.
Either that, or this generation has no CXL support, which is going to be an issue for developers wanting to try out CXL-based accelerators.

Not happy with this.
 

Gideon

Golden Member
Nov 27, 2007
1,530
3,226
136
I don't know, and to be frank, I don't entirely care. He's said enough bollocks for me to know that he's more than happy to either make stuff up or trust things from absolutely anyone.
100% this. Usually pure informed speculation leads to much more accurate predictions than what these leakers claim.

So many of these leaks go blatantly against common industry facts that it hurts (and this goes for all of MLID, Adored and Coreteks). Things like:
  • Claiming not having working silicon in the labs less than year before release
  • Claiming things that would require changes to silicon (other than respins) less than a year before release.
  • Claiming something has been designed but might not be released - This happens, but very rarely, as the R&D money has already been spent, it would literally have to be unsellable to get canned. (Things such as designing a 24 core Genoa and not announcing it while releasing a 16 core one makes 0 sense)
  • And the big one: knowing SKUs and pricing 6+ months pre-release (these are the last things that get decided, especially the pricing, as it's the only thing that can be changed easily, even hours before release)

But what really grinds my gears is that if they get something wrong, they almost never admit that it was (someone's) poor speculation. Nearly always there is the excuse of "oh, it must have been canned/postponed/changed last minute".

I still remember Adored being hell-bent, up to late December 2018, that Navi would release in January 2019. And when it didn't happen, it was just casually "postponed due to yields". It ended up "being postponed" for 7 months. I'm sure AMD had no idea of the state of their yields a month before release :p
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,418
4,780
136
AMD HSA is here


AMD is building a system architecture for the Frontier supercomputer with
a coherent interconnect between CPUs and GPUs. This hardware architecture
allows the CPUs to coherently access GPU device memory. We have hardware
in our labs and we are working with our partner HPE on the BIOS, firmware
and software for delivery to the DOE.
The system BIOS advertises the GPU device memory (aka VRAM) as SPM
(special purpose memory) in the UEFI system address map. The amdgpu driver
looks it up with lookup_resource and registers it with devmap as
MEMORY_DEVICE_GENERIC using devm_memremap_pages.
Now we're trying to migrate data to and from that memory using the
migrate_vma_* helpers so we can support page-based migration in our
unified memory allocations, while also supporting CPU access to those
pages.
 

Gideon

Golden Member
Nov 27, 2007
1,530
3,226
136
Having something like 250 GB/s of IO bandwidth with 128 PCI-Express 4.0 lanes seems like it would have been the deciding factor.
That wasn't really true for this case.

Aurora was supposed to be ready earlier, was won by Intel, and is being built with Sapphire Rapids chiplets that have PCIe 5.0, "Rambo cache" chiplets, HBM2 on package if needed (and, it looks like, similar unified-memory-space software). The problem is it's using micro-bumps for stacking (well, it's also very late, but that wasn't certain when Frontier was announced). So if anything, Intel had the I/O advantage.

There had to be some secret sauce in AMD's offerings to win Frontier like they did. This is certainly one key differentiator. Bear in mind the V-Cache solution most likely has two layers (as it sits on top of a 32MB L3 and is exactly as big on the same process). There is nothing stopping AMD from adding more layers for some server CPUs, and I'm convinced now that CDNA2 has this stacking as well.

And while all of this is only possible because of AMD's engineering prowess, keep in mind that this is also TSMC's win as much as it is AMD's. They're the only foundry that has anything like this ready in this time-frame. The hoops TSMC had to jump through to make this work (and be producible at scale) are also enormous.

All in all, ever since Zen 2 it looks like it's the trifecta of execution (Synopsys + AMD + TSMC) that is to be congratulated. AMD couldn't just do it alone.
 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,492
2,182
106
Chatting on an internet forum doesn't need most of the instruction sets modern day CPUs provide. Why power all that silicon? Playing a game requires a number of instruction sets that aren't normally used. During that time, the small cores can be put to sleep, giving the big cores more headroom (by way of TDP) to run.

That's completely wrong. You think posting to Anandtech doesn't use SIMD instructions? Check out whatever is responsible in your OS kernel for zeroing pages when a new page is needed, it probably uses AVX2 in some circumstances - and that's the tip of the iceberg. You think floating point isn't needed? Sorry, all math in Javascript is done in floating point, there's no way to avoid it if you are running a browser.
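To illustrate the floating-point point: JavaScript's Number type is an IEEE 754 double, the same format as a Python float, so the familiar double-precision behaviour applies to every arithmetic operation a JS engine performs:

```python
# JavaScript's Number is an IEEE 754 double, the same format as a
# Python float, so these Python examples mirror what every JS engine
# does for ordinary arithmetic in a browser.

# Integers are exact only up to 2**53 (Number.MAX_SAFE_INTEGER + 1 in JS);
# beyond that, adjacent integers round to the same double:
assert float(2**53) == float(2**53 + 1)

# And decimal fractions are only approximated in binary:
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```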

I doubt there's anything you can do with a modern PC or smartphone that would allow any worthwhile reduction of instruction set coverage. Not even running an "idle loop" (which is a halt instruction these days), because there are always background/housekeeping processes running, so the scheduler, I/O dispatch, filesystem, and other parts of the kernel will remain active.

I don't think you can usefully cut out any instructions from a small core other than 1) AVX512 (and that's only true on x86 because Intel didn't provide for variable SIMD width capability like SVE2) and 2) virtualization. Anything else you cut out will mean almost every thread will be forced onto big cores before long.
 

Doug S

Golden Member
Feb 8, 2020
1,492
2,182
106
…and if I am running with javascript disabled? what if i am writing code in vim? What if the machine is a simple file sharing machine? There are plenty of opportunities to use a small core over a big one. Even something as basic as tracking a mouse pointer doesn’t need to use a big core.
OK sure if you are one of the niche cases of people who disable Javascript or run CLI stuff in console mode, fine I'll grant you that. The overwhelming majority of PC/smartphone users don't do stuff like that.

Tracking a mouse pointer doesn't need the performance of a big core, but it will almost certainly exercise your whole instruction set. Do you have any idea of the size of the hot code footprint tracking a mouse pointer on an otherwise idle system these days? A modern GUI is multiple layers of libraries.

A typical person who will leave Javascript enabled will exercise floating point if that mouse cursor moves in any browser window. When the pointer moves between windows, window expose events will exercise stuff like bcopy/memset that uses AVX2, and so on.
 
