Mark Papermaster said this week that Milan has been shipping since Q4 2020, and that next-gen Epyc (Genoa) is on track for 2022 and will use TSMC's 5nm process.
Direct quote:
"And the choice of EPYC configurations is going again with the third-generation Milan that will be launching later this month, but has already been shipping in select accounts since the end of last year."
"In fact second gen and third gen will be in the market coincident. And we're on track with fourth-gen EPYC to go to market in 2022."
The rumors aren’t quite what I expected. The mock-up of Genoa does not appear to use any die stacking, unless there is more than one layer in the IO package or CPU packages. It could make some sense to try stacking in the IO die first, since it should be lower power than the CPU cores and therefore lower risk. I had expected that we would see Infinity Cache in the Genoa IO die; the UMC seems like it would be very similar internally once you account for the differences between DDR and QDR graphics memory. It is plausible, though wild speculation, that the IO die is a two-layer device, with one layer on an older process holding the physical interfaces and another layer on a newer process for the logic and Infinity Cache. An Epyc processor with 128 MB of L4 would be amazing. I don’t think they would want to make L4 cache on a GlobalFoundries process, so it would make sense either for TSMC to make the whole thing on older process tech, or for GF to make only the interposer (actual IO) portion.
The mock-up doesn’t look like any die stacking at all. When I heard the 96-core rumor, I was thinking they might make a stacked device, with the multi-layer IO die described above (essentially an active interposer) and four CPU chiplets stacked on top. That would allow them to make 32-core devices with room for an HBM GPU on either side. It would also allow placing CPU cores on both sides for 96 cores, but the latency would probably be asymmetric, so that seems unlikely; with a lot of cache, though, it might not make that much difference. The other thought was that they might stack two CPU die for a maximum of 128 cores, with 96 cores just being one SKU. They could connect to the IO package with one link, the same way that two CCXs share one link in Zen 3, though it would need to be a much faster link.
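A quick sanity check on the core-count arithmetic behind both layouts, assuming Zen 4 keeps 8 cores per chiplet like Zen 3 (my assumption, not confirmed):

```python
# Back-of-the-envelope core counts for the speculated Genoa layouts.
# Assumption (mine): 8 cores per CPU chiplet, same as Zen 3.
CORES_PER_CHIPLET = 8

# Flat (unstacked) layout suggested by the mock-up: 12 chiplets.
flat_cores = 12 * CORES_PER_CHIPLET

# Speculated stacked layout: 8 chiplet sites, two die per stack.
# 96 cores would then just be a cut-down SKU of a 128-core device.
stacked_cores = 8 * 2 * CORES_PER_CHIPLET

print(flat_cores, stacked_cores)  # 96 128
```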
It would be kind of disappointing if we don’t get any of this in Zen 4, but if Zen 4 is a completely new architecture, that would make up for it a bit. If they did go up to 12 chiplet links, it would make sense for each quadrant (and for the desktop parts) to have 3 CPU links, 3 DDR5 channels (whatever a channel ends up meaning for DDR5), and 2 x16 PCI Express links. They don’t really need to increase the IO beyond that; Zen 3 already has ridiculous levels of IO bandwidth.
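Tallying up that per-quadrant split, assuming the IO die keeps the four-quadrant organization of current Epyc parts (my assumption):

```python
# Socket-level totals for the speculated 12-link arrangement.
# Assumption (mine): four symmetric quadrants, as in current Epyc IO dies.
QUADRANTS = 4

cpu_links = QUADRANTS * 3            # 12 chiplet links total
ddr5_channels = QUADRANTS * 3        # 12 DDR5 channels total
pcie_lanes = QUADRANTS * 2 * 16      # 8 x16 links = 128 PCIe lanes

print(cpu_links, ddr5_channels, pcie_lanes)  # 12 12 128
```

That keeps PCIe at the same 128 lanes Zen 3 Epyc already offers, consistent with not needing more IO bandwidth.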
I doubt that the CCX will be more than 8 cores and 32 MB of L3 unless stacking is used. One possibility for stacking is to place some or all of the L3 on a separate die stacked with the CPU die. They could then bin the cache die by usable size, possibly offering up to 64 MB. The cache die could also be made on a different process that is better suited to SRAM, which would also save valuable fab capacity. The CPU die would be very power dense without L3, though. That is wild speculation, but if Zen 4 is that much of a new architecture, they may change the cache hierarchy significantly. At a minimum, I expect a much larger L2 cache; 32 MB of L3 is still very large for a single 8-core CCX.
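The binning idea can be sketched like this. Everything here is hypothetical: the slice size, the SKU tiers, and the fuse-off policy are my assumptions, just to show how binning by usable size would work:

```python
# Hypothetical binning of a stacked cache die by usable SRAM capacity.
# Assumptions (mine): the die has 8 slices of 8 MB each; defective slices
# are fused off, and the die sells at the largest tier it still meets.
SLICE_MB = 8
SKU_TIERS_MB = [64, 48, 32]  # hypothetical capacity bins

def bin_cache_die(good_slices):
    """Return the SKU capacity for a die with this many good slices."""
    usable = good_slices * SLICE_MB
    for tier in SKU_TIERS_MB:
        if usable >= tier:
            return tier
    return None  # too few good slices; die is scrapped

print(bin_cache_die(8), bin_cache_die(7), bin_cache_die(4), bin_cache_die(2))
# 64 48 32 None
```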