That's the claim AMD themselves made. Probably talking about core + L2.
Again, where are all of these numbers you keep throwing out coming from? No, by all current indications Zen 4 in Phoenix is basically identical to Zen 4 elsewhere.
> A single Bergamo CPU is worth $15,000 (selling for $22,000 right now in China), so as you can see they will be exceedingly profitable, enough to warrant a different design.

My question remains: what is the business case for AMD to design yet another 4N Zen 4 CCX at >140M xtors/mm² using a different library? Why do that work twice only to not fit more cores in?
> There is not enough space on the package to fit 16 of those on the CPU.

Sure, and a CCD designed around two Phoenix CCXs will do the job nicely.
> Question is also how much CDNA3 will sell for, that 150B-transistor APU.

A single Bergamo CPU is worth $15,000 (selling for $22,000 right now in China), so as you can see they will be exceedingly profitable, enough to warrant a different design.
> You're comparing apples and oranges. The density of a high-speed, cache-heavy compute die will be vastly different from an SoC with a GPU and everything else included. I'm sure if you actually look at the Zen 4 cluster in isolation, the density will be very similar to the desktop Zen 4.

The APU is 25B xtors in 176 mm², which is about 140M xtors per mm² despite all the IO in a monolithic design. Compared to the 94M xtors per mm² of the normal Zen 4 CCD, it represents a 49% increase in density.
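For what it's worth, the division checks out. A quick sanity check of the figures above (all values are the post's claims, not official specs):

```python
# Sanity-check the density figures quoted above (claimed values, not official).
apu_transistors = 25e9   # claimed Phoenix APU transistor count
apu_area_mm2 = 176       # claimed Phoenix die area, mm^2
ccd_density = 94e6       # claimed desktop Zen 4 CCD density, xtors per mm^2

apu_density = apu_transistors / apu_area_mm2
print(f"APU density: {apu_density / 1e6:.0f}M xtors/mm^2")          # ~142M
print(f"Increase over CCD: {(apu_density / ccd_density - 1) * 100:.0f}%")  # ~51%
```

Using the unrounded numbers gives ~142M xtors/mm² and a ~51% increase; the post's 49% comes from rounding down to 140M first. Either way the gap over the CCD is real.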
> I’m gonna go out on a limb and make a guess: $30K ea.

Yes. It'll sell.
> Yes, it was.

That was not the case for Renoir or Cezanne.
> Just compared from a die shot, the Zen 3 core + L2 cache on Cezanne is the same size as the desktop one.

That was not the case for Renoir or Cezanne. Further, Navi 31 is not 140M xtors per mm², although getting the density of the 5nm portion of the die alone is tricky, because I don't think we have a transistor count for the GCD on its own, just the whole thing.

Anyway, we will find out when we get more details, but I don't see AMD doing the same thing twice for no benefit (like, say, squeezing 3 CCXs into a CCD).
I just went and checked and you are right. Locuza did some analysis and the difference between the Cezanne die size and Vermeer die size was purely the L3 cache and lack of TSVs. The core itself was exactly the same.
So given that, my entire argument is nonsense, since AMD already copy/pasted the existing CCX into the APU. So yes, AMD will need to design a higher-density version for Zen 4c, and that will come with clock-speed tradeoffs.

I still find it curious how AMD manages to have such high transistor density for the APUs when the CCXs and the compute units seem to be the same density as their desktop counterparts and there is a lot of IO. I guess the power-gating circuitry can be really dense.
> RDNA3 with Navi 31 has extreme density; that's mentioned by AMD in the slides. Could, however, be marketing wash.

IDK, I believe that part is true.
The MI300 infographic disclosure did raise even more questions regarding the tech for the upcoming AMD products. It seems to me the CPUs are sharing the IC with the GPU, so a possible SLC could trickle down to consumer parts. The CPU L3, as well, is not on the top die; this could lend some credibility to the rumor of CPU cores stacked on top of cache, or at least on top of an SLC. The GCDs appearing as one single GPU to the SW would also mean they have sorted out the multiple-GCD concept, but that's for another thread.

For me, however, the interconnects are the most interesting part of the MI300, and how that would carry over to Zen 5. Lots of interesting design choices would have to be made for Zen 5, and it is fairly exciting for the folks working on such things, considering the tape-out would be within a quarter or so if not done already.
In case there really is stacking throughout the Zen 5 DT/server product lines:
- One drawback is that they would need to limit Tjmax to something like 85 °C to keep the hybrid bond from degrading; they might need a big IPC jump to recover the lost ST perf (for instance, the 7800X3D tops out at 5 GHz). But the thought of having the big L3 chunks off the leading-edge nodes would make the bean counters extremely happy.
- Infinity fabric links via the package would be gone if all dies are stacked on the base die. This would be a necessity if the supposed 64 Gbps links (from LinkedIn) are to be implemented. Current gen Ryzen is bottlenecked by the IF clock to some extent.
- More cores per CCX (alluded to by Mike Clark) sharing an L3 would need some stacking; that would make routing each core to the L3 possible with uniform distance, not bloat the core area too much, and leave the RDL on the base-die side instead of the core die. This could in fact shave off a non-trivial amount of traces if the distance to the L3 from the core is cut.
- Base die can host the IO and other PCIe stuff. They might even get away with N6 for the base die on the desktop.
If the CCDs are not stacked, they would need something like the interconnects on RDNA3 to replace the current Z4 GMI. Then they have FinFlex to play with.
> The big question is, would it bring down the cost for consumer CPUs? AMD may separate desktop CPU and server CPU package design.

I agree. I continue to expect AMD to use more complex packages only for higher-performance, high-margin products. The FAD reference to "AMD chiplet technology" for Phoenix appears to have turned out to be a false flag. At this rate Intel may well be first to chiplet-based lower-performance mobile parts with Meteor Lake. It will be very interesting to see if that is a competitive cost advantage or disadvantage.
> It seems to me the CPUs are sharing the IC with the GPU, so a possible SLC could trickle down to consumer parts. The CPU L3 as well is not on the top die; this could lend some credibility to the rumor of CPU cores stacked on top of cache, or at least on top of an SLC.

I might have missed info, but have they shown that the L3 is moved to the base die? Actually, what (if anything) is confirmed for it?
> More cores per CCX (alluded to by Mike Clark) sharing an L3 would need some stacking; that would make routing each core to the L3 possible with uniform distance, not bloat the core area too much, and leave the RDL on the base-die side instead of the core die.

They moved to a ring bus with the move to an 8c CCX, so I don't think stacking is a necessity for them to scale per-CCX core count. Nor do I think 2-layer hybrid bonding alone would be enough to solve that routing issue, so they'll probably keep the ring.
> One drawback is that they would need to limit Tjmax to something like 85 °C to keep the hybrid bond from degrading; they might need a big IPC jump to recover the lost ST perf (for instance, the 7800X3D tops out at 5 GHz).

Putting the cache on the bottom will probably help a lot. No longer need to go through the top die for cooling. And maybe future gens of hybrid bonding can get the thermal resilience up to where it's a non-issue, if that's even the limiting factor today. Might just be a lack of telemetry on the cache die driving a more conservative approach for now.
Model | Zen 5 | Zen 4c | IGP | IGP gaming frequency | Last Level Cache | Power Limit -> gaming
---|---|---|---|---|---|---
6C12T Model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 12 CU; 48TMU; 24 ROP | 2400 MHz | Total: 21MB CPU: 9 + IGP: 12MB | 30W CPU: 15W + IGP: 15W
6C12T Model | 3 Cores; L2: 6MB | 3 Cores; L2: 1.5MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 33MB CPU: 9 + IGP: 24MB | 35W CPU: 15W + IGP: 20W
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 16 CU; 64TMU; 32 ROP | 2400 MHz | Total: 36MB CPU: 12 + IGP: 24MB | 35W CPU: 15W + IGP: 20W |
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 20 CU; 80TMU; 40 ROP | 2400 MHz | Total: 48MB CPU: 12 + IGP: 36MB | 40W CPU: 15W + IGP: 25W |
8C16T Model | 4 Cores; L2: 4MB | 4 Cores; L2: 2MB | 24 CU; 96TMU; 48 ROP | 2400 MHz | Total: 60MB CPU: 12 + IGP: 48MB | 45W CPU: 15W + IGP: 30W |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 46 MB CPU: 14 + IGP: 32MB | 50W CPU: 20W + IGP: 30W |
10C20T model | 4 Cores; L2: 4MB | 6 Cores; L2: 3MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 60MB CPU: 14 + IGP: 46MB | 57W CPU: 20W + IGP: 37W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 48MB CPU: 16 + IGP: 32MB | 55W CPU: 25W + IGP: 30W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 62MB CPU: 16 + IGP: 46MB | 62W CPU: 25W + IGP: 37W |
12C24T model | 4 Cores; L2: 4MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 2800 MHz | Total: 76MB CPU: 16 + IGP: 60MB | 70W CPU: 25W + IGP: 45W |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 2800 MHz | Total: 52MB CPU: 20 + IGP: 32MB | 60W CPU: 30W + IGP: 30W |
14C28T model | 6 Cores; L2: 6MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 2800 MHz | Total: 66MB CPU: 20 + IGP: 46MB | 67W CPU: 30W + IGP: 37W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 16 CU; 64TMU; 32 ROP | 3200 MHz | Total: 64MB CPU: 24 + IGP: 40MB | 75W CPU: 35W + IGP: 40W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 20 CU; 80TMU; 40 ROP | 3200 MHz | Total: 80MB CPU: 24 + IGP: 56MB | 85W CPU: 35W + IGP: 50W |
16C32T model | 8 Cores; L2: 8MB | 8 Cores; L2: 4MB | 24 CU; 96TMU; 48 ROP | 3200 MHz | Total: 96MB CPU: 24 + IGP: 72MB | 95W CPU: 35W + IGP: 60W |
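Every row of the speculative table above is built from the same invariants: total cores = Zen 5 cores + Zen 4c cores, total LLC = CPU share + IGP share, and package power = CPU budget + IGP budget. A quick check of the last row, purely as arithmetic on the numbers above (the table itself is a forum guess, not a leak):

```python
# Cross-check the internal arithmetic of the 16C32T / 24 CU row of the
# speculative table above (values copied from the table, which is a guess).
zen5_cores, zen4c_cores = 8, 8
cpu_l3_mb, igp_l3_mb = 24, 72
cpu_w, igp_w = 35, 60

assert zen5_cores + zen4c_cores == 16  # matches "16C32T"
assert cpu_l3_mb + igp_l3_mb == 96     # matches "Total: 96MB"
assert cpu_w + igp_w == 95             # matches "95W"
print("row is internally consistent")
```

The other rows follow the same pattern, so the table is at least self-consistent, whatever its predictive value.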