If the demand stays high, I can easily see AMD shrink it down to 3nm, throw in some improvements, and add Zen 5 in the 2024 timeframe.
I don't imagine that they will launch a CPU-only MI400 SKU without its CDNA variants, which may not be ready by 2024.
They'll need SRAM (or something with equivalently low latency) for everything up through the L3 cache, but if they add an additional memory-side cache, they could afford to focus more on capacity. That could be an opportunity for a different technology, though it would probably be a volatile memory. I doubt MRAM has any appeal.
They can get very dense SRAM cache on a process optimized for it. Since the Infinity Cache is connected to the compute die in MI300 with SoIC, it can be as fast as an on-die cache. I don't know if there is a need in the hierarchy for an explicit memory-side cache.
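Average memory access time (AMAT) is one way to frame whether an extra memory-side cache level pays for itself. Below is a minimal Python sketch. Every latency and hit rate in it is a made-up placeholder, not an MI300 figure, and the helper `amat` is hypothetical.

```python
def amat(levels, memory_latency_ns):
    """Average memory access time for a simple serial lookup hierarchy.

    levels: list of (hit_latency_ns, hit_rate) ordered from closest cache outward.
    A level is only consulted by accesses that missed every earlier level.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as the current level
    for hit_latency_ns, hit_rate in levels:
        total += reach * hit_latency_ns
        reach *= 1.0 - hit_rate
    # Whatever misses every cache level goes to DRAM/HBM.
    return total + reach * memory_latency_ns


# Hypothetical placeholder numbers: (latency in ns, local hit rate).
on_die = [(1.0, 0.90), (4.0, 0.60), (12.0, 0.50)]      # L1, L2, L3
memory_side = on_die + [(30.0, 0.40)]                  # plus a memory-side cache

print(f"without memory-side cache: {amat(on_die, 120.0):.2f} ns")
print(f"with memory-side cache:    {amat(memory_side, 120.0):.2f} ns")
```

With these invented inputs, the extra level only helps because its hit latency is well below memory latency and it still catches a useful fraction of L3 misses. Real hit rates depend heavily on the workload.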
I'm not saying that it will be CPU-only. I think it could be upgraded to Zen 5, which will be the first server platform with Xilinx IP, and they could shrink CDNA and add a few minor improvements to get a boost.
I don't think so. Nvidia is NOT standing still, so MI400 will still need a much larger increase in performance than AMD would get from an optical shrink.
I think going forward AMD has the benefit of a compact, high-density CPU + accelerator platform, which may offer some customers advantages beyond just equal or better accelerator performance.
AMD already has a bunch of advantages here: much higher bandwidth, more HBM, and a truly unified memory system compared to Grace-Hopper. Grace-Hopper will have higher capacity within the package, but only at around 500 GB/s to the LPDDR. AMD can build machines with SH5 GPUs paired with dual-socket SP5 CPUs with close to 500 GB/s of memory bandwidth to each CPU socket. AMD also probably has a lower production cost, since it uses small chiplets rather than a giant monolithic die. It will still be very expensive, but I suspect Nvidia's solution will be just ridiculously expensive.
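The "close to 500 GB/s per socket" figure can be checked with peak-bandwidth arithmetic: channels × transfer rate × bytes per transfer. This sketch assumes a Genoa-class SP5 socket with 12 channels of DDR5-4800 and an older SP3 socket with 8 channels of DDR4-3200, each moving 8 bytes of data per channel per transfer. The results are theoretical peaks only; sustained bandwidth is lower. The function name is illustrative.

```python
def peak_dram_bandwidth_gbs(channels, mt_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in decimal GB/s.

    channels * MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    A 64-bit DDR channel carries 8 bytes of data per transfer.
    """
    return channels * mt_per_s * bytes_per_transfer / 1000


sp5 = peak_dram_bandwidth_gbs(12, 4800)   # assumption: 12 x DDR5-4800
sp3 = peak_dram_bandwidth_gbs(8, 3200)    # assumption: 8 x DDR4-3200

print(f"SP5-class socket: {sp5:.1f} GB/s, dual socket: {2 * sp5:.1f} GB/s")
print(f"SP3-class socket: {sp3:.1f} GB/s")
```

Under these assumptions, only the 12-channel DDR5 configuration lands near 500 GB/s per socket.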
This is something Nvidia will never have in the x86 space at least, hence their move to cover it with lotsa ARM cores in Grace.
I seem to recall some research a while back pertaining to ARM-based chiplets from somewhere.
Nah, I'm sure Nvidia has something cooking. Grace-Hopper is just a stopgap product pending a more integrated solution.
It seems unlikely that ARM or its server/datacenter partners will stay monolithic for long, given how ridiculously expensive chip design is becoming, even with ML-based placement assistance and other computational design tools.
Grace-Hopper 2: Grass-Hopper 😂🤣😆
I wonder if we'll see a Jensen-Huang in our lifetime. I've heard JHH has an ego, and many analysts see him as some sort of AI god/pioneer/etc.
That seems more like a Musk move.
It will be curious to see if AMD backtracks on high TDP/power limits with Zen 5. Zen 4 at 105W TDP / 142W PPT loses very little performance vs. stock, and by raising power limits, the messaging about the chip became a lot more negative.
Zen 5 hasn't powered on yet?
I do not see a world in which peak power for Zen 5 per core is lower than that for Zen 4.
AMD should give all the ‘X’ chips 125W TDPs (170W PPT) next round. More room to clock high, but still efficient.
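The TDP/PPT pairs in this thread follow a common rule of thumb: on AM4/AM5, AMD's socket power limit (PPT) is about 1.35× the rated TDP. Treat that factor as an assumption; it is not a formula stated in the thread. The sketch uses integer percent math, so the results are exact.

```python
PPT_PERCENT = 135  # assumption: PPT ~= 1.35 x TDP on AMD AM4/AM5 desktop parts


def ppt_from_tdp(tdp_w):
    """Approximate package power tracking (PPT) limit in watts for a given TDP."""
    return tdp_w * PPT_PERCENT / 100


for tdp in (105, 125, 170):
    print(f"{tdp}W TDP -> ~{ppt_from_tdp(tdp):.2f}W PPT")
```

This gives 141.75W for 105W (the 142W figure), 168.75W for the proposed 125W (roughly the 170W suggested), and 229.5W for stock 170W parts.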
Maybe if it had a good node shrink to prop it up, but N4P vs. N5 ain't it. With that in mind, I'm not sure backtracking on power limits would make sense.
From TSMC official numbers (best-case scenarios):
N4P --> N3E is a ~10% efficiency gain
N5 --> N4P is a ~22% efficiency gain
N5 --> N3E is a ~35% efficiency gain
Even a 15% efficiency gain from the process is significant; if AMD cannot extract at least that much, it is poor execution.
They should be able to extract some efficiency from the architecture as well.
But if there is a net loss of efficiency due to bigger/wider cores, then the execution is lacking.
The only acceptable reason would be that they are driving the core to even higher frequencies.
It's likely to be similar to Zen 3 vs. Zen 2, where the perf gap is small at low power (15-25W) but grows significantly at 45W+.
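As a rough consistency check on the quoted TSMC step figures (N5 --> N4P and N4P --> N3E), per-step gains compound multiplicatively rather than adding. This sketch computes the result under two readings of "efficiency gain": perf/W improvement, or power reduction at the same speed. Which reading TSMC's numbers use is an assumption here.

```python
def compound_perf_per_watt(*gains):
    """Chain perf/W gains: (1 + a) * (1 + b) * ... - 1."""
    total = 1.0
    for g in gains:
        total *= 1.0 + g
    return total - 1.0


def compound_power_reduction(*reductions):
    """Chain iso-performance power reductions: 1 - (1 - a) * (1 - b) * ..."""
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# N5 -> N4P (~22%) followed by N4P -> N3E (~10%)
print(f"as perf/W gains:       {compound_perf_per_watt(0.22, 0.10):.1%}")
print(f"as power reductions:   {compound_power_reduction(0.22, 0.10):.1%}")
```

Read as perf/W gains, the two steps chain to about 34%, close to the ~35% direct N5 --> N3E figure. Read as power reductions, they chain to about 30%.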
It's numbers like that which remind me how good N4P is relative to when it was released. N3-family nodes will take a while to become significantly better in any metric outside of maybe density.
I don't think there will be a net loss in efficiency?
IMO the only way forward is a significantly wider core, which absolutely means more power when totally maxed out (far past its efficiency optimum).
Lol, every time I see the dense-core variant mentioned... Zen 5 to use a CCX design? What's the purpose? Maybe a typo (Zen 5c instead of Zen 5)?
Yeah, this is a straightforward consequence of a wider, deeper microarchitecture, which AMD somewhat explicitly alluded to (especially on the frontend) and which I believe will bring Zen to parity with Golden Cove on sheer size. All else equal (and the node isn't changing for mobile, afaict), it's going to require more power at a given frequency unless gating was just lackluster on Zen 4. Of course, at similar performance I suspect energy efficiency or power efficiency will improve, likely because you won't have to clock as high or raise voltages as high. Idle could improve as well, but that's more uncore stuff.
I specifically stated higher peak power. I'm going to double down on this actually. I think that at any given clock frequency Zen 5 will require more power than Zen 4.
That doesn't mean I believe Zen 5 will sport worse power efficiency than Zen 4 (at least at any frequency where it actually matters). Just that the peak of what is possible will be increased.
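The argument that a wider core draws more power at any given clock yet can still be more efficient at a given performance level follows from the textbook dynamic-power model P ≈ C·V²·f. This minimal sketch uses entirely hypothetical numbers: 25% more switched capacitance, 15% more IPC, and a made-up linear V/f curve. It ignores leakage, so it is an illustration, not a Zen 4/Zen 5 model.

```python
from dataclasses import dataclass


@dataclass
class Core:
    cap: float  # relative switched capacitance (hypothetical)
    ipc: float  # relative instructions per clock (hypothetical)


def voltage(freq_ghz: float) -> float:
    # Hypothetical linear V/f curve; real curves get much steeper near fmax.
    return 0.5 + 0.15 * freq_ghz


def dynamic_power(core: Core, freq_ghz: float) -> float:
    # CMOS dynamic power P ~ C * V^2 * f (leakage ignored).
    v = voltage(freq_ghz)
    return core.cap * v * v * freq_ghz


def perf(core: Core, freq_ghz: float) -> float:
    return core.ipc * freq_ghz


old = Core(cap=1.0, ipc=1.0)
wide = Core(cap=1.25, ipc=1.15)

f = 5.0
# Same clock: the wider core switches more capacitance, so it draws more power.
p_old, p_wide_same_clock = dynamic_power(old, f), dynamic_power(wide, f)

# Same performance: the wider core can run at a lower clock and voltage.
f_iso = perf(old, f) / wide.ipc
p_wide_iso_perf = dynamic_power(wide, f_iso)

print(f"old core @ {f:.2f} GHz:  {p_old:.3f}")
print(f"wide core @ {f:.2f} GHz: {p_wide_same_clock:.3f}")
print(f"wide core @ {f_iso:.2f} GHz (iso-perf): {p_wide_iso_perf:.3f}")
```

With these inputs, the wide core needs more power at 5 GHz but less power to match the old core's performance. Both claims in the posts can therefore hold at once.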
What, from the tweet, makes you draw this conclusion?
