Based on Neoverse cores, not a Denver derivative.
This makes sense with the ARM acquisition attempt. They want to fully control the server stack.
You're missing the point of this system. The question really is: does the CPU need to be that powerful in a system designed around the GPUs? We also don't know how many cores it has, the power consumption, etc.
Even in compute-dense scenarios like supercomputers, well over half of them (per the "accelerators/co-processors" category) have no accelerators or co-processors to speak of, so yes, high-performance CPUs are still very relevant ...
That's almost entirely due to GPU memory capacity issues, which is exactly the problem that Grace is trying to fix.
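For illustration, the capacity problem looks roughly like this in CUDA's managed-memory model, where a working set larger than GPU DRAM pages in over the CPU link on demand. This is only a hedged sketch: the sizes are made up and error handling is omitted.

```c
/* A sketch of the problem Grace targets, assuming a CUDA-capable system:
   with managed (unified) memory, a working set larger than GPU DRAM pages
   in over the CPU<->GPU link on demand, so host memory and the interconnect
   (PCIe today, NVLink on Grace) set the effective speed. */
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    size_t total  = 64ULL << 30;   /* 64 GiB: bigger than most GPUs' DRAM */
    size_t window = 1ULL << 30;    /* 1 GiB slice to prefetch ahead of use */
    float *data = NULL;
    cudaMallocManaged((void **)&data, total, cudaMemAttachGlobal);
    /* Hint the driver to migrate the first slice to GPU 0 up front;
       the rest faults across the link on demand, page by page. */
    cudaMemPrefetchAsync(data, window, 0, 0);
    cudaDeviceSynchronize();
    printf("managed allocation of %zu bytes ready\n", total);
    cudaFree(data);
    return 0;
}
```

A faster, more coherent CPU-to-GPU link directly cuts the cost of those page migrations, which is the whole pitch behind pairing Grace with the GPUs.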
If that were truly the case, then why didn't AMD APUs take off in the high-end compute segment? An integrated CPU/GPU solution would've elegantly solved the asymmetry between the CPU's and the GPU's memory systems ...
AMD hasn't made an APU aimed at HPC yet.
> If that were truly the case, then why didn't AMD APUs take off in the high-end compute segment?

Because they didn't make one for that segment, and because it's too niche to be profitable? In such systems dedicated will always win, as power is much less of a concern. The APUs failed because single-threaded CPU performance is actually the most relevant metric on consumer client devices. Apple got that right. In HPC, with highly parallel workloads? MT is more important, plus dedicated accelerators. But yeah, Bulldozer was just bad in all metrics.
Ultimately, what this tells us is that following AMD's original strategy has little to no basis for a successful outcome, so how is Nvidia going to change this?
> If that were truly the case, then why didn't AMD APUs take off in the high-end compute segment? An integrated CPU/GPU solution would've elegantly solved the asymmetry between the CPU's and the GPU's memory systems ...
Not saying it will, but NV does have a near-monopoly in the AI/ML space with CUDA. So moving all your software away from that is probably more painful than a somewhat slower CPU, in ST at least.
"A bit slower CPU" is putting it nicely when the AMD equivalent in the future is going to be well over >50% faster per socket compared to Grace. It's way harder for Nvidia to solve their CPU design deficit compared to AMD/Intel solving their software deficit since they have optimized x86 code going for them at least while Nvidia has never had a good CPU design team or CPU performance leadership in the past ... (and they still aren't going to be anywhere close to either AMD or Intel in the future too)
The memory subsystem on the APUs was always pitiful, and AMD were miles behind on the tools for compute. They had half-baked OpenCL going up against CUDA.
A compelling heterogeneous compute platform won't ever start with the GPU, if we take AMD's fruitless experiments as an example, so whatever approach Nvidia takes will doom them if they don't have competent CPUs ...
> If Nvidia's future DGX systems need twice the number of CPU sockets to be competitive against the x86 alternatives, then maybe Grace CPUs are mediocre and they have a failure on their hands ... (the most common systems will always consist of either 1P or 2P, and rarely does 4P/8P ever see deployment)

Every vendor is free to balance the CPU/GPU ratio. There is no magic number. And NV is a disruptor. We will see which system is mediocre when Grace is available.
Even Frontier, the newest supercomputer, uses single-socket CPU server nodes, and the delayed Aurora supercomputer would've used dual-socket nodes ...
> A system which replaces CXL capability with a lot of mediocre CPUs, gotcha.

Funny, mediocre is exactly what I think of CXL. Too little, too late, like every standard that is ratified at the lowest common denominator. CXL bandwidth is totally unable to handle 8 Ampere Next GPUs without a huge loss of performance. It's really mediocre compared to a Grace DGX...
> If that were truly the case, then why didn't AMD APUs take off in the high-end compute segment? An integrated CPU/GPU solution would've elegantly solved the asymmetry between the CPU's and the GPU's memory systems ...

Integrated CPU/GPU like in the most powerful computer in the world? Say hello to Fujitsu's A64FX and its "mediocre" ARM CPU.
> Funny, mediocre is exactly what I think of CXL. Too little, too late, like every standard that is ratified at the lowest common denominator. CXL bandwidth is totally unable to handle 8 Ampere Next GPUs without a huge loss of performance. It's really mediocre compared to a Grace DGX...

Open standards have to rely on existing infrastructure, which in the case of CXL is PCIe v5, which isn't on the market yet. Manufacturers are obviously free to push ahead with proprietary solutions, which (along with the marketing around them) is exactly what Nvidia is very good at, as is the resulting ability to use that as vendor lock-in.
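For a rough sense of scale behind that bandwidth argument, here is a back-of-envelope comparison using public figures for PCIe 5.0 signalling and A100-generation NVLink ("Ampere Next" numbers weren't public at the time), not any vendor's benchmark:

```c
/* Back-of-envelope numbers behind the CXL-vs-NVLink argument:
   PCIe 5.0 signals 32 GT/s per lane with 128b/130b encoding;
   an A100's NVLink 3 totals 600 GB/s per GPU. */
#include <stdio.h>

int main(void) {
    double gbps_per_lane = 32.0 * (128.0 / 130.0) / 8.0;  /* ~3.94 GB/s */
    double pcie5_x16     = gbps_per_lane * 16.0;          /* ~63 GB/s per direction */
    double nvlink3_a100  = 600.0;                         /* GB/s, aggregate */
    printf("CXL over PCIe 5.0 x16: ~%.0f GB/s per direction\n", pcie5_x16);
    printf("NVLink 3 (A100):        %.0f GB/s aggregate\n", nvlink3_a100);
    return 0;
}
```

Roughly an order of magnitude per link separates the two, which is the crux of the "CXL can't feed 8 GPUs" claim.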
I find it odd to read talk about how "fruitless" AMD's HSA efforts supposedly have been when we are currently watching live all the results of DARPA's Fast Forward initiatives in the whole Zen and R/CDNA families, which are furthermore about to culminate in the Frontier and El Capitan exascale supercomputers, both of which will feature tightly coupled CPU/GPU nodes.
> Fusion was little more than a consumer brand for the HSA strategy at best, necessary since at the time the iGPUs were competitive whereas the CPUs weren't.

I think you might want to read my posts again if you got that impression, because I didn't say that 'HSA' was fruitless but their concept of 'Fusion' certainly is ... (the only redeeming quality behind Fusion was the console contracts, like I stated before)
And while looking back at the past, HSA was a failure too, since no one else in the HSA Foundation adopted the standard aside from AMD, so it came nowhere close to meeting its intended goal. AMD took a radically different approach to heterogeneous compute by putting GPUs on the back burner, participating less in standards like OpenCL, and introducing ROCm to keep maintenance low. Ever since HSA was mostly dead, AMD has changed a lot compared to their original vision and arguably now has more in common with Intel's strategy ... (less focus on GPU compute/more focus on the CPU)
Fusion was little more than a consumer brand for the HSA strategy at best, necessary since at the time the iGPUs were competitive whereas the CPUs weren't.
I disagree that AMD "took a radically different approach". What it did was out of financial and organisational necessity. The initial focus on CPUs was natural, since a competitive CPU offered the biggest TAM to get back into relevancy. ROCm was natural since AMD couldn't keep up with the competitors' software prowess, so going all-out open source while jump-starting their efforts promised the best possible outcome. And I don't think GPUs were as much on the back burner as everybody seems to think; tech was bound to be shared between CPUs and GPUs, but the former had to reach a certain level first to be usable for the latter (like MCM in GPUs). And Fast Forward started back in late 2012, at which point Papermaster had been there for a year, and people like Su and Keller were hired during that year. The result of the planning back then is what we see now, and HSA most certainly was part of it.
I'm pretty sure their Fusion project existed before HSA
Also, ROCm might be open source but that doesn't mean it's a "community project" when it's largely developed behind closed doors at AMD with no outside contributors.
AMD just like Intel doesn't have the "GPU guru culture" going for them anymore so a lot of employees working at their GPU division eventually end up at Nvidia.
AMD regained their CPU guru culture over the course of the Zen project, but it came at the cost of letting their GPU guru culture fall apart entirely.
As long as AMD and Intel have nearly all of the CPU gurus in the world, no ARM vendor, including Nvidia, can come close to replacing them, since they can make all the rules they want in their favour, like standardizing AVX-512 at Nvidia's expense.
No amount of advantage in GPU compute is going to help Nvidia one bit, because if AMD and Intel really wanted to, they could collude to block Nvidia GPUs and kill off their entire business.
Fusion was/is HSA. It's what AMD called their integrated GPGPU initiative in early press releases. They later rebranded it to HSA when they dialed it back a little.
Just because AMD open sources something doesn't mean people have to/want to work on it. The option is there, which is the main thing.
. . . what?
I could argue against this, but I'd be going pretty far off-topic to dismantle this statement. Needless to say, you should probably rethink your position.
Unless NV does something that just ruins the ARM ecosystem for the HPC world, I would expect there to be more interest in SVE2 than AVX-512 (see the SVE sketch below). When it comes to ML things get more complicated, but:
- AMD relies almost exclusively on their GPU products for AI/ML
- Intel is throwing spaghetti against the wall hoping for traction in the AI/ML market, including but not limited to: various AVX extensions (bfloat16, etc.), two different AI company buyouts, FPGAs, and Xe.
In the end, Intel and AMD's best AI/ML products may well wind up being dGPUs where NV already has them beat (for now).
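To make the SVE angle concrete: unlike AVX-512's fixed 512-bit lanes, SVE/SVE2 code is vector-length-agnostic, so one binary scales from a 128-bit mobile core up to A64FX's 512-bit units. A minimal sketch using the ACLE intrinsics (the function name is mine; assumes a compiler targeting armv8-a+sve):

```c
/* Vector-length-agnostic daxpy (y[i] += a * x[i]) with ACLE SVE
   intrinsics: the same binary runs on any SVE width, 128..2048 bits. */
#include <arm_sve.h>
#include <stdint.h>

void daxpy_sve(double a, const double *x, double *y, int64_t n) {
    svfloat64_t va = svdup_n_f64(a);
    for (int64_t i = 0; i < n; i += svcntd()) {
        svbool_t pg = svwhilelt_b64_s64(i, n);   /* predicate masks the tail */
        svfloat64_t vx = svld1_f64(pg, &x[i]);
        svfloat64_t vy = svld1_f64(pg, &y[i]);
        vy = svmla_f64_x(pg, vy, vx, va);        /* vy += vx * a */
        svst1_f64(pg, &y[i], vy);
    }
}
```

The same object code runs at whatever vector width the hardware implements, which is why a single HPC binary can target both Neoverse-class and A64FX-class parts.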
I hope you understand that's why NV is buying ARM Ltd.
HSA is a continuation of Fusion, but the latter was definitely realized before the former, since Fusion started with pre-GCN APUs ...
What differentiates a vibrant open source project from other open source projects?
It's true; you should look at the LinkedIn profiles of former AMD graphics employees, and their most popular destination is working for Nvidia.
AMD had no grand vision for their GPU team to work on.
In terms of graphics technology, AMD is trailing Nvidia harder now than they ever did before the start of the Zen project.
ML is only a small fraction of the high-end compute market
Are the Grace CPUs even going to feature SVE/SVE2?
Full stop. You're missing the point.
ROCm = open source
CUDA = closed source
That is the primary differentiator. AMD would be pleased as punch if people made actual contributions to ROCm, but that's not really the point. Other hardware vendors are free to comply with ROCm fully if they so choose, versus CUDA where only nVidia hardware is supported natively. Not that, you know, anyone else has chosen to support ROCm that I know of.
OpenCL (also open source), for all its warts, is (mostly) supported by AMD, Intel, and NV hardware.
Meanwhile, AMD's dGPU products are positioned better against NV's than they have been in years. Remember Vega? You don't want them to go back to that, do you?
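As a concrete note on the portability point: ROCm's HIP host API deliberately mirrors the CUDA runtime call-for-call, which is what lets tools like hipify translate codebases mechanically. A minimal host-only sketch (no kernel, no error checks):

```c
/* The CUDA-to-HIP mapping that underpins ROCm's portability claim:
   the HIP host API mirrors the CUDA runtime one-to-one. */
#include <hip/hip_runtime_api.h>  /* vs. #include <cuda_runtime.h> */

int main(void) {
    float *d = NULL;
    hipMalloc((void **)&d, 1024 * sizeof(float));  /* cudaMalloc */
    hipMemset(d, 0, 1024 * sizeof(float));         /* cudaMemset */
    hipFree(d);                                    /* cudaFree   */
    return 0;
}
```

In principle any vendor could implement the same API; in practice, as noted above, nobody outside AMD has.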
Completely false. CDNA and RDNA2 are working quite nicely. CCIX is bringing AMD a step closer to the Fusion they had envisioned years ago. Y'know, before they had even produced Llano.
Really? REALLY???
Artificial Intelligence Market Size, Share, Growth Report 2030: "The global artificial intelligence market size was estimated at USD 196.63 billion in 2023 and is projected to grow at a CAGR of 36.6% from 2024 to 2030." (www.grandviewresearch.com)
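Taking the quoted figures at face value, the implied 2030 number is easy to check:

```c
/* Sanity-checking the quoted projection: $196.63B in 2023 compounding
   at a 36.6% CAGR through 2030 (7 years). */
#include <stdio.h>
#include <math.h>

int main(void) {
    double base_2023 = 196.63;                         /* USD billions */
    double cagr      = 0.366;
    double in_2030   = base_2023 * pow(1.0 + cagr, 7.0);
    printf("implied 2030 market: ~$%.0fB\n", in_2030); /* ~$1,745B */
    return 0;
}
```

In other words, the report is projecting a market approaching two trillion dollars, which is hard to square with "a small fraction of the high-end compute market".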
Doesn't matter, since that isn't NV's priority. Point being:
- AVX-512 isn't a significant threat to GPGPU-based anything (FP64, 16-bit ML, 8-bit ML)
- AVX-512 isn't a significant threat to ARM vendors that do choose to support SVE/SVE2, such as Fujitsu
NV has a master plan of offering you a complete platform to host their expensive-and-oh-so-wonderful compute cards which is where they make all their money currently. They'll sell you the entire hardware AND software stack, top to bottom. Their compute cards are the stars of the show.
If that's what you think, then I don't think you understand what open source truly entails ...
ROCm is, ironically, the polar opposite, since it's a closed standard dictated purely by AMD.
The Evergreen architecture and the earlier GCN iterations were truly great at the time. Vega and RDNA, or even RDNA2, show that AMD is just a shadow of its past.
"GPU guru culture"
Fusion was a dead end since the HSA Foundation fell apart.
AMD already had the concept working, but they had to massively re-adapt it to run on discrete CPUs and GPUs to make it more successful ...
Yes, really, and Intel makes almost as much money in a quarter as Nvidia does in its entire year.
The same could be said for the other way around ...
Is Nvidia's master plan to offer a GPU computing platform? Been there, done that before!