Based on Neoverse cores, not a Denver derivative.
This makes sense with the ARM acquisition attempt. They want to fully control the server stack.
Just to add... Fusion lives on in CCIX. The name is gone, but the original concepts are fulfilled. To be specific, they're finally achieving something similar to Torrenza.
I've come to the conclusion you just like spouting a lot of hot air.
ROCm is ironically the polar opposite since it's a closed standard dictated purely by AMD.
CUDA is often underrated because, contrary to the common impression, it isn't just a programming framework for GPUs but a whole set of tools and an integrated ecosystem that makes it seamless and productive to use. That is both AMD's weakness in software and its goal with ROCm, with the big difference that where Nvidia has proprietary closed-source solutions, AMD has often used and adapted existing open-source efforts and doesn't try to lock its users into "one true" approach, instead supporting several existing approaches:
*snip*
AMD's big CUDA (as in the programming framework) "replacement" within ROCm is HIP, which is in a way little more than a translation layer (a subset of CUDA, in fact) that lets the same source code run on both Nvidia GPUs (CUDA 4.0+) and AMD GPUs without changes.
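HIP's runtime API deliberately mirrors CUDA's naming (for example `cudaMalloc` becomes `hipMalloc` and `cuda_runtime.h` becomes `hip/hip_runtime.h`), which is what makes mechanical source translation feasible. The toy below illustrates only that renaming idea. The mapping table and `toy_hipify` are hypothetical and far smaller than what AMD's actual hipify tools handle (kernel launches, libraries, and much more of the API).

```python
import re

# Hypothetical, heavily simplified CUDA -> HIP name table. Only a few runtime
# identifiers are listed; this is not the table any real tool uses.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, so "cudaMemcpyHostToDevice" is tried before "cudaMemcpy".
# Word boundaries keep identifiers such as "mycudaMalloc" untouched.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def toy_hipify(source: str) -> str:
    """Rewrite the known CUDA runtime identifiers in `source` to HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaDeviceSynchronize();
cudaFree(d_x);
"""
print(toy_hipify(cuda_src))
```

Everything that is not a known CUDA name passes through unchanged, which is the property that lets one codebase target both vendors.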
Whatever leadership AMD had in GPUs was truly gone by that point, which is why I maintain that they don't have a GPU guru culture anymore.
I'm pretty sure Xeon Phi got killed off by regular CPUs and GPGPU can't run standard C++ code either.
Xeon Phi did not win all that much compared to regular CPUs
since there was kernel launch overhead
They were better off just implementing AVX-512 straight into the CPUs.
CPU guru culture
GPU compute platforms have existed ever since CUDA was created, so adding crappy CPUs to it won't make their entire stack more compelling than it already is...
Also, Xilinx is working to integrate its FPGA compute infrastructure within ROCm. Xilinx FPGAs use CCIX for coherence.
You can read it here
Pretty sure this didn't happen until after AMD bought them?
The technology demonstration showcases:
- Unified discovery and reservation of AMD and Xilinx accelerators using a converged runtime in the AMD ROCm open software platform;
- Dispatch of work to Alveo accelerators using the same user-space queues used for low-latency work dispatch to AMD Instinct accelerators;
- Peer-to-peer synchronization between GPU and FPGA devices; and
- Access to memory on GPU, CPU, and FPGA devices using a common, shared virtual address space
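The "same user-space queues" point refers to HSA-style dispatch, where the application writes work packets into a ring buffer in its own memory and rings a doorbell, rather than making an operating-system call for every launch. The class below is a toy, single-producer model of that idea; `UserQueue`, the packet dicts, and the doorbell counter are hypothetical and do not reflect the actual ROCm or AQL packet layout.

```python
from typing import Any, Optional


class UserQueue:
    """Toy single-producer/single-consumer ring buffer standing in for a
    user-space dispatch queue (hypothetical, not the ROCm/AQL layout).
    Capacity must be a power of two so slot indices can be masked."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self.capacity = capacity
        self.slots: list = [None] * capacity
        self.write_index = 0   # advanced only by the producer (host application)
        self.read_index = 0    # advanced only by the consumer (GPU or FPGA)
        self.doorbell = -1     # index of the last packet the producer announced

    def submit(self, packet: Any) -> bool:
        """Producer: write a packet into the ring, then ring the doorbell.
        No operating-system call is involved in this model."""
        if self.write_index - self.read_index == self.capacity:
            return False                          # ring is full
        self.slots[self.write_index & (self.capacity - 1)] = packet
        self.doorbell = self.write_index          # announce the new packet
        self.write_index += 1
        return True

    def consume(self) -> Optional[Any]:
        """Consumer: take the next packet the doorbell has announced, if any."""
        if self.read_index > self.doorbell:
            return None                           # nothing announced yet
        slot = self.read_index & (self.capacity - 1)
        packet, self.slots[slot] = self.slots[slot], None
        self.read_index += 1
        return packet


# Usage: one queue type serves work meant for either kind of accelerator.
q = UserQueue(capacity=4)
q.submit({"device": "gpu", "kernel": "saxpy"})    # hypothetical work items
q.submit({"device": "fpga", "kernel": "filter"})
print(q.consume())   # {'device': 'gpu', 'kernel': 'saxpy'}
print(q.consume())   # {'device': 'fpga', 'kernel': 'filter'}
print(q.consume())   # None
```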
Seems to me they don't need it. Ultimately ROCm/HIP are fulfilling a vision they had for their own product stack 15 years ago. Their only fault was in letting NV get there first with CUDA.
Not true. Phi was killed by:
1). Intel's failed 10nm process leading to the cancellation of Knights Mill (which is being replaced by Xe)
2). NV's HPC-oriented dGPUs
Also, if you thought you could just "run standard C++ code" on earlier Phi products... it wasn't quite that simple. Knights Landing was, I think, the first (and last) Phi product that made coding for it about as easy as writing code for any old Xeon, assuming you could hand-tune AVX-512, but whatever. In any case, Phi competed directly against other HPC hardware, which at the time was (and still is) dGPUs. There is no "general purpose" CPU, x86 or otherwise, that pushed Phi out of its niche.
See Tianhe-2. You must concede that Phi products could produce greater throughput/watt in appropriate workloads than any of Intel's standard Xeon products of the same generation. Phi was eventually replaced by Cascade Lake-AP, and you ought to know how popular THAT was.
I thought that was eliminated by Knights Landing?
Intel had every intention of continuing Phi with Knights Mill despite implementing AVX-512 in standard Xeons as far back as Skylake-SP.
Phi was mostly eliminated by general-purpose CPUs. CPUs now have core counts going up to 64 cores, and EPYC Genoa will have up to 96 cores with AVX-512 to boot. Phi also made programming more complex, as kernels needed to be separated by host or device for execution, and the latency overhead of splitting kernels between host and device was never solved. Instead, general-purpose CPUs prevailed, since there was no significant performance benefit to the Xeon Phi and regular CPUs were easier to program, with no device-specific code nonsense to deal with...
Knights Landing solved that problem. There was no "host", the Phi did everything. Knights Mill was going to be the same, until it got killed.
I'm pretty sure the Intel compiler has explicit extensions for 'offloading', so it's not as transparent as you believe it to be. Automatic offloading is only available if you're using Intel's MKL library...
Knights Landing only solves the compatibility issues with other Xeon processors since Knights Corner had a pretty different x86 ISA implementation ...
The offloading was needed for devices on PCIe. Knights Landing as a host could just use AVX-512 instructions inside any arbitrary code sequence.
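The offload argument comes down to simple arithmetic: sending work to a PCIe card only pays off if the compute time saved exceeds the transfer and launch overhead. The function below is a back-of-the-envelope model with made-up numbers; `offload_speedup` is hypothetical, and it ignores overlapping copies with compute, which tuned code often uses to hide part of the cost. Setting `bytes_moved` and the launch cost to zero roughly corresponds to the self-hosted Knights Landing case, where nothing crosses PCIe.

```python
def offload_speedup(host_time_s: float, device_time_s: float,
                    bytes_moved: float, link_bytes_per_s: float,
                    launch_overhead_s: float) -> float:
    """Speedup of offloading a kernel to a PCIe accelerator vs. running it
    on the host. Assumes copies, launch, and compute happen strictly one
    after another (no overlap), which is pessimistic for tuned code."""
    transfer_s = bytes_moved / link_bytes_per_s
    return host_time_s / (transfer_s + launch_overhead_s + device_time_s)


# Hypothetical numbers: the kernel takes 10 ms on the host and 2 ms on the
# card, the link is assumed to move 16 GB/s, and one launch costs 10 us.
LINK_BYTES_PER_S = 16e9
for label, nbytes in (("1 MB moved", 1e6), ("256 MB moved", 256e6)):
    s = offload_speedup(10e-3, 2e-3, nbytes, LINK_BYTES_PER_S, 10e-6)
    print(f"{label}: {s:.2f}x")
# 1 MB moved: 4.83x
# 256 MB moved: 0.56x
```

With the same 5x faster device, the offload wins or loses purely on how much data has to cross the link.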
Which is exactly why they had a radical shift in their strategy to be closer to Intel's, and that's perfectly fine...
Also, AMD's original vision for HSA was for it to become a new industry standard.
Phi was mostly eliminated by general purpose CPUs.
CPUs now have core counts going up to 64 cores and EPYC Genoa will have up to 96 cores
Yeah but using Knights Landing as a pure host defeats the intended concept behind Phi which was supposed to be a co-processor or a generic accelerator.
The history is interesting, however there's an excellent chance this will all come to naught.
With the current situation vis-a-vis silicon, and regardless of the fact that ARM doesn't actually manufacture anything, I would not bet on NV getting the go-ahead for the acquisition from the UK, at the very least.
And I wouldn't be surprised in the least if China blocks it just to spite the US.
Smoke and mirrors. If you actually followed the hardware behind HSA you'd see that the HSA Foundation was populated by AMD and AMD alone. They were the only vendor to produce hardware that was HSA-compliant. It's no different with ROCm. AMD may have gotten some other code contributors but it did them very little good. Again, did you ever use the HSA software stack with Kaveri or Carrizo?
I don't know why you persist in this fantasy. Phi-based supercomputers like Tianhe-2 were supplanted on the Top 500 by systems like Summit that use NV dGPUs. Phi was effectively dead in 2017 when Intel's 10nm was completely botched. The last Phi product to reach the market, Knights Landing, was launched in Q2 2016.
You don't think that people build HPC/ML training machines solo around those, do you?
If that's what you think about Phi, then maybe you don't understand the significance of Grace either. If NV could boot a system to a functioning OS entirely using only their own dGPUs without a motherboard + chipset and CPU, they most certainly would.
Chips using these three IPs (Cortex-A73, Mali-G71, and CoreLink CCI-550) are HSA 1.1 compliant.
They were the only vendor to produce hardware that was HSA-compliant.
For reference:
Let's see how they stack up to Sapphire Rapids and Genoa.
NVIDIA Unleashes Grace Hopper & Grace CPU Superchips: 144 Core CPU With Up To 600 GB Memory, 2x Perf/Watt Versus Traditional Servers
NVIDIA has unleashed its SUPERCHIPS platform featuring the Grace Hopper & Grace CPU platforms for large scale AI and HPC workloads.
wccftech.com
144 ARM cores producing 740 SPECint2017... that's about 5.14 points per core.
2x64 Milan gets 512 points, or 4 points per core.
Genoa likely gets a >20% IPC bump along with some clock increases, so I think it's totally feasible for AMD to get >920 SPECint2017 out of a 2-socket 96-core Genoa server. Bergamo is supposed to do 2x the throughput of top Milan per socket, so a 2-socket 128-core Bergamo should do about 1024 points.
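The per-core figures and projections above can be reproduced in a few lines. The inputs are the numbers quoted in the post; the Genoa line applies only the assumed 20% IPC uplift and the move to 96 cores per socket (no clock increase), and the Bergamo line simply doubles the quoted 2-socket Milan figure, so both are the poster's projections rather than measurements.

```python
# Figures as quoted in the post (SPECint2017 rate scores).
grace = {"score": 740, "cores": 144}
milan_2p = {"score": 512, "cores": 128}


def per_core(system: dict) -> float:
    """Score divided by total core count."""
    return system["score"] / system["cores"]


print(f"Grace per core: {per_core(grace):.2f}")         # 5.14
print(f"2P Milan per core: {per_core(milan_2p):.2f}")   # 4.00

# Projection: +20% IPC, unchanged clocks, 2 x 96 cores instead of 2 x 64.
genoa_2p = milan_2p["score"] * 1.20 * (2 * 96) / milan_2p["cores"]
# Projection: 2x top-Milan throughput per socket, two sockets.
bergamo_2p = 2 * milan_2p["score"]

print(f"2P Genoa estimate: {genoa_2p:.0f}")   # 922
print(f"2P Bergamo estimate: {bergamo_2p}")   # 1024
```

Any clock increase on Genoa would push its estimate above this floor, which is consistent with the ">920" in the post.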
Ehhh, just post their plots next time lol. Their articles spend 2x the words needed to get to the point, for ad reasons.
I got this out of WCCFTECH
NVIDIA Grace 144 Core ARM CPU Is 14% Slower Than Dual 128 Core AMD EPYC 7763 CPUs In Spec Integer Benchmark
NVIDIA's recently announced Grace SUPERCHIP CPU platform which features 144 ARM cores has been compared to AMD's EPYC CPUs.
wccftech.com
You cannot compare scores just like that; it is very easy to find SPECint2017 scores where Milan hits far higher, closer to 1000.
144 ARM cores producing 740 SPECint2017... that's about 5.14 points per core.
2x64 Milan gets 512 points, or 4 points per core.
Hardware Vendor | System | Peak Result | Base Result | Energy Peak Result | Energy Base Result | # Cores | # Chips | Published
---|---|---|---|---|---|---|---|---
ASUSTeK Computer Inc. | ASUS RS720A-E11(KMPP-D32) Server System 2.45 GHz, AMD EPYC 7763 | 913 | 861 | -- | -- | 128 | 2 | Dec-2021
ASUSTeK Computer Inc. | ASUS RS720A-E11(KMPP-D32) Server System 2.45 GHz, AMD EPYC 7763 | 892 | 839 | -- | -- | 128 | 2 | Mar-2021
Cisco Systems | Cisco UCS C225 M6 (AMD EPYC 7763 64-Core, Processor) | 898 | 851 | -- | -- | 128 | 2 | Sep-2021
Cisco Systems | Cisco UCS C245 M6 (AMD EPYC 7763 64-Core Processor) | 898 | 854 | -- | -- | 128 | 2 | Jul-2021
Cisco Systems | Cisco UCS C245 M6 (AMD EPYC 7763 64-Core Processor) | 892 | 850 | -- | -- | 128 | 2 | Jun-2021
Cisco Systems | Cisco UCS C245 M6 (AMD EPYC 7763 64-Core Processor) | -- | 852 | -- | -- | 128 | 2 | Jun-2021
Dell Inc. | PowerEdge C6525 (AMD EPYC 7763 64-Core Processor) | 848 | 800 | -- | -- | 128 | 2 | May-2021
Dell Inc. | PowerEdge C6525 (AMD EPYC 7763 64-Core Processor) | 835 | 790 | -- | -- | 128 | 2 | Mar-2021
Dell Inc. | PowerEdge R6525 (AMD EPYC 7763 64-Core Processor) | 872 | 822 | -- | -- | 128 | 2 | Jun-2021
Dell Inc. | PowerEdge R6525 (AMD EPYC 7763 64-Core Processor) | 845 | 801 | -- | -- | 128 | 2 | Mar-2021
Dell Inc. | PowerEdge R7525 (AMD EPYC 7763 64-Core Processor) | 872 | 821 | -- | -- | 128 | 2 | May-2021
Dell Inc. | PowerEdge R7525 (AMD EPYC 7763 64-Core Processor) | 853 | 802 | -- | -- | 128 | 2 | Apr-2021
Dell Inc. | PowerEdge R7525 (AMD EPYC 7763 64-Core Processor) | 846 | 798 | -- | -- | 128 | 2 | Mar-2021
Fujitsu | PRIMERGY RX2450 M1, AMD EPYC 7763 2.45 GHz | -- | 824 | -- | -- | 128 | 2 | Oct-2021
GIGA-BYTE TECHNOLOGY CO., LTD. | R282-Z90 (AMD EPYC 7763 , 2.45GHz) | 866 | 813 | -- | -- | 128 | 2 | Mar-2021
GIGA-BYTE TECHNOLOGY CO., LTD. | R282-Z90 (AMD EPYC 7763, 2.45GHz) | 884 | 832 | -- | -- | 128 | 2 | Jul-2021
Hewlett Packard Enterprise | ProLiant DL365 Gen10 Plus (2.45 GHz, AMD EPYC 7763) | 865 | 813 | -- | -- | 128 | 2 | May-2021
Hewlett Packard Enterprise | ProLiant DL385 Gen10 Plus v2 (2.45 GHz, AMD EPYC 7763) | 872 | 821 | -- | -- | 128 | 2 | Mar-2021
Lenovo Global Technology | ThinkSystem SR645 2.45 GHz, AMD EPYC 7763 | 874 | 819 | -- | -- | 128 | 2 | Mar-2021
Lenovo Global Technology | ThinkSystem SR645 2.45 GHz, AMD EPYC 7763 | 870 | 819 | -- | -- | 128 | 2 | Mar-2021
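To make the point concrete, the snippet below recomputes per-core figures from the published base results in the table and compares them with the 740 quoted for Grace. That 740 is an estimate rather than a published SPEC result, so this is only a rough comparison; the constant names are illustrative.

```python
# Base results for the 2-socket EPYC 7763 systems, in table order.
milan_base = [861, 839, 851, 854, 850, 852, 800, 790, 822, 801,
              821, 802, 798, 824, 813, 832, 813, 821, 819, 819]
MILAN_CORES = 128        # 2 x 64-core EPYC 7763
BEST_PEAK = 913          # highest peak result in the table
GRACE_ESTIMATE = 740     # figure quoted for Grace in the thread (an estimate)
GRACE_CORES = 144

worst, best = min(milan_base), max(milan_base)
print(f"2P Milan base range: {worst}-{best}")
print(f"Per core: {worst / MILAN_CORES:.2f}-{best / MILAN_CORES:.2f}")
print(f"Grace per core: {GRACE_ESTIMATE / GRACE_CORES:.2f}")
print(f"Grace vs best base: {GRACE_ESTIMATE / best - 1:+.1%}")
print(f"Grace vs best peak: {GRACE_ESTIMATE / BEST_PEAK - 1:+.1%}")
# 2P Milan base range: 790-861
# Per core: 6.17-6.73
# Grace per core: 5.14
# Grace vs best base: -14.1%
# Grace vs best peak: -18.9%
```

The "14% slower" headline matches the best base result; against the 512 figure the per-core picture reverses, which is why the source of the Milan number matters.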
Thanks for the heads-up. I thought that AT's own internal estimates would be good enough; looks like I was wrong.
You cannot compare scores just like that; it is very easy to find SPECint2017 scores where Milan hits far higher, closer to 1000.
All of AT's internal SPEC numbers are "estimates", as they are never verified by the source.
Thanks for the heads-up. I thought that AT's own internal estimates would be good enough; looks like I was wrong.
It's kind of sad that it gets beaten by 33% by a CPU that's been out for over a year (by the time it launches). By the time Genoa comes out (close to its launch), I am sure it will get beaten by 100%.
Thanks for the heads-up. I thought that AT's own internal estimates would be good enough; looks like I was wrong.
I understand that. I just didn't expect the estimate to be so far off the tested/true value, and definitely not when that estimate came from AT.
All of AT's internal SPEC numbers are "estimates", as they are never verified by the source.
It is not sad; that chip was made specifically for AI and HPC, where memory bandwidth matters more than integer throughput.
It's kind of sad that it gets beaten by 33% by a CPU that's been out for over a year (by the time it launches). By the time Genoa comes out (close to its launch), I am sure it will get beaten by 100%.
Maybe, but only in this benchmark. The main advantage is the tight integration between CPU and GPU over a very fast bus, which matters for "AI" and other compute workloads. In a mixed workload, though, single-threaded CPU performance also matters a great deal, and there ARM certainly comes up short.
By the time Genoa comes out (close to its launch), I am sure it will get beaten by 100%.