Whatever leadership AMD had in GPUs was truly gone by that point, which is why I maintain that they don't have a GPU guru culture anymore.
Seems to me they don't need it. Ultimately ROCm/HIP are fulfilling a vision they had for their own product stack 15 years ago. Their only fault was in letting NV get there first with CUDA.
I'm pretty sure Xeon Phi got killed off by regular CPUs, and GPGPU can't run standard C++ code either.
Not true. Phi was killed by:
1) Intel's failed 10nm process, which led to the cancellation of Knights Hill (which is being replaced by Xe)
2) NV's HPC-oriented dGPUs
Also, if you thought you could just "run standard C++ code" on earlier Phi products... it wasn't quite that simple. Knights Landing was, I think, the first (and last) Phi product that made coding for it about as easy as writing code for any old Xeon. Assuming you could hand-tune AVX-512, but whatever. In any case, Phi competed directly against other HPC hardware, which at the time was (and still is) dGPUs. There is no "general purpose" CPU, x86 or otherwise, that pushed Phi out of its niche.
Xeon Phi did not win all that much compared to regular CPUs
See Tianhe-2. You must concede that Phi products could produce greater throughput/watt in appropriate workloads than any of Intel's standard Xeon products of the same generation. Phi was eventually replaced by Cascade Lake-AP, and you ought to know how popular THAT was.
since there was kernel launch overhead
I thought that was eliminated by Knights Landing?
They were better off just implementing AVX-512 straight into the CPUs.
Intel had every intention of continuing Phi with Knights Mill despite implementing AVX-512 in standard Xeons as far back as Skylake-SP.
Now you're just making things up.
GPU compute platforms have existed ever since CUDA was created, so adding crappy CPUs to it won't make their entire stack more compelling than it already is ...
Simply not true. NVidia has always relied on someone else to supply CPUs and chipsets to host their devices, even when they were able to supply proprietary interconnects for their high-end stuff. There is the very real threat that Intel and AMD will simply kick them off their systems altogether. Drafting their own platform design with their own CPUs and chipsets offers a safe haven to their precious GPGPU business.
Also, Xilinx is working to integrate the FPGA compute infra within ROCm. Xilinx FPGAs use CCIX for coherence.
Pretty sure this didn't happen until after AMD bought them?