nicalandia
Diamond Member
In any event, Genoa's NDA ends in just a few days. STH will be releasing benchmarks that day, and poor Ice Lake will have to pay the price of Intel's delays and be the victim of Genoa's might.
> Rumor has it that EMRs might be scheduled in early 2023.
> Now, I'm putting a simple interpretation on SPRs/EMRs:
> SPRs: low-end (600~700 mm² die area?)
> EMRs: mid-range (~800 mm² die area)
> SPRs with EMIB: high-end (~1600 mm² die area)
> EMRs with EMIB: extreme (>=1600 mm² die area)

Is EMR confirmed to be even bigger than SPR? What would make its development go so much smoother, then, for both to converge? A lack of SDSi? And what would be each one's unique selling point if EMR is not replacing SPR outright?
> I'm interested to see if AMD will go for higher V-Cache stacks on Genoa, as compared to Milan-X, to better compete against HBM Emerald Rapids. We saw that there was UEFI support for 4-Hi stacks early on in Milan development. If AMD wanted, it's possible that we could see 4-Hi Genoa 96-core products with 288MB of L3 per CCD. That would be a very interesting comparison to a 2P EMR-HBM system.

3D stacked cache and HBM2e are different approaches that achieve similar results; AMD does not need to go beyond 96MiB of stacked cache per CCD to stack up well against Intel's 64GiB of on-package HBM2e.
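A quick back-of-the-envelope check on that 288MB figure (a sketch only: it assumes the Milan-X layout of a 32MB base L3 per CCD plus 64MB per stacked V-Cache layer, and assumes a 96-core Genoa part uses 12 eight-core CCDs):

```python
# Hypothetical capacity math, assuming Milan-X-style V-Cache layers:
# 32 MB base L3 per CCD plus 64 MB per stacked layer.
BASE_L3_MB = 32
PER_LAYER_MB = 64

def l3_per_ccd(stack_height):
    """Total L3 per CCD for a given number of stacked V-Cache layers."""
    return BASE_L3_MB + stack_height * PER_LAYER_MB

print(l3_per_ccd(1))       # 96  -> matches Milan-X as shipped (1-Hi)
print(l3_per_ccd(4))       # 288 -> the speculative 4-Hi figure above
print(12 * l3_per_ccd(4))  # 3456 MB of L3 across a 12-CCD, 96-core part
```

So even a 4-Hi Genoa would carry roughly 3.4GB of L3 per socket, still far short of 64GB of HBM2e in raw capacity; the comparison only makes sense because low latency, not capacity, is the cache's selling point.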
> SPR for 4s/8s?
> EMR for 2s?
> SPR for HEDT?

Sapphire Rapids-SP will support up to 8 sockets.
> 3D Stack Cache and HBM2e are different approaches that achieve similar results. […]

While their effects may overlap some, the causes are distinctly different. HBM2e is, as its name points out, all about bandwidth and throughput. L3$ extended with V-Cache, on the other hand, is all about low latency. HBM is known to have similar or slightly worse latency than standard DRAM. So in latency-sensitive workloads, V-Cache expands the data able to reside in L3$, at its significantly lower latency, before having to fall back to much-higher-latency memory. HBM's advantage, on the other hand, is that once data is being accessed, it delivers it at much higher speed than regular memory will ever be able to.
> If Cost of Power is a first order factor, shouldn't customers be buying the other guys?

Yeah, but kickbacks are a zeroth-order factor. 😛
> While their effects may overlap some, the causes are distinctly different. […]

It clearly exists to feed AMX for AI workloads. Maybe HPC as well, but yeah, HBM only makes sense with heavy vector/matrix compute.

Typically, CPU workloads are considered latency-sensitive, whereas GPU workloads are considered bandwidth-sensitive. This is why HBM has mostly been used on products of the latter kind.

I'd expect workloads that run heavy vector computation on big data to profit the most from SPR-HBM/EMR-HBM, but others may know better and be able to offer more insight into its potential.
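The latency-bound versus bandwidth-bound distinction can be sketched with two access patterns (an illustrative toy in Python, not a benchmark of these parts): dependent pointer chasing, where each load must finish before the next address is known (the pattern a bigger L3/V-Cache accelerates), versus a sequential streaming scan that prefetchers and high-bandwidth memory keep fed (the pattern HBM accelerates).

```python
import random

def build_chain(n):
    # One random cycle through n slots: the next index is only known after
    # the current load completes, so hardware cannot prefetch ahead.
    order = list(range(n))
    random.shuffle(order)
    chain = [0] * n
    for i in range(n):
        chain[order[i]] = order[(i + 1) % n]
    return chain

def pointer_chase(chain, steps):
    # Dependent loads: each step stalls on the previous one (latency-bound).
    idx = 0
    for _ in range(steps):
        idx = chain[idx]
    return idx

def streaming_sum(data):
    # Independent sequential loads: prefetchers and high-bandwidth memory
    # (e.g. HBM2e) hide latency behind throughput (bandwidth-bound).
    return sum(data)

chain = build_chain(1 << 10)
# Chasing exactly n steps around the single n-cycle returns to the start.
print(pointer_chase(chain, 1 << 10))  # 0
print(streaming_sum(range(100)))      # 4950
```

On real hardware you would time both loops over working sets larger than L3: the chase's per-step cost tracks memory latency, while the scan's throughput tracks memory bandwidth.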
> It clearly exists to feed AMX for AI workloads. […]

If I had the $$, I would get one of those, run the OS on HBM2e alone with no DDR5, OC 8 cores, and see how games perform..!
> If I had the $$, I would get one of those, run the OS on HBM2e alone […]

Imagine the lead architect of that CPU facepalming on reading that 😀
> Imagine the lead architect of that CPU facepalming on reading that 😀

Yes, it's not for games, and yes, it will cost about $10K per CPU (non-HBM will be like $5K a pop), but I also would like to get my hands on a Milan-X with 1GiB of L3$ and run an OS out of cache (probably some tiny Linux)...
> No, you can't run Linux on stacked L3$.

Bummer.
> Bummer.

Indeed. All Zen chips are affected by this, btw. It has to do with the chip itself (or rather its PSP) using cache for booting the secure firmware first, essentially as POST.
> Indeed. All Zen chips are affected by this, btw. […]

To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? I want to know why all my EPYC boxes and most of my Zen 2, Zen 3, and Zen 4 boxes run Linux just fine (and are faster than Windows for what I do).
> To not derail this thread, could you post a reply explaining what effect this has in one of the Zen threads? […]

Cache as RAM (CAR) is the name used by Coreboot. It would not be that useful in the end, considering most hardware uses DMA, which in turn accesses actual RAM and doesn't work if none exists.

Anyway, Coreboot documents the behavior of Zen processors:

"Unlike any other x86 device in coreboot, a Picasso system has DRAM online prior to the first instruction fetch. Cache-as-RAM (CAR) is no longer a supportable feature in AMD hardware."

This all has zero bearing on normal usage, and isn't specific to Linux either. All this is about is: with caches getting bigger and bigger, an OS could theoretically reside completely in that cache and boot without the system having any DRAM installed. With Zen this is not possible, due to the aforementioned changes in boot behavior. Intel chips, on the other hand, AFAIK allow this to this day. That's all there is to it.
> Intel chips, on the other hand, AFAIK allow this to this day. […]

Is there any small DOS-type shell that can boot without RAM and reside entirely in cache on an Intel CPU?
> Is there any small DOS-type shell that can boot without RAM and reside entirely in cache on an Intel CPU?

Not that I'm aware of currently, outside of Coreboot itself.
The Xeon W9 3495 has been confirmed by many sources to be a 56C/112T part.

www.sisoftware.co.uk
> Ah, but there's a difference. One is the 3495... the other is the 3495X. 😉

So the 3495 non-X will likely be nearly twice as fast as the 3495X? Such weird nomenclature...!