SPR still seems to be selling decently, and this is SPR with +20% 1T perf and 17% better perf/watt.
I think this is "selling" EMR in the wrong direction. As a "peak" performance per socket, per watt, or whatever product, SPR was never going to win anything, and EMR, which is mostly the same, won't change that.
SPR's main weakness is that it did not perform well in the benchmarks where it should shine -> DBMS and similar ones.
The reason was simple -> the uncore and memory subsystem were designed by incompetent monkeys who came up with a design that had the worst of all worlds. Where AMD had an 8-core CCD cluster with great cache, these guys managed to create a CPU where the compute unit was 1 core, the next stop was an L3 with 50ns latency, and after that came memory with hilariously variable but uniformly horrible latency when not in "quadrant" mode. Bad BW, bad latency, no L3 capacity to speak of.
A disaster on silicon if you like, combining the worst qualities of a chiplet design with those hilarious ARM "our 96C are backed by industry leading mesh ( find us at nearest cloud ) and 16MB of L3" designs. The result was a chip that had no redeeming qualities and that, despite having something like 20-30% more IPC than the Ice Lake stuff, managed to make no "per core" improvement in quite a few of those workloads.
EMR seems to be targeting those exact weaknesses ( compare with doubling down on misguided ways, say adding more chiplets to SPR and coming up with an even worse chip that pleases marketing with more cores but is useless otherwise ):
1) We get a sensible two-chiplet design.
2) We get a proper amount of L3 cache that directly addresses the weak spot. Said L3 cache, despite being something like 3x larger, is actually faster than SPR's ( some of that comes naturally from (1), but Intel surely poured effort into making it perform ).
3) Memory has nice latency. Again, some of it is due to memory speeds rising, helped by a non-monkey-with-engineer-degree designed mesh and the L3 cache, but again Intel is very directly improving its weakest spot. Remember, they HAVE the IMC on those chiplets, so they should have no problem beating AMD, which has to go out to the IOD, in efficiency and latency. That was not the case before.
So even if process, clocks and architecture are the same, ST perf and overall performance increase nicely because the chip no longer has to hit memory @150ns on each L2 miss, resulting in a chip that, if priced right, will have a niche for itself.
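To put rough numbers on that argument, here is a back-of-envelope sketch of the average latency paid per L2 miss. The 50ns L3 and 150ns memory figures are the ones quoted above. The L3 hit rates and the "improved" latencies are made-up placeholders, not measurements of SPR or EMR, and the model is deliberately simplified: an L3 miss is charged the flat memory latency and nothing else.

```python
def l2_miss_penalty_ns(l3_hit_rate, l3_latency_ns, mem_latency_ns):
    """Average latency per L2 miss: L3 hits pay the L3 latency,
    L3 misses pay the full memory latency (simplified model)."""
    return l3_hit_rate * l3_latency_ns + (1.0 - l3_hit_rate) * mem_latency_ns


# Small, slow L3: 50 ns L3 and 150 ns memory are taken from the post;
# the 30% L3 hit rate is a hypothetical placeholder.
small_l3 = l2_miss_penalty_ns(l3_hit_rate=0.30, l3_latency_ns=50.0, mem_latency_ns=150.0)

# Larger, faster L3 plus better memory latency: every number here is a
# hypothetical placeholder chosen only to illustrate the direction of the effect.
large_l3 = l2_miss_penalty_ns(l3_hit_rate=0.60, l3_latency_ns=45.0, mem_latency_ns=120.0)

print(f"small L3: {small_l3:.1f} ns per L2 miss")
print(f"large L3: {large_l3:.1f} ns per L2 miss")
print(f"reduction: {100.0 * (1.0 - large_l3 / small_l3):.1f}%")
```

The point is only that the hit rate and both latencies multiply together, so fixing all three at once moves the per-miss penalty far more than any single change would, with no change to the core itself.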
I think the best way to sum up EMR => it is to servers what Raptor Lake was to Alder Lake on desktop and mobile. Directly addressing the weak spots unleashed performance that had been depressed by them. More L2, a tighter ring and L3 cache, and DDR5-5600 speeds added a disproportionate amount of perf. The same is bound to happen with EMR.