I'd agree here on the die sizes. Qualcomm allocated differently: 12x P-core, vs Apple's 4+4, and they squeezed a lot into a relatively tiny die.
Maybe Qualcomm is doing what MediaTek 9300, Intel MTL, and AMD's Zen4c do: only two P-cores get the full die size for peak frequency.
SXE (80W device type): just 2x P-cores can clock to 4.3 GHz, the other 10x can only hit 3.8 GHz?
My guess is that it's a parametric yield issue. All the cores in two of the clusters (they singled those two out, which suggests the third is different) are probably physically designed to hit 4.3GHz, maybe a bit more, but yields vary, so only 1 in 4 cores in each cluster is capable of 4.25GHz (the advertised 4.3). Though it’s also possible that one core in each cluster is physically a bit bigger, using higher-performance cells or whatever. They’ve done this with A7x cores on phones before.
> “Meanwhile in lighter workloads, the chip supports turboing up to 4.3GHz on 2 cores. Qualcomm’s slide on this matter shows a core from each cluster, but it’s unclear whether this is some kind of prime/favored core in action (where only certain cores are designed/validated for those speeds) or if it’s simply a stylistic choice.”
So it could be either way. It could be two cores per die that happen to be capable on a yield basis, picked out in firmware or whatever and different for each part, or they could be physically distinct cores. But fwiw, they are probably not that much bigger, if they are different at all.
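To put rough numbers on the yield explanation, here is a minimal sketch, not anything Qualcomm has disclosed. It assumes each core in a four-core cluster independently has some probability of reaching the top clock (in real silicon, variation within a die is correlated, so treat this as an illustration only). The per-core probabilities and the function name `p_at_least_one` are made up for the example.

```python
def p_at_least_one(p_core: float, cores: int = 4) -> float:
    """Probability that at least one of `cores` cores reaches the target clock.

    Assumes every core independently reaches the target with probability
    `p_core` (a simplification; on-die variation is usually correlated).
    """
    return 1.0 - (1.0 - p_core) ** cores


if __name__ == "__main__":
    # Hypothetical per-core probabilities of being 4.3GHz-capable.
    for p_core in (0.25, 0.50, 0.75):
        per_cluster = p_at_least_one(p_core)
        print(f"per-core {p_core:.2f} -> at least one capable core per cluster: {per_cluster:.3f}")
```

Under these assumptions, even a fairly low per-core hit rate leaves most clusters with at least one core that can be picked as the boost core, which is why the "favored core chosen per die" explanation doesn't need a physically different core.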
Unless Qualcomm is keeping a ton of clockspeed in the tank, 4.3GHz vs 3.8GHz isn’t that huge. These are still going to be relatively dense cores.
I also want to point out that L2 is a part of this, and these cores share L2. So if even one of the cores is clocked higher, any density savings in the cache would be destroyed by that because they need to be able to support the fastest core.
Most important: AMD is targeting up to 5.7GHz on the same process, and 5.2GHz on the mobile versions. When a part clocks lower than that, it’s just yields. Zen 4c apparently tops out in the low-to-mid 4GHz range, fwiw. That’s a much bigger range of variation for AMD, and they’re also pushing much higher clocks relative to the process. Qualcomm, even if they do have prime core designs, which they don’t seem to disclose (leading me to believe this is just a yield thing and each die’s 4.3GHz cores will differ), is not going to save like 30-40% on area here.
At first I thought 600 MHz isn't much compared to AMD's Zen4c, so can Qualcomm really save much die space? But then MediaTek's design gave me pause: if even Arm's relatively small X4 cores can achieve a noticeable shrink with just a 400 MHz reduction, Oryon may well be the same.
They are absolutely smaller, but you have to separate the architectural and L2 difference from the physical design necessary to support x y z clocks.
Those smaller X4s have 512KB of L2, not 1MB like the one big X4. Arm allows this as long as you have one full-size X4. On top of that, the smaller X4s that tag along have 2x128b vector/SIMD units instead of the 4x128b of the regular X4. Another new Arm thing.
EDIT: I see MediaTek mentioned they’re laid out with larger silicon too.
Yeah fwiw I think you hit threshold effects for a given process node where you have to use larger silicon even if it’s still dense cells or whatever. As a probabilistic thing, if you want most chips’ X cores to be able to hit some frequency, at some point you probably have to change the physical design of the core(s) so that a high or very high proportion of them can do so.
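A rough way to see that threshold effect, as a sketch under invented numbers: model each core's maximum stable clock as normally distributed, and ask how often every designated core on a die reaches the target. The mean, sigma, and function names here are all hypothetical; real fmax distributions and binning rules aren't public.

```python
import math


def p_core_hits(target_ghz: float, mean_ghz: float, sigma_ghz: float) -> float:
    """P(core fmax >= target) if fmax is modeled as Normal(mean, sigma)."""
    z = (target_ghz - mean_ghz) / sigma_ghz
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_all_cores_hit(p_core: float, cores: int) -> float:
    """P(all `cores` designated cores reach the target), assuming independence."""
    return p_core ** cores


if __name__ == "__main__":
    target = 4.3  # GHz, the boost clock discussed in the thread
    sigma = 0.1   # GHz, hypothetical spread from process variation
    # Hypothetical design points: a physical design tuned for a higher
    # mean fmax (at the cost of area) shifts the whole distribution up.
    for mean in (4.3, 4.4, 4.5):
        p = p_core_hits(target, mean, sigma)
        print(f"mean fmax {mean:.1f}GHz: per-core {p:.3f}, "
              f"both of 2 fixed cores {p_all_cores_hit(p, 2):.3f}")
```

The point of the sketch is only the shape: if specific cores are required to hit the clock on nearly every die, the design has to carry margin above the target, and that margin is what costs area.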
It could also be about timing for this particular design too, idk.