Perhaps IDC can provide some clarity, but different designs do achieve different densities. If I remember correctly, cache is very un-dense, as is that apparently unused space below the iGPU.
Xtor density will scale with target clockspeed within a given node: the denser the xtors, the slower the circuits they tend to form, and the lower the overall clockspeed of the circuit.
You see this across logic: very dense GPU circuits target rather low clockspeeds (comparatively, versus MPU circuits built on the same node).
You also see this in cache: L1$ will be very low xtor density versus L3$, but it will be clocked at the same clockspeed as the logic circuits and with very low latency. L3$ will not be clocked as high, but it will be much denser.
This is a direct result of the drive current, which within a given process node is a function of operating voltage and xtor width. The wider the xtor, the lower its effective density but the higher its drive current. Make the xtor too wide and your circuit speed decreases because of wire delay across the circuit, while your power consumption climbs because of all that current flow. Make your xtor too narrow and your circuit speed decreases because the drive current decreases, but so too does power consumption.
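To make that trade-off concrete, here is a minimal first-order sketch in Python. Every constant in it (drive-current factor, load capacitance, the wire-delay coefficient, stages per cycle) is a made-up illustrative value, not real process data; the point is just the shape of the curves - gate delay falls roughly as 1/width, the wire term grows with width, and switching energy only ever climbs with width.

```python
# Toy first-order model of the width trade-off described above.
# Every constant here is an assumed, illustrative value -- not process data.

VDD, VT = 1.0, 0.35      # supply and threshold voltage (V), assumed
K = 5e-4                 # drive-current factor per unit width (A/V^2), assumed
C_LOAD = 4e-15           # fixed fanout load capacitance (F), assumed
C_GATE_PER_W = 1e-15     # switched device capacitance per unit width (F), assumed
WIRE_COEFF = 1.5e-12     # wire-delay penalty coefficient (s per width^2), assumed:
                         # wider xtors -> bigger circuit -> longer RC wires
STAGES = 15              # logic stages (gate delays) per clock cycle, assumed

def drive_current(w):
    """Saturation-style Idrive: roughly linear in width at fixed VDD/VT."""
    return K * w * (VDD - VT) ** 2

def stage_delay(w):
    """CV/I gate delay (falls with width) plus a wire term (grows ~width^2)."""
    gate = C_LOAD * VDD / drive_current(w)
    wire = WIRE_COEFF * w ** 2
    return gate + wire

def switch_energy(w):
    """Energy per transition: the switched capacitance grows with width."""
    return (C_LOAD + C_GATE_PER_W * w) * VDD ** 2

for w in (0.5, 1.0, 2.0, 4.0, 8.0):
    d = stage_delay(w)
    fmax = 1.0 / (STAGES * d)
    print(f"W={w:4.1f}  Idrive={drive_current(w)*1e3:5.2f} mA  "
          f"stage delay={d*1e12:6.1f} ps  ~fmax={fmax/1e9:5.2f} GHz  "
          f"E/switch={switch_energy(w)*1e15:5.2f} fJ")
```

With these made-up constants the sweet spot lands around W~2: narrower devices lose too much Idrive, wider ones run into the wire term, and the energy per switch only goes up with width - which is the balancing act described above, just with toy numbers.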
It is all a delicate balancing act, but at the device-physics level, when we develop the xtors themselves, we target specific clockspeeds (specific Idrives) and then push the dimensions of the SRAM cell such that we hit those targets while still producing a manufacturable SRAM cell that is reliable for the device's operating lifetime.
On a given node (just giving arbitrary numbers here, but the ratios will be about right), we might design an SRAM cell for 800MHz operation in mobile devices with a 0.1um^2 footprint, but to scale that clockspeed to 4GHz the SRAM footprint (using the same xtors) has to balloon to 0.13 or 0.15um^2.
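For a feel of what those (arbitrary) cell sizes mean, a quick back-of-envelope in Python: the bit density and the bitcell-array area for an example cache slice. The 2MB slice size is my own assumption, and this ignores the decoders, sense amps and other macro overhead around the array.

```python
# Back-of-envelope on the illustrative SRAM numbers above. Cell areas are the
# arbitrary example values from the post; the 2MB slice size is assumed, and
# decoder/sense-amp/macro overhead around the bitcell array is ignored.

cells_um2 = {                  # target clock -> example 6T bitcell footprint (um^2)
    "800 MHz":          0.10,
    "4 GHz (low end)":  0.13,
    "4 GHz (high end)": 0.15,
}

CACHE_MBIT = 16                # assumed example: a 2 MB cache slice (bitcells only)

for label, area in cells_um2.items():
    density_mbit_mm2 = 1.0 / area               # (1e6 um^2/mm^2) / (1e6 bit/Mbit) cancel
    array_mm2 = CACHE_MBIT * 1e6 * area / 1e6   # cells * um^2 each, converted to mm^2
    print(f"{label:17s}  cell={area:.2f} um^2  ~{density_mbit_mm2:4.1f} Mbit/mm^2  "
          f"{CACHE_MBIT} Mbit array ~ {array_mm2:.2f} mm^2")
```

So on these example numbers, a 5x clockspeed retarget costs roughly 1.3-1.5x the bitcell area: density drops from about 10 Mbit/mm^2 to about 7 Mbit/mm^2.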