It could be Geekbench... or it could be the overhead from Intel's layout of the four CPU dies and the fabric extensions between them. We know from previous generations that Intel's fabric for their Xeon Scalable architecture imposes a non-trivial amount of overhead on MT performance. It may scale gracefully in some ways, but there's still a hit to MT performance. This isn't to imply that AMD's method is perfect, but it seems to have served them well so far.
Another possibility is that, in single-core situations, SPR can stretch its legs a bit on that single core and run at a higher clock and power level than it can when all cores are active. We do note that the related P-cores in ADL/RPL can consume quite a lot of power when left to run without limits. Zen 4 doesn't seem to draw quite as much, and I speculate that when all 96 cores are running on Genoa, they are able to maintain higher performance due to their efficiency.