Apparently Win11 plays a BIG role here... but I still make the same assumption as before: very likely the small cores didn't work well, if you compare Sandra to other tests...
It could be as simple as Sandra using the type of workload called equal parcels: it takes the number of threads, 24 (8 big cores with HT + 8 small cores), and runs an equally sized computation on each. That works fine on homogeneous architectures, where every thread has the same performance, but fails utterly on a hybrid architecture.
On hybrid, the big cores complete their parcels first, and what happens next depends on the OS:
1) On a hybrid-unaware OS like W10 => the remaining work is left running on the small cores while the big cores sit idle => this decision looks stupid, but it is in fact optimal on previous architectures: keep the workload wherever it is now to reap the benefits of hot caches, hot branch predictors and so on.
2) On a hybrid-aware OS like W11, the CPU hints to the OS that a small core is currently running a heavy workload that would be a better fit for a big core, so it gets rescheduled there, and total "performance" increases because the workload completes sooner.
Of course, due to the inflexible workload, even approach (2) idles the small cores during that final stretch of runtime, so the CPU does not reach 100% of its total potential like it does in things like CB.
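
To put rough numbers on the two cases above, here is a toy model in Python. Everything in it is my assumption for illustration, not a measurement and not how Sandra actually works: 16 big-core threads at speed 1.0, 8 small-core threads at speed 0.5, one parcel of size 1.0 per thread, free migration, and no extra speedup for a big core whose HT sibling is idle. The function name `simulate` and its parameters are made up.

```python
def simulate(n_big=16, n_small=8, big_speed=1.0, small_speed=0.5,
             parcel=1.0, migrate=False):
    """Return (finish_time, utilization) for one equal-parcel run.

    Toy model with assumed numbers: every thread gets one parcel of equal
    size, and the run ends when the last parcel completes.
    Requires big_speed >= small_speed and n_small <= n_big.
    """
    # Work per unit time if every thread were kept busy the whole run.
    capacity = n_big * big_speed + n_small * small_speed
    total_work = (n_big + n_small) * parcel

    # Moment at which all big threads have finished their own parcels.
    t_big = parcel / big_speed

    if not migrate:
        # Case 1 (hybrid-unaware): leftovers stay on the small cores.
        finish = parcel / small_speed
    else:
        # Case 2 (hybrid-aware): each unfinished small-core parcel moves
        # to a now-idle big thread and finishes there.
        remaining = parcel - small_speed * t_big
        finish = t_big + remaining / big_speed

    utilization = total_work / (finish * capacity)
    return finish, utilization


for label, migrate in (("no migration", False), ("migration", True)):
    finish, util = simulate(migrate=migrate)
    print(f"{label}: finish={finish:.2f} utilization={util:.0%}")
```

With these made-up speeds the run without migration takes 2.0 time units at 60% utilization, and the run with migration takes 1.5 at 80%. A workload that could be split arbitrarily finely would finish in 24/20 = 1.2 and hit 100%, which is the gap left even in case (2). The real ratios depend on the actual big/small speed difference, so treat the percentages as shape, not prediction.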