But are there any benchmarks out there measuring thread and context-switching overhead?
I/O workloads inside VMs would expose the higher context switching latency.
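There are micro-benchmarks aimed at exactly that, e.g. lmbench's lat_ctx, which bounces a token between processes blocked on pipes so every hop forces a context switch. Below is a minimal sketch of the same idea (Linux-only; the CPU number and iteration count are arbitrary assumptions on my part):

    /* Minimal sketch of a pipe ping-pong context-switch benchmark,
     * roughly the idea behind lmbench's lat_ctx. Linux-only.
     * Build: gcc -O2 ctxswitch.c -o ctxswitch
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERATIONS 100000   /* arbitrary; enough to average out noise */

    /* Pin the calling process to one CPU so each ping-pong forces a
     * context switch instead of letting the two processes run in parallel. */
    static void pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        sched_setaffinity(0, sizeof(set), &set);
    }

    int main(void)
    {
        int ping[2], pong[2];
        char byte = 'x';

        pipe(ping);
        pipe(pong);
        pin_to_cpu(0);              /* assumption: CPU 0 exists and is usable */

        if (fork() == 0) {          /* child: echo every byte straight back */
            pin_to_cpu(0);
            for (int i = 0; i < ITERATIONS; i++) {
                read(ping[0], &byte, 1);
                write(pong[1], &byte, 1);
            }
            exit(0);
        }

        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERATIONS; i++) {
            write(ping[1], &byte, 1);
            read(pong[0], &byte, 1);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        /* Each iteration is two switches (parent -> child and back). */
        printf("~%.0f ns per context switch\n", ns / (ITERATIONS * 2.0));
        return 0;
    }

Run it on the bare host and then inside a guest and the per-switch number should already tell part of the story.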
I've always wanted to run the following test (because I know it would have very interesting results) but have never found the time, since VMs can be a bit slow and need a lot of free storage space:
Install HammerDB on your PC. Enable two P-cores and two E-cores only. Run the benchmark with whatever settings you like and note the score (in transactions per second, orders per minute, etc.).
Now enable one more P-core so the OS has a core available for its own tasks and the hypervisor will let you assign the remaining two P-cores and two E-cores to the VM. Install HammerDB in the guest OS (which should ideally be the same as the host OS) and run the same benchmark with the same settings. Ideally, the VM should reach about 90% of native speed, but you may find that the performance hit is far bigger. That gap is the context-switching overhead: every time the benchmark does I/O, the DB writes turn into system calls, each of which forces a switch from user mode to kernel mode (and, under a hypervisor, often a VM exit on top of that), and those transitions are where the time goes.
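As a side note, you can measure that user-to-kernel transition cost directly by timing a do-nothing syscall on the bare host and again inside the guest; the gap gives a feel for what each kernel entry costs under virtualization. A rough Linux-only sketch (the iteration count is an arbitrary assumption):

    /* Rough sketch: time a trivial syscall to estimate the cost of the
     * user -> kernel transition. Run on the host, then inside the guest,
     * and compare. Build: gcc -O2 syscall_cost.c -o syscall_cost
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERATIONS 1000000  /* arbitrary; enough to average out noise */

    int main(void)
    {
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < ITERATIONS; i++)
            syscall(SYS_getpid);   /* force a real syscall, no libc caching */
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("~%.1f ns per getpid() syscall\n", ns / ITERATIONS);
        return 0;
    }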
Repeat the same experiment with 4 P-cores and no E-cores on the host OS and 5 P-cores/0 E-cores for the VM run, and that comparison will show you how much of the slowdown comes from involving the E-cores.
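And if you want a quick sanity check of how much slower an E-core actually is before setting up the whole VM experiment, something like the sketch below (Linux-only; the program name, the 2-second window, and the core numbering are my own assumptions) pins a busy loop to whichever core you pass it, so you can run it once on a P-core and once on an E-core and compare the throughput.

    /* Sketch: pin a busy loop to a chosen core and report its throughput,
     * so a P-core and an E-core can be compared directly.
     * Usage: ./corebench <cpu-number>   (core numbering is machine-specific)
     * Build: gcc -O2 corebench.c -o corebench
     */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(int argc, char **argv)
    {
        int cpu = (argc > 1) ? atoi(argv[1]) : 0;

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Spin for roughly 2 seconds, counting loop iterations. */
        struct timespec start, now;
        unsigned long long count = 0;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (;;) {
            /* Do a chunk of work between clock checks so the loop stays
             * compute-bound rather than dominated by clock_gettime(). */
            for (volatile int i = 0; i < (1 << 20); i++)
                ;
            count += 1 << 20;
            clock_gettime(CLOCK_MONOTONIC, &now);
            if (now.tv_sec - start.tv_sec >= 2)
                break;
        }

        double secs = (now.tv_sec - start.tv_sec)
                    + (now.tv_nsec - start.tv_nsec) / 1e9;
        printf("cpu %d: %.0f loop iterations/sec\n", cpu, count / secs);
        return 0;
    }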