You absolutely cannot judge, just by turning off SMT on an existing core, what the ST throughput of that core would be if the transistors spent on SMT were dedicated to ST throughput instead.
You absolutely 100% can do that for each and every piece of code you write or just run.
Intel PCM will show you how many instructions per cycle a piece of software uses, and there is no way for it to use any more than that unless you come up with a radically new way of writing that piece of code.
(And this is just an example; other tools allow for a much deeper analysis.)
The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost.
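To make the IPC point concrete, here is a rough sketch of the arithmetic behind that kind of measurement: IPC is just retired instructions divided by core cycles over the same interval. The counter values and the 6-wide core are made-up numbers for illustration, not output from PCM or figures for any real CPU, and the helper names are hypothetical.

```python
def ipc(instructions_retired: int, core_cycles: int) -> float:
    """Instructions per cycle over one measurement interval."""
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    return instructions_retired / core_cycles


def unused_issue_slots(measured_ipc: float, core_width: int) -> float:
    """Issue slots per cycle the thread left idle, for an assumed core width."""
    return max(core_width - measured_ipc, 0.0)


# Hypothetical counter deltas for one run of a program (not real measurements).
instructions = 9_000_000_000
cycles = 4_000_000_000
width = 6  # assumed issue width, purely illustrative

measured = ipc(instructions, cycles)
print(f"IPC: {measured:.2f}")
print(f"Unused issue slots per cycle: {unused_issue_slots(measured, width):.2f}")
```

Whatever tool you use to read the counters, the division is the same; the gap between the measured IPC and the core width is the headroom being argued about here.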
All the CPU makers know how much IPC general code can use and design their CPU cores accordingly.
That's why servers have four-way SMT, because they run many more, much "narrower" threads, and desktops only use two-way SMT, because we run heavier threads in general.
Doesn't SMT help single-threaded performance indirectly by masking memory latency as well? I read somewhere a long time ago that if a core is processing a thread and it stalls for whatever reason, the other thread can keep running.
Is that true?
You are already talking about at least two threads, or the same thread cloned on two hardware threads, so you answered your own question.
HT brings 33% more performance in Raptor Lake CPUs. There are A LOT of resources used for that, and potentially a big opportunity to improve ST performance by abandoning hyperthreading. There is no point to it in consumer computers any more, now that we have a lot of cores available and even more smaller cores.
HT brings 33% on average but also much more or much less depending on what you run, and that is not because it's using a lot of resources but because a lot of threads leave a lot of resources unused...
HT uses the UNUSED resources that the first thread leaves untouched, and it leaves them untouched because there is no way for that thread to use them. Making the core have even more resources available for the first thread is not going to make that thread use more resources; it will just leave even more resources untouched.
We have known this stuff for 20 years already.
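That argument can be put into a toy model. This is only a sketch under a big simplifying assumption, namely that each thread's usable IPC is fixed by its code and the core simply caps the total at its issue width. The function names and all the numbers are hypothetical; real cores share caches, buffers and ports in ways this ignores.

```python
def core_throughput(thread_demands, core_width):
    """Total IPC the core delivers: what the threads can use, capped by width."""
    return min(sum(thread_demands), core_width)


def smt_gain(demand, core_width):
    """Relative gain from running a second identical thread on the same core."""
    single = core_throughput([demand], core_width)
    dual = core_throughput([demand, demand], core_width)
    return dual / single - 1.0


# Assumed 6-wide core; per-thread usable IPC values are illustrative only.
for demand in (1.5, 4.5, 6.0):
    print(f"thread IPC {demand}: SMT gain {smt_gain(demand, 6):.0%}")

# Widening the core does nothing for a lone thread that cannot fill it.
print(core_throughput([4.5], 6), core_throughput([4.5], 8))
```

In this model a narrow thread gets a large SMT gain, a thread that already fills the core gets none, and a wider core leaves the lone thread's throughput unchanged. Whether real code behaves like the fixed-demand assumption is exactly the point being disputed in this thread.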