We know that performance gains from additional cores vary depending on the benchmark.
A hypothesis: If SMT is perfect, then adding SMT to a chip will yield the same performance as a chip with twice the cores but no SMT. As a corollary, we can estimate the benefit of adding SMT versus adding real cores by comparing the performance increase seen from each method of reaching a given number of "expected" cores (whether via SMT or via additional physical cores).
There are minor variances in clock speed that can be accounted for; fortunately, we have excellent comparators with which to test this hypothesis.
This Techpowerup benchmark run provides three results of particular utility: the 3600X (1 IOD + 1 chiplet with 2 disabled cores, SMT on) and the 3900X (1 IOD + 2 chiplets with a total of 4 disabled cores, tested with SMT both off and on). Since both CPUs face the same I/O constraints, the main differences when comparing 6C/12T to 12C/12T are 1) the inefficiencies of SMT versus more physical cores, 2) the inefficiencies of inter-chiplet communication versus on-chiplet SMT, and 3) the doubling of L3$ per thread on the 3900X with SMT off compared to the 3600X. We can account for the 3900X's 0.2 GHz clock advantage by docking its results by 4.35%.
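To make the clock adjustment concrete, here is a minimal sketch. The boost clocks of 4.4 GHz (3600X) and 4.6 GHz (3900X) are assumptions filled in from the stated 0.2 GHz gap, and the sketch assumes scores scale linearly with frequency, which is a simplification:

```python
# Assumed boost clocks; the post only states the 0.2 GHz difference.
CLOCK_3600X = 4.4  # GHz
CLOCK_3900X = 4.6  # GHz

def normalize_3900x(score: float) -> float:
    """Dock a 3900X benchmark score by its clock advantage over the 3600X."""
    return score * (1 - (CLOCK_3900X - CLOCK_3600X) / CLOCK_3900X)

# The 0.2 GHz advantage works out to 0.2 / 4.6, i.e. about 4.35%:
print(round((CLOCK_3900X - CLOCK_3600X) / CLOCK_3900X * 100, 2))  # 4.35
```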
In these benchmarks we can compare SMT vs real cores (3600X SMT vs 3900X no SMT), and we can also see the benefit of doubling cores (3600X SMT vs 3900X SMT). I think both pieces of data would be interesting. We can also compare the benefit of adding SMT (3900X no SMT vs 3900X SMT) and compare that to the other results.
The results are interesting; the data is posted at the end (note that for perfect scalers I lowered the threshold from 80% to 75% but forgot to update the text in the graphic below). I'll let others digest them further, but here are my takeaways.
Perfect Scalers
After adjusting for clock speeds, several benchmarks scaled almost perfectly with core count, with 6c/12t -> 12c/24t scores increasing by at least 75%: wPrime, CBR20 MT, Blender, Corona, Keyshot, and 7-zip decompress.
On those tests:
The 3900X with SMT off (12 threads) averaged a 27.994% performance advantage over the 3600X with SMT on (12 threads). In other words, the 3600X acts like a 3900X with SMT off that has 8.64 cores. This estimates that a 3600X with SMT on performs about 44% better than it would with SMT off (8.64 core equivalents / 6 "real" cores = 1.44, a 44% gain).
In another comparison, enabling SMT on the 3900X resulted in an average 45% improvement in performance, meaning a 3900X with SMT on acted as if it had 17.4 real cores. However, we know scaling wasn't perfect: on these tests, doubling the cores only produced an 84.795% increase in performance. If we normalize the SMT thread-doubling gain against the imperfect core-doubling gain, we get 45% / 84.8% = a 53% performance increase.
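As a sanity check on the arithmetic above, the perfect-scaler figures reduce to a few lines. The percentages are the averages reported above; the "equivalent cores" conversion assumes linear scaling:

```python
# Averaged ratios from the perfect-scaler subset, as reported above.
core_doubling = 1.84795   # 3900X SMT-on (12c/24t) over 3600X SMT-on (6c/12t)
smt_on_3900x = 1.45       # 3900X SMT-on over 3900X SMT-off

# Normalize the SMT thread-doubling gain by the imperfect core-doubling gain:
normalized_smt_gain = (smt_on_3900x - 1) / (core_doubling - 1)
print(round(normalized_smt_gain * 100))  # 53 (% performance increase)

# Equivalent "real cores" of a 3900X with SMT on, assuming linear scaling:
print(round(12 * smt_on_3900x, 1))  # 17.4
```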
All Pertinent Tests (excluding the VERY poor scalers)
We will exclude benchmarks that are poorly threaded. If a benchmark sees <10% benefit from doubling cores (that is, comparing the 3600X to the 3900X, both with SMT on, the performance difference is <10%), then once we account for clock speed (4.35%), the difference falls too close to the test's margin of error. It also means the test is likely designed for one or few cores/threads, scales poorly to multiple cores/threads, is constrained by non-processor limits, or has some other limitation. In other words, such tests introduce unnecessary noise into the data or are simply poor tests of multi-core performance. This removes SuperPi, CBR20 ST, Octane, Kraken, WebXPRT, Word, PowerPoint, Excel, Photoshop, Premiere, Zephyr, VMWare, VeraCrypt, and LAME. Sorry, but in all but one of those, the 3900X, when normalized for clocks, performs WORSE than a 3600X with fewer cores, which is absurd and makes them functionally useless for our purposes.
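The exclusion rule can be sketched as a simple filter. The benchmark names and scores below are hypothetical placeholders; a real run would plug in the Techpowerup numbers:

```python
# Drop benchmarks whose core-doubling benefit (3600X SMT-on vs 3900X SMT-on,
# clock-normalized) is under 10% -- too close to the margin of error.
CLOCK_PENALTY = 0.0435  # 3900X's 0.2 GHz clock advantage, as a fraction

def scales_well(score_3600x: float, score_3900x: float,
                threshold: float = 0.10) -> bool:
    """True if the clock-normalized 3900X gain over the 3600X meets the threshold."""
    normalized = score_3900x * (1 - CLOCK_PENALTY)
    return normalized / score_3600x - 1 >= threshold

# Hypothetical scores (higher is better): (3600X SMT-on, 3900X SMT-on)
results = {"GoodScaler": (100.0, 160.0), "PoorScaler": (100.0, 108.0)}
kept = [name for name, (a, b) in results.items() if scales_well(a, b)]
print(kept)  # ['GoodScaler']
```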
12c/24t benefit over 6c/12t (doubling cores and threads) - average 52.9%
12c/12t benefit over 6c/12t (achieving 12 threads via SMT or via real cores) - real cores on average 23.85% better (the 3600X acts like a 3900X with SMT off that has only 9.14 cores). This would estimate a 52% performance increase for SMT.
SMT on vs SMT off on the 3900X results in a 23.44% increase in performance. Normalized against the imperfect scaling of these tests overall (23.44% / 52.9%), that is a 44.3% benefit from SMT.
All Tests, even the ones that don't scale well
Even if we INCLUDE the poor scalers, the SMT benefit is 42.644% (3900X SMT on vs SMT off), when normalized for the overall benefit of adding real cores (the 11.968% benefit of 12c/24t over 12c/12t, divided by the 28.065% benefit of 12c/24t over 6c/12t).
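The same normalization applies to all three test sets; using the averages reported in each section above, the three estimates fall out directly:

```python
# Normalized SMT benefit = (3900X SMT-on gain over SMT-off) / (core-doubling gain),
# using the averages reported above for each test set.
datasets = {
    # name: (raw SMT gain on 3900X, gain of 12c/24t over 6c/12t)
    "perfect scalers": (0.45, 0.84795),
    "pertinent tests": (0.2344, 0.529),
    "all tests":       (0.11968, 0.28065),
}
for name, (smt_gain, core_gain) in datasets.items():
    print(f"{name}: {smt_gain / core_gain * 100:.1f}% normalized SMT benefit")
# perfect scalers: 53.1%, pertinent tests: 44.3%, all tests: 42.6%
```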
Conclusion
Overall it appears we should expect a 42 to 53% performance increase when SMT is enabled vs not. That is, if a chip has performance of 1.00 with SMT off, it would have performance of 1.42 - 1.53 with SMT on. What's amazing to me is that however you slice it, SMT produces a fairly tight window of improvement in performance. This is in accordance with the scientific literature, for example
this paper, which cites a 30-70% increase.
What I Really Want
While I think this is pretty clear and is consistent with the literature, I still want more data, because... I'm bored due to social distancing and shelter-in-place orders. I want benchmarks with 3600X matched to 3900X (6 cores per chiplet), 3700X matched to 3950X (8 cores per chiplet), and also include 3960X, 3970X, and 3990X --- all with SMT on and off. Heck, give me a 7742 2P with SMT on and off! This would allow an incredibly comprehensive evaluation of SMT implementation/benefit on AMD's Zen2 chiplet. And I'd love to compare it to Intel's implementation as well. And even compare it to Zen and Zen+. Give me all the data!
Edit: Interesting points after looking at it some more - 3:11 PM CST
Several tests showed a double-digit benefit going from 6c/12t to 12c/24t, but a far lower benefit going from 12c/12t to 12c/24t. Such tests include UE4, VS C++, Euler3D, DigiCortex, and x265. These application-specific limits confound the data. We would need SMT on/off tests on the 3600X, 3700X, and ideally the HEDT chips to replicate this and confirm that those tests would be poor benchmarks for future analysis of high-core-count processors. As it stands, I'm not sure why the benefits would be so small unless the tests are heavily skewed against SMT and favor physical cores (do they saturate the front end even with only one thread per core? Are they simply very efficient tests that produce almost no pipeline downtime, leaving no real place for a second thread to operate?). I would like to confirm this; I'm not sure if anyone else can comment on it.