Making SMT4 and SMT8 actually work took a *LOT* of transistors. As I recall (and someone will likely correct me, hopefully kindly, if I'm wrong), it was another 35% per core, far above the budget for Intel and AMD. When SMT4 went live, as it were, it made sense, and it probably still does for IBM's market. Having a gazillion threads makes no sense if you can't feed them.
I've said it before and I'll say it again - SMT4 is nothing special.
POWER7 had it in 2010 and SPARC T3 had SMT8 in the same year.
If it were such an obvious 'low-hanging fruit' gain, then everybody would be doing it, not just the now more obscure POWER and SPARC lines.
If it were obvious, then you can bet Intel, who introduced x86 SMT, would have done it by now - they had more than enough opportunity while they were maintaining a more competitive cadence, before the 10nm woes set in.
You fail to acknowledge that engineering a core for maximum thread count could compromise its single-thread performance - a direction AMD tested with Bulldozer and barely survived before Zen.
At this point their focus is on ST performance and core counts - those two things alone provide a steady improvement to MT performance each generation.
SMT2? Unlikely, but possible. SMT4? Possible, but then again I might win the lottery twice. This week... As process nodes have shrunk, it may be better on the server side of things to just add more cores.