Speculation aside, what the heck is the fascination with SMT4? Because it really hasn't been done before (or at all in x86-land)? If AMD saw a reason for it, I'm sure they'd be working on it. With the massive amount of real cores available these days, I really question how much doubling threads would help outside of very specific use cases.
As others noted, it has been done before. I am not sure what the first implementation was. There was the UltraSPARC T1 all the way back in 2005. That one had 279 million transistors on a 90 nm process with a die area of 378 mm² (from Wikipedia) for up to eight 4-thread cores. There were some weirder CPU designs even earlier, like the Cray MTA processors (128 hardware threads per core). The IBM POWER9 does SMT4 and SMT8 now.
The early multithreaded processors really didn’t have the resources to run both single-threaded and multithreaded code well. The T1 was up to 8 cores with only 279 million transistors. Modern processors have billions of transistors to play with. Most people I know turned HT off on Intel processors up until Intel made a bunch of improvements in Haswell or Broadwell; I forget which. Even after Broadwell, it often doesn’t increase performance at all for HPC and workstation applications.
I don’t really expect AMD to do SMT4 in Zen 3. It does make some sense though. They already have a good SMT2 implementation, so extending that to 4 threads could be plausible design-wise. Also, there doesn’t seem to be a core count increase with Zen 3, so a higher thread count per core could make up for that. It does make sense for some server applications, and AMD is server-first these days.
I don’t think tackling SMT4 with Zen 3 would have been their best course of action. It isn’t useful outside of niche (but very profitable) applications. We pretty much know that Zen 3 is a massive rework of the cache architecture. The CCX is going to 8 cores and 32 MB of L3, which is a massive change by itself. I think they are also going to increase cache bandwidth significantly to support increased floating point throughput. We pretty much already know that too; we just don’t know how much improvement there will be. There is a lot of discussion about SMT4 because it is still a bit of an unknown. With all of the other somewhat known changes, it just seems like too much to also add SMT4.
We don’t really need to go up to more threads. Perhaps when they switch to DDR5 and PCIe 5.0, they will also increase the thread count. That may also be the time to switch the IO die to an interposer with a massive L4 cache. They are still at 8-channel memory for up to 128 threads now. They are also in the strange situation where they have more IO bandwidth than memory bandwidth. That will be great for supercomputer designs; massive overkill for just about anything else. I think they will want more memory bandwidth before they increase the thread count. A super large L3 cache variant will help with that.
I think the applications where Intel wins will be almost non-existent with Zen 3 unless Intel really pulls a rabbit out of a hat soon.
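To put rough numbers on the memory bandwidth point, here is a back-of-the-envelope Python sketch. Only the 8 memory channels and the 128 threads come from the post above. DDR4-3200, 128 PCIe 4.0 lanes, and the 256-thread SMT4 case (64 cores x 4) are assumptions picked for illustration, and these are theoretical peaks, not measured figures.

```python
# Back-of-the-envelope peak bandwidth estimate.
# From the post: 8 memory channels, up to 128 threads.
# Assumed for illustration: DDR4-3200, 128 PCIe 4.0 lanes, 256 threads for SMT4.

MEMORY_CHANNELS = 8            # from the post
THREADS_SMT2 = 128             # from the post
THREADS_SMT4 = 256             # assumption: same 64 cores with 4 threads each

DDR4_MEGATRANSFERS = 3200      # assumption: DDR4-3200
BYTES_PER_TRANSFER = 8         # a DDR4 channel is 64 bits wide

PCIE_LANES = 128               # assumption: single-socket lane count
# PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding.
PCIE4_GBS_PER_LANE = 16 * (128 / 130) / 8   # about 1.97 GB/s per direction


def memory_bandwidth_gbs(channels=MEMORY_CHANNELS, mts=DDR4_MEGATRANSFERS):
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * mts * BYTES_PER_TRANSFER / 1000


def io_bandwidth_gbs(lanes=PCIE_LANES, bidirectional=True):
    """Theoretical peak PCIe bandwidth in GB/s."""
    one_way = lanes * PCIE4_GBS_PER_LANE
    return 2 * one_way if bidirectional else one_way


def per_thread_gbs(total_gbs, threads):
    """Share of the peak bandwidth per hardware thread if all are busy."""
    return total_gbs / threads


if __name__ == "__main__":
    mem = memory_bandwidth_gbs()
    print(f"memory peak:        {mem:.1f} GB/s")
    print(f"IO peak (one way):  {io_bandwidth_gbs(bidirectional=False):.1f} GB/s")
    print(f"IO peak (both):     {io_bandwidth_gbs():.1f} GB/s")
    print(f"per thread, SMT2:   {per_thread_gbs(mem, THREADS_SMT2):.2f} GB/s")
    print(f"per thread, SMT4:   {per_thread_gbs(mem, THREADS_SMT4):.2f} GB/s")
```

Under those assumptions, the PCIe peak exceeds the memory peak even in one direction, and going to SMT4 without adding channels or faster memory would halve the memory bandwidth available per thread.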