Not sure why people are so insistent on seeing SMT4. The difficulty in making SMT work is in validation, and SMT4 requires even more of that work while providing diminishing returns.
We're also starting to get enough cores that SMT becomes less effective even for multi-threaded applications. Extra logical threads only bring gains when the code can use more threads than there are physical cores. That's the same reason SMT on Intel dual cores was great for gaming, but reduced performance on the quad cores.
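To put rough numbers on that, here is a toy model, not a benchmark. The function name, the flat `smt_yield` of 0.25 per extra thread and the thread counts are all my own assumptions, picked only to show the shape of the argument: once every runnable thread already has a physical core to itself, the extra logical threads add nothing.

```python
def relative_throughput(runnable_threads, physical_cores, smt_ways=2, smt_yield=0.25):
    """Toy estimate of throughput, in units of 'one fully busy core'.

    smt_yield is an ASSUMED flat gain for each thread that has to share
    a core with another thread; real gains vary a lot by workload.
    """
    hw_threads = physical_cores * smt_ways
    # Threads beyond the hardware thread count just wait their turn.
    scheduled = min(runnable_threads, hw_threads)
    # First fill every physical core with one thread each.
    busy_cores = min(scheduled, physical_cores)
    # Whatever is left has to double up on an already busy core.
    sharing = scheduled - busy_cores
    return busy_cores + smt_yield * sharing


if __name__ == "__main__":
    # Hypothetical game that can keep 4 threads busy.
    for cores in (2, 4):
        with_smt = relative_throughput(4, cores, smt_ways=2)
        without_smt = relative_throughput(4, cores, smt_ways=1)
        print(f"{cores} cores: SMT off = {without_smt}, SMT on = {with_smt}")
```

With those made-up numbers the dual core goes from 2.0 to 2.5, while the quad core stays at 4.0 either way. The model is too simple to show the actual slowdown on quad cores (it ignores cache and scheduling effects), it only shows where the gain disappears.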
The difference with IBM's POWER chips is that IBM puts in more than the bare minimum, specifically to get better gains from SMT. SMT on Intel/AMD processors uses a mere ~5% of core area, which puts it in the 1-2% range of the total die. On POWER5, SMT-specific enhancements caused the core area to balloon by 24%. That's a lot of area, and it could be spent on increasing ILP instead, which is far more relevant to the markets Intel/AMD play in.
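The area arithmetic is easy to check. This is a quick sketch with my own helper names. The 20-40% share of the die taken up by cores is an assumption (it is simply what makes ~5% of core area land at 1-2% of the die), and applying the same share to POWER5 is purely for comparison, not a real floorplan number.

```python
def growth_to_share(growth):
    """Turn 'the core grew by X' into 'share of the grown core'."""
    return growth / (1.0 + growth)


def smt_share_of_die(smt_share_of_core, cores_share_of_die):
    """Fraction of the whole die spent on SMT logic."""
    return smt_share_of_core * cores_share_of_die


if __name__ == "__main__":
    # ASSUMPTION: cores occupy somewhere between 20% and 40% of the die.
    for cores_share in (0.20, 0.40):
        x86 = smt_share_of_die(0.05, cores_share)
        # A +24% core means SMT is 0.24 / 1.24 of the resulting core.
        power5 = smt_share_of_die(growth_to_share(0.24), cores_share)
        print(f"cores = {cores_share:.0%} of die: "
              f"~5% of core -> {x86:.1%} of die, "
              f"+24% core -> {power5:.1%} of die")
```

Under that assumption the POWER5-style investment comes out at roughly 4-8% of the die, against 1-2% for the minimal approach.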
See, for IBM, which derives most of its benefit from selling software and hardware (including the CPU) tailored for that software, it makes sense to optimize further. Its SMT also tends to do very well in transactional databases, which are a big chunk of the market for them.
In the case of Intel/AMD, they are selling chips for everything from 12mm-thick ultrabooks to enthusiast gaming desktops and massive $200+ million HPC server clusters.