SMT makes sense in a world where you have only one core design that needs to fit every task. In that situation, sacrificing a couple percent of ST performance for ~30% more throughput is easily worth it, which is why SMT still exists. But that's not the world we'll have in the future, where both big and small cores will be an option across the product stack.
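To put rough numbers on that trade, here's a quick back-of-the-envelope sketch. The 2% ST penalty is my own stand-in for "a couple percent", and I'm assuming the ~30% gain is measured against the same core without SMT; neither is a measured figure.

```python
# Assumed figures, not measurements.
ST_PENALTY = 0.02  # stand-in for "a couple percent" ST loss from SMT
MT_GAIN = 0.30     # the ~30% throughput gain mentioned above


def relative_perf(smt):
    """Return (single-thread, all-thread) performance of one core,
    relative to a no-SMT baseline of 1.0 for both."""
    if not smt:
        return 1.0, 1.0
    return 1.0 - ST_PENALTY, 1.0 + MT_GAIN


for smt in (False, True):
    st, mt = relative_perf(smt)
    print(f"SMT={smt!s:5}  ST={st:.2f}  MT={mt:.2f}")
```

With only one core design available, giving up a sliver of ST for that much MT is an easy call. It stops being one when the MT side can be covered by a different core entirely.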
I'm not going to belabor the point, but as others have already said, most workloads have a couple of ST-critical threads at most, and the few that do scale beyond that can often use many cores. So for most workloads, the ideal config would be a number of ST-optimized big cores to capture those critical threads, and small cores for the rest. SMT only makes sense if you need to accommodate a wide variety of workloads with different threading demands, but we're getting to the point where raw core counts can easily cover a superset of those demands.
Gaming is actually interesting, in that it's one of the few consumer workloads demanding a moderate number of ST-critical threads. But even there, 8 big cores (SMT or not) are empirically capable of handling it today. And with chiplets, this is even less of an issue. Imagine Intel had one 8+0 (no SMT) chiplet and another 0+32 one. 2x 8+0 would easily handle gaming, 8+0 + 0+32 would be great for productivity, and 2x 0+32 would be really interesting for certain embedded use cases. And for workstations, you could just add more of the E-core chiplets.
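For anyone who wants the combinations spelled out, here's a small sketch that just adds up the core counts. The two chiplets are the hypothetical ones from above, and the workstation line (one big chiplet plus three E-core chiplets) is purely an illustrative pick, not anything from a real roadmap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    big: int    # ST-optimized cores, no SMT
    small: int  # throughput-oriented cores


BIG = Chiplet(big=8, small=0)     # the hypothetical "8+0" chiplet
SMALL = Chiplet(big=0, small=32)  # the hypothetical "0+32" chiplet


def package(*chiplets):
    """Sum big and small core counts across the chiplets in one package."""
    return (sum(c.big for c in chiplets), sum(c.small for c in chiplets))


configs = {
    "gaming": package(BIG, BIG),
    "productivity": package(BIG, SMALL),
    "embedded": package(SMALL, SMALL),
    # Assumed example: one big chiplet plus three E-core chiplets.
    "workstation": package(BIG, SMALL, SMALL, SMALL),
}

for name, (big, small) in configs.items():
    print(f"{name:13s} {big}+{small}")
```

Two chiplet designs cover every segment here, and none of them needs SMT to fill a gap in thread count.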
Also, the benefit isn't just the extra ~5% or whatever of transistors; it's also the engineering time. How many tens of thousands or even hundreds of thousands of hours have been devoted to implementing, maintaining, and securing SMT? What could we get if those engineers were devoted to something else, be that power, area, ST performance, features, whatever?