While I would love to see Apple make more aggressive HEDT parts, I'm still not sold on the "wider is better" argument. The high single-threaded performance is impressive, but it seems to come with some sacrifices (namely clock speed) and, as a result, may not be as well suited to desktop workloads. Case in point: I was quite surprised to see how badly the M1 was beaten by even the old Skylake architecture, let alone Zen 3, in HEVC software encoding (far more compute-intensive than H.264), despite its huge advantage in IPC and process node. I realize that the software is beta, but given the size of the gap, I don't think further optimization could completely close it, even on a core-for-core basis.
This ties into my argument from the other thread. Single-threaded performance, while extremely important because of how it contributes to the overall throughput of an architecture, is not very relevant in and of itself in desktop/workstation/server workloads, because those workloads have long been moving towards more and more parallelism.
To further my point, many of the benchmarks that Apple proponents like to cite as examples of the M1's prowess, like Geekbench and SPEC, all run much faster in multithreaded mode than they do in single-threaded mode. In fact, it's not even close. So while Apple's M1 is highly impressive (especially from a perf/watt perspective), it seems to come up short in heavy workloads against modern x86-64 architectures like Zen 3 that prioritize high clock speeds, SMT and wide SIMD.
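To put rough shape on the throughput argument, here is a minimal back-of-the-envelope sketch. It assumes the usual simplification that per-core throughput is roughly IPC times clock, and that chip throughput scales with core count and an SMT uplift factor. The function names and every number in it are hypothetical placeholders picked for illustration, not measurements of the M1, Skylake or Zen 3.

```python
def per_core_throughput(ipc, clock_ghz):
    """Billions of instructions retired per second on one core (IPC x clock)."""
    return ipc * clock_ghz


def chip_throughput(ipc, clock_ghz, cores, smt_uplift=1.0, scaling=1.0):
    """Idealized whole-chip throughput.

    smt_uplift: hypothetical multiplier for SMT (1.0 means no SMT).
    scaling:    hypothetical multi-core efficiency (1.0 means perfect scaling).
    """
    return per_core_throughput(ipc, clock_ghz) * cores * smt_uplift * scaling


# Placeholder values only, NOT real CPU data.
# A wide, lower-clocked design with few cores and no SMT:
wide_low_clock = chip_throughput(ipc=6.0, clock_ghz=3.0, cores=4)

# A narrower, higher-clocked design with more cores and SMT:
narrow_high_clock = chip_throughput(ipc=4.0, clock_ghz=4.5, cores=8, smt_uplift=1.25)

print(per_core_throughput(6.0, 3.0), per_core_throughput(4.0, 4.5))
print(wide_low_clock, narrow_high_clock)
```

With these made-up inputs both designs land on the same single-threaded figure, yet the second one ends up well ahead once cores and SMT are counted, which is the kind of gap I'm describing. The model ignores memory bandwidth, cache behavior and SIMD width, so treat it as an illustration of the reasoning rather than a prediction.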
Increasing the core count will of course go a long way towards evening the odds, but it's possible that they would have to adapt their cache hierarchy, which could end up affecting single-threaded performance.