Having a large OOE engine means more bandwidth pressure on memory: if you aren't as efficient in terms of bytes of data into the core per retired op, then at large core counts you would see regressions, unless you spent even more transistor budget/power on the memory subsystem.
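To put rough numbers on that scaling (all figures below are made up for illustration, not real chip data): aggregate bandwidth demand grows with core count, retire rate, and bytes fetched per retired op, so a less byte-efficient core hits a fixed memory budget much sooner as core counts rise.

```python
# Hypothetical sketch of aggregate DRAM bandwidth demand.
# None of these numbers correspond to a real CPU.

def bw_demand_gbs(cores: int, ipc: float, freq_ghz: float,
                  bytes_per_op: float) -> float:
    """Aggregate bandwidth demand in GB/s:
    cores * (ops retired per cycle) * (cycles per ns) * (bytes per op)."""
    return cores * ipc * freq_ghz * bytes_per_op

# Same core count and retire rate, but twice the bytes per retired op
# doubles the pressure on the memory subsystem.
efficient = bw_demand_gbs(cores=128, ipc=4.0, freq_ghz=4.0, bytes_per_op=0.5)
wasteful  = bw_demand_gbs(cores=128, ipc=4.0, freq_ghz=4.0, bytes_per_op=1.0)
print(efficient, wasteful)  # 1024.0 2048.0
```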
Not sure a large OOE will really degrade performance; that sounds illogical, at least for servers, where peak clock rate is not a factor (because it is much lower than what the core can actually do). It might not yield many performance gains if you choke it elsewhere, but a large OOE can hide memory bottlenecks thanks to its extended reorder capabilities. That won't work for all applications, but it will for some. In the end, performance is determined by IPC * frequency. If a wide OOE cannot clock as high, it might still deliver the same performance, and therefore the same memory pressure. But you have to spend more, because the chip area gets bigger. Then again, it might be more efficient precisely because it clocks lower. So there you have it: the intricate balance of PPA.
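The IPC * frequency point above can be sketched in a few lines (the IPC and clock figures are hypothetical, chosen only to make the products equal):

```python
# Two hypothetical cores: a narrow design at a high clock and a wide
# design at a lower clock deliver the same throughput when their
# IPC * frequency products match.

def throughput_ops_per_ns(ipc: float, freq_ghz: float) -> float:
    """Instructions retired per nanosecond = IPC * clock in GHz."""
    return ipc * freq_ghz

narrow_fast = throughput_ops_per_ns(ipc=4.0, freq_ghz=5.0)  # narrow, high clock
wide_slow   = throughput_ops_per_ns(ipc=5.0, freq_ghz=4.0)  # wide, lower clock

print(narrow_fast, wide_slow)  # 20.0 20.0
```

Equal throughput also implies roughly equal memory pressure per unit time, which is why the tradeoff comes down to area and power rather than raw performance.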
I agree with your point that a fat core might be useless when the memory system and data fabric cannot support it. But that could be the cool thing about Zen 6: it removes a large share of those memory bottlenecks:
- Lower latency CCD interface
- Higher bandwidth CCD interface
- Increased total L3$ capacity
- Higher DRAM bandwidth
And with that in place, being less memory bound, you can put a larger engine in the core without it being hamstrung.