Nah, I think the talk about latency in the Z5 architecture thread makes sense.
I was just as surprised as anyone about Z5's state, particularly because they very clearly buffed both the INT and FP/SIMD pipelines extensively. Sure, you can make the case that FP got buffed way more, but even just going from 4 to 6 ALUs in INT is substantial growth and should've brought a serious increase.
If @MS_AT's theory is right that INT scalar is simply limited by latency, i.e. the INT backend has all the throughput you want and the core is as fat as it needs to be for literally several generations (same as Zen/Zen 2 were back in the day), then it isn't a simple "for server only" decision. It is that, for all the front-end efforts, the backend's capacity still sits far beyond what the frontend can supply.
If you think about it that way, Zen 5 makes total sense. It's not "imbalanced" or "SIMD only". It's just that SIMD throughput is large (AVX making it even more so) while a lot of your basic INT scalar doesn't have very large amounts of data, so it's about waiting for it to get through the backend.
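To put rough numbers on the latency argument, here is a toy model (my own simplification, not a Zen 5 simulation). It assumes the work is split into a number of independent dependency chains, every op has the same latency, and each ALU can start one op per cycle. The names `min_cycles` and `speedup` are made up for this sketch.

```python
from math import ceil


def min_cycles(n_ops: int, n_chains: int, alus: int, latency: int = 1) -> int:
    """Lower bound on cycles for n_ops split evenly over n_chains dependency chains.

    Toy assumptions: each ALU starts one op per cycle, every op takes
    `latency` cycles, and ops within a chain must run one after another.
    """
    # Throughput bound: the ALUs can start at most `alus` ops per cycle.
    throughput_bound = ceil(n_ops / alus)
    # Latency bound: the longest chain has to execute serially.
    latency_bound = ceil(n_ops / n_chains) * latency
    return max(throughput_bound, latency_bound)


def speedup(n_ops: int, n_chains: int, old_alus: int = 4, new_alus: int = 6,
            latency: int = 1) -> float:
    """Speedup from widening the backend from old_alus to new_alus in this model."""
    return (min_cycles(n_ops, n_chains, old_alus, latency)
            / min_cycles(n_ops, n_chains, new_alus, latency))


if __name__ == "__main__":
    for chains in (1, 2, 4, 8):
        print(f"{chains} chain(s): {speedup(1200, chains):.2f}x from 4 -> 6 ALUs")
```

In this model, a workload that is one or two long dependency chains gains nothing from going 4 to 6 ALUs, while one with eight independent chains gets the full 1.5x. Real cores are far messier (caches, branch mispredicts, scheduler limits), so treat it only as an illustration of why a wider backend does not help latency-bound scalar code.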
- that new lookahead branch predictor is very different from before, and one can assume that it needs a lot of work over several generations same as how the BP in Zen->Zen 4 did before this
- the massive backend goes half used on most "basic" (low data high op spam) workloads
- the uncore, broadly speaking, has terrible latency vis-à-vis Intel, and AMD has always compensated for that with a stronger core
- but the uncore has now reached its limits; it is, after all, more or less a minimal evolution of Zen 2's I/O, which was designed to be cheap
- it is also likely that while the core itself is not going to need much changing anytime soon, the mem configuration (L1 and L2 caches) may need changes to get stuff faster in some scenarios
- since Zen 5 completely trounces AVX 512 workloads with minimal latency, it's very possible that AMD will be pushing a ton of compiler optimisations to get AVX in many, many, many more places, which means we may actually see a degree of FineWineing on Z5
Evolving the I/O into something far more modern than "we put a few traces in the PCB and put a little I/O die there", and possibly reevaluating their memory config, could do a lot for INT, if it is at all possible. David Huang says that kepler & co were entirely wrong to assume that the backend would bring forth a 35% perf increase even though it theoretically can, and C&C is writing that they've already reconfigured a fair bit of their L1 cache bandwidth. I don't know just how much more can be done about memory bottlenecks or latency, and David Huang pretty much said that it's impossible to fully use that backend.
This reminds me of the outrageous rumors that some people spread before, which were eventually proven to be wrong. Not only were they slapped in the face, but they were also furious and claimed that Zen 5 was the worst architecture since Bulldozer. Readers who follow me on Twitter may remember that the PMC data I mentioned in the article was actually collected as early as early April. At that time, my purpose was to see how much work was needed to achieve the performance improvements that some people boasted about. It turned out that a simple look at the PMC data showed that for the current x86 microarchitecture, it is simply a "dream" to achieve those outrageous rumored goals without sacrificing extreme frequency and extreme performance.
So I have no idea how much improvement a new I/O system will bring; it might be huge, it might not be. But it seems that the annoying prophecy about hitting the limits of how fast cache/memory can feed the core is coming true.
Zen 5 is so radically new that the logic can no longer be fed data/code at an acceptable speed unless you batch/vectorize/AVX. So yes, in 2 years' time maybe AVX and the like will actually make Zen 5 and 6 much better, but for now, it is a bit disappointing.