Do you realize that'd mean all legacy code would need recompilation to be fast on Zen5?
That's not how things work. And that's why OoOE is required to have a fast CPU (until things change dramatically in the HW/SW space...): it lets a beefed-up core benefit from its extra resources without having to recompile everything. And all modern CPUs are good at that.
I can't remember the last time recompiling with instruction scheduling for the native CPU brought any significant speedup (>2%, i.e. not within measurement noise) for code I run. Of course, native instruction selection is a different story, but that doesn't apply to clang.
> Anyone hoping for non-trivial gains on Zen5 from compilers getting a machine model is likely to be disappointed.

Not generally, but some workloads could see massive gains, such as the one here: https://www.phoronix.com/review/amd-aocc-4/2
View attachment 102602
I think there is a good chance that Zen 5 would benefit a lot from recompilation. Since AMD is more concerned about enterprise workloads, most people running those workloads would recompile open source software anyway. One could say that Zen 5 is a FOSS-focused architecture.
> Not generally, but some workloads could see massive gains, such as the one here: https://www.phoronix.com/review/amd-aocc-4/2

You either didn't read my reply or didn't understand the problem.
> You either didn't read my reply or didn't understand the problem.

Somewhat understood you, but my stance is that AMD no longer cares about legacy code that much, and I kinda understand if that's the truth, because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiled source code.
Here is an example of 5950x vs 7950x:
W10, AVX-512 on/off: hardly any difference. Ignore the comparison to previous W11 scores; W10 seems to score slightly better for ST.
> Both your scores show AVX2. Is that expected?

No idea. I just ran it normally, then disabled AVX-512 in the UEFI and ran it again.
> Somewhat understood you, but my stance is that AMD no longer cares about legacy code that much, and I kinda understand if that's the truth, because modern CPUs are plenty fast for legacy code as it is. Wasting transistors on making that legacy code run faster seems unwise when those same transistors could be used to extract much more performance from recompiled source code.

Faster code will mostly be due to new instructions that allow vectorizing, not due to uarch details. And I doubt any new Zen5 instruction will benefit the clang test.
> Here is an example of 5950x vs 7950x:
> ASUS System Product Name vs System manufacturer System Product Name - Geekbench (browser.geekbench.com)

Good, but we would have to run both CPUs at the same clock speed to get a better idea of the gain.
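If running both CPUs at the same clock is not possible, a rough alternative is to normalize each score by its sustained clock. Below is a minimal sketch of that arithmetic, assuming performance scales linearly with frequency, which is only approximately true (memory-bound subtests will not follow it). The function name `per_clock_gain` and every number in it are hypothetical placeholders, not values from the linked Geekbench result.

```python
def per_clock_gain(score_old, ghz_old, score_new, ghz_new):
    """Relative per-clock gain of the new CPU over the old one.

    Assumes score scales linearly with clock speed (a simplification).
    """
    per_clock_old = score_old / ghz_old
    per_clock_new = score_new / ghz_new
    return per_clock_new / per_clock_old - 1.0


# Hypothetical placeholder inputs, NOT taken from the linked result.
old_score, old_ghz = 2200, 4.9
new_score, new_ghz = 2900, 5.7

raw_gain = new_score / old_score - 1.0
normalized_gain = per_clock_gain(old_score, old_ghz, new_score, new_ghz)
print(f"raw gain: {raw_gain:+.1%}, per-clock gain: {normalized_gain:+.1%}")
```

The clocks fed in should be the sustained clocks during the run, not the advertised boost, otherwise the normalized figure is off by the same ratio.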
> No idea. I just ran it normally, then disabled AVX-512 in the UEFI and ran it again.

Yeah, I double-checked on an AVX-512 machine and indeed GB always shows AVX2. But see my post above comparing 5950x vs 7950x. It seems to show AVX-512 benefits at least Object Remover.
> It seems to show AVX-512 benefits at least Object Remover.

Well, then my conclusion is that disabling AVX-512 in the UEFI is broken and doesn't actually disable AVX-512. Or it just prevents certain AVX-512 instructions from running.
> Well, then my conclusion is that disabling AVX-512 in the UEFI is broken and doesn't actually disable AVX-512. Or it just prevents certain AVX-512 instructions from running.

Or perhaps the GB AVX-512 path is faster only on Intel machines, due to the 256-bit DP of AMD Zen4. But I find that strange.
> Final question is whether AMD could have had time to adjust the Zen5 arch accordingly compared to the original plan. Could the plan originally have been a 30-40% IPC increase, and that was what was communicated (as indicated by some leakers), but then they changed their strategy and went for a lower perf increase and lower price? Or would it be too late for them to change in such a way, between the time they communicated the original intended perf increase and the time of launch?

The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring +50% in SIR nT at +25% power over 96C Genoa. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.
> The thirty-something percent IPC figures were based on a single thing: the 96C Turin sample scoring +50% in SIR nT at +25% power. That's it. AMD's own roadmaps (leaked by yours truly MLID) never mentioned anything of the sort.

So are you suggesting the Zen5 server core looks completely different from the Zen5 DT core? Otherwise, why would the former score 50% higher and the latter only 10-20% higher IPC vs Zen4? The higher TDP of the former alone does not explain the huge difference in IPC increase.
> So are you suggesting the Zen5 server core looks completely different from the Zen5 DT core? Otherwise, why would the former score 50% higher and the latter only 10-20% higher IPC vs Zen4? The higher TDP of the former alone does not explain the huge difference in IPC increase.

No, I expect it to look exactly the same.
> Also, are you saying that AMD's plan for Zen5 DT was always 10-20% IPC (or whatever it ends up being)?

This admittedly old roadmap appears to suggest so.
> No, I expect it to look exactly the same.
> It's just that you can arrive at the +50% (if it's real anyway) perf figure with more than just an IPC increase, namely higher sustained clocks, SMT yield, etc.

How would you achieve the higher sustained clocks necessary to reach a +50% IPC increase with only 25% extra power consumption vs Zen4 (especially when results show that the last few extra watts usually gain very little extra perf)? Also, what do you mean by "SMT yield", how much of the +50% IPC increase would that account for, and why?
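To make the "more than just IPC" argument concrete, here is a rough sketch of how a +50% nT figure could be split between IPC, sustained clock and SMT yield. It assumes the three factors simply multiply, which is a simplification, and the clock and SMT uplifts tried below are hypothetical values picked for illustration; only the +50% total comes from the thread. The function name `required_ipc_gain` is likewise made up for this example.

```python
def required_ipc_gain(total_gain, clock_gain, smt_gain):
    """IPC uplift needed so that (1 + ipc) * (1 + clock) * (1 + smt) == 1 + total.

    Assumes a purely multiplicative model (a simplification).
    """
    return (1.0 + total_gain) / ((1.0 + clock_gain) * (1.0 + smt_gain)) - 1.0


# +50% total is the figure discussed in the thread.
# The (clock, SMT) pairs are hypothetical, for illustration only.
TOTAL_GAIN = 0.50
for clock_gain, smt_gain in [(0.00, 0.00), (0.10, 0.05), (0.15, 0.10)]:
    ipc_gain = required_ipc_gain(TOTAL_GAIN, clock_gain, smt_gain)
    print(f"clock {clock_gain:+.0%}, SMT {smt_gain:+.0%} -> IPC needed {ipc_gain:+.1%}")
```

Under this model, every bit of extra sustained clock or SMT throughput directly lowers the IPC uplift needed to explain the same nT score, which is why an nT figure alone says little about single-thread IPC.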
> Could be improved perf from SMT (thread-private frontends); could be improved power gating or other power-oriented microarchitectural changes allowing for a significantly higher clock; could be optimistic pre-release slideware; could be a hallucination among those claiming it exists.

But then is it still the same core on server as on DT?