You do know it's not clock-normalized, right?
I do, but the LunarLake Skymont core and Zen5c run at the same frequency. (In the second article, I mean the same max frequency, but that's the best we can count on under these circumstances.)
You will of course say that Skymont on LunarLake is handicapped by the lack of L3. That is why we have the first article, where desktop Skymont is clocked at 4.6GHz, has a much more generous cache setup and lower-latency memory than Zen5c in Strix Point, and is still losing some FP subtests to the Zen5c core.
Likewise with the Zen4 comparisons: if the FPU pipe mix were the deciding factor and the ability to do 4 FP adds or 4 FP muls per cycle dominated any benchmark, then Skymont would have pulled ahead of Zen4 despite the frequency disadvantage (a 20% frequency deficit against a 100% per-cycle execution advantage). Yet it cannot dominate even the Zen5c mobile core, which suggests the pipe arrangement other cores have is sufficient in practice.
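Back-of-the-envelope, using only the relative numbers above (illustrative, not measured):

```cpp
// ~20% lower clock against 2x the FP adds/muls per cycle should still leave
// Skymont well ahead on paper, if pipe count were really the bottleneck.
constexpr double clock_ratio    = 0.8;  // Skymont clock relative to Zen4
constexpr double per_cycle_gain = 2.0;  // 4 FP adds/cycle vs 2
constexpr double paper_speedup  = clock_ratio * per_cycle_gain;  // = 1.6x
```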
Anyway, the point is that I do not know of a benchmark that would single out the Skymont FPU pipe arrangement and show it to be better in practice than LionCove/Zen5. So I was asking if you know of any.
Also, the comparison to Gracemont is important, as the doubled FP units are the single biggest boost to FP performance, accounting for the 60-70% gain versus the roughly 30% it would have been otherwise. FP would have stayed at 30% without them!
Well, I was not saying doubling does not bring benefits...
If you take the existing configuration (whatever it is) and double the unit counts, performance in all existing applications would increase significantly, without recompiling the code.
That is a questionable statement. It is too broad, so it can easily be proven wrong if we take cache as the resource that gets doubled. Still, that was not what I was asking about.
There are many talks in software circles about technical complexity demanding extra resources and development burden. Hence, every time you introduce a new feature that requires such effort, adoption drops significantly, each and every time. AVX is much less of an advantage over SSE, AVX2 is much less over AVX, and AVX512 is much less over AVX2. And every 2 years at that, too!
The reason for the low adoption rate is Intel's business strategy. If they adopted new instruction sets as baselines across the whole stack (meaning a $5 Celeron and a Xeon would support the same ISA regardless of performance), AVX512 would be enjoying wide adoption by now.
But could you please link the talks you mentioned? They are usually interesting to watch.
You know what stopped them? They stopped being a monopoly and are now close to bankruptcy! If wider vectors are such a magic thing, why don't they keep expanding them?
A vector register wider than a cache line does not make sense; even Intel engineers have commented as much. Marketing is, well, marketing. But from personal experience, having the register width match the cache line width makes life easier.
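To make that concrete, here is a minimal sketch (assuming AVX-512F and a 64-byte cache line; the function name and data layout are made up for illustration) of why one register per cache line is convenient: every loop iteration consumes exactly one line, so alignment and trip-count math fall straight out of the line size.

```cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// One 64-byte AVX-512 register covers exactly one cache line, so each
// iteration loads and processes exactly one line of input.
// Assumes `data` is 64-byte aligned, n is a multiple of 16, and AVX-512F.
uint32_t sum_u32(const uint32_t* data, size_t n) {
    __m512i acc = _mm512_setzero_si512();
    for (size_t i = 0; i < n; i += 16) {             // 16 x 4 bytes = one line
        __m512i line = _mm512_load_si512(data + i);  // aligned full-line load
        acc = _mm512_add_epi32(acc, line);           // wrap-around ignored here
    }
    return (uint32_t)_mm512_reduce_add_epi32(acc);   // horizontal sum of lanes
}
```

Nothing deep here, just that the register and the line are the same granularity, so you never straddle a line or leave half of one unused.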
The other problem is that you can't design a game for a subset of an architecture, unless it's on a major console, because that makes it unportable to other architectures, and at least for AAA games you usually need all the sales you can get. Apart from platform exclusives, you have to design more or less for the lowest common denominator.
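In practice that usually means shipping a baseline build plus an optional runtime-dispatched fast path rather than raising the minimum spec. A rough sketch of the idea (function names are made up; `__builtin_cpu_supports` is a GCC/Clang builtin, MSVC would need `__cpuid` instead):

```cpp
#include <cstddef>

// Baseline path: safe on every machine the game ships to (e.g. plain x86-64/SSE2).
static void mix_scalar(float* dst, const float* src, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] += src[i];
}

// Fast path: in a real project this would sit in a translation unit compiled
// with -mavx2 and use intrinsics; here it's just a stand-in for the sketch.
static void mix_avx2(float* dst, const float* src, size_t n) {
    for (size_t i = 0; i < n; ++i) dst[i] += src[i];
}

using mix_fn = void (*)(float*, const float*, size_t);

// Pick the widest supported path once at startup; everything else falls back
// to the lowest-common-denominator build.
static mix_fn select_mix() {
    return __builtin_cpu_supports("avx2") ? mix_avx2 : mix_scalar;
}
```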
Teach people that object-oriented design is a tool and not a universal answer to every problem, and that data-oriented design also exists. If they have CPU-friendly code in the first place, then regardless of where they port the code it will do well, as the operating principles are the same across architectures.
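A tiny made-up illustration of what that means in practice: the same particle update written object-per-element versus field-per-array, where the second layout streams through memory linearly and auto-vectorizes the same way on any ISA.

```cpp
#include <vector>
#include <cstddef>

// Object-oriented / array-of-structs layout: each update drags the whole
// object through the cache even though only position and velocity are used.
struct Particle { float x, y, z, vx, vy, vz, mass, age; };

void step_aos(std::vector<Particle>& ps, float dt) {
    for (auto& p : ps) { p.x += p.vx * dt; p.y += p.vy * dt; p.z += p.vz * dt; }
}

// Data-oriented / struct-of-arrays layout: each field is a contiguous stream,
// so the loop is cache-friendly regardless of the target architecture.
struct Particles {
    std::vector<float> x, y, z, vx, vy, vz;
};

void step_soa(Particles& ps, float dt) {
    const size_t n = ps.x.size();
    for (size_t i = 0; i < n; ++i) {
        ps.x[i] += ps.vx[i] * dt;
        ps.y[i] += ps.vy[i] * dt;
        ps.z[i] += ps.vz[i] * dt;
    }
}
```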