DrMrLordX
Lifer
Well, the compiler can use it for you, as here: https://godbolt.org/z/fT1WYfKvz. It's a trivial example, but here we are summing 72 floats. There are 16 floats in a 512-bit register, so it compiles down to four 512-bit ops and one 256-bit op for the remaining 8 floats (ymm registers are 256-bit, zmm registers are 512-bit).
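The arithmetic behind that: 72 = 4 × 16 + 8, so four full zmm operations leave exactly 8 floats, which fit one ymm register. A minimal C sketch of that split is below. The godbolt source isn't reproduced in the post, so `add72` is a hypothetical element-wise sum of two 72-float arrays, and `plan_for`/`vec_plan` are made-up helpers that just count registers. Whether a compiler actually emits zmm code for the loop depends on flags (for GCC/Clang something like `-O3 -mavx512f -mprefer-vector-width=512` is an assumption here, not what the linked example used).

```c
#include <stddef.h>

/* 32-bit floats per register: 512/32 = 16 (zmm), 256/32 = 8 (ymm). */
enum { ZMM_FLOATS = 512 / 32, YMM_FLOATS = 256 / 32 };

/* Hypothetical helper type: how n floats split across register widths. */
typedef struct {
    size_t zmm_ops;    /* full 512-bit operations */
    size_t ymm_ops;    /* 256-bit operations for what is left (0 or 1) */
    size_t scalar_ops; /* leftover floats; a compiler might use xmm or masks */
} vec_plan;

/* Greedily covers n floats with the widest register that still fits. */
vec_plan plan_for(size_t n) {
    vec_plan p;
    p.zmm_ops = n / ZMM_FLOATS;
    n %= ZMM_FLOATS;
    p.ymm_ops = n / YMM_FLOATS;
    p.scalar_ops = n % YMM_FLOATS;
    return p;
}

/* Hypothetical stand-in for the godbolt example: with a compile-time trip
   count of 72, an AVX-512 build may turn this into 4 zmm adds + 1 ymm add. */
#define N 72
void add72(float *restrict dst, const float *restrict a, const float *restrict b) {
    for (size_t i = 0; i < N; ++i)
        dst[i] = a[i] + b[i];
}
```

`plan_for(72)` gives 4 zmm ops, 1 ymm op and no leftovers, matching the breakdown above.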
So, given that the compiler can use them for you and/or you can use them explicitly (for example, for tail handling, small transposes or GEMM kernels), it's really hard to give you a more detailed answer than: "For sure".
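To make "tail handling" concrete: with AVX-512 the leftover elements after the last full 16-float chunk can be processed by one masked operation instead of a scalar loop. A minimal sketch, assuming GCC or Clang on x86; `add_arrays` is a hypothetical name. The AVX-512F path is only compiled when the build enables it (e.g. `-mavx512f`); otherwise the plain loop at the end does all the work.

```c
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* dst[i] = a[i] + b[i] for i in [0, n). */
void add_arrays(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
#if defined(__AVX512F__)
    /* Main loop: 16 floats per zmm register. */
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(dst + i, _mm512_add_ps(va, vb));
    }
    /* Tail: 0..15 leftover floats in one masked op. Masked-off lanes are not
       loaded or stored, so nothing past the end of the arrays is touched. */
    size_t rem = n - i;
    if (rem != 0) {
        __mmask16 m = (__mmask16)((1u << rem) - 1u); /* low `rem` bits set */
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(dst + i, m, _mm512_add_ps(va, vb));
    }
    i = n;
#endif
    /* Portable path when AVX-512F isn't enabled at compile time. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

For n = 37 the AVX-512 build does two full zmm adds and one masked add covering the last 5 floats.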
Okay, thanks for presenting at least one example. My own experience with anything beyond AVX2 is pretty limited (since I was on a 3900X for so long), but I do remember y-cruncher splitting its executable into a number of different executables targeting different platforms, based at least in part on which ISA extensions the target hardware supported.
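For comparison, a common in-process alternative to shipping separate executables is runtime dispatch: compile several copies of a kernel for different ISAs and pick one at startup based on what the CPU reports. This is not how y-cruncher does it; it's a minimal sketch assuming GCC or Clang on x86, using their `__builtin_cpu_supports` builtin and `target` attribute. The names `select_add`, `add_avx2` and `add_avx512` are hypothetical.

```c
#include <stddef.h>

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Baseline copy, built for the default target (e.g. SSE2 on x86-64). */
static void add_generic(float *d, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) d[i] = a[i] + b[i];
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
/* Same source; the target attribute lets the compiler use AVX2 / AVX-512
   instructions in these copies without enabling them for the whole file. */
__attribute__((target("avx2")))
static void add_avx2(float *d, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) d[i] = a[i] + b[i];
}
__attribute__((target("avx512f")))
static void add_avx512(float *d, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) d[i] = a[i] + b[i];
}
#endif

/* Returns the widest kernel the running CPU supports and reports its name. */
add_fn select_add(const char **name) {
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) { *name = "avx512f"; return add_avx512; }
    if (__builtin_cpu_supports("avx2"))    { *name = "avx2";    return add_avx2; }
#endif
    *name = "generic";
    return add_generic;
}
```

The AVX-512 copy is only ever called on CPUs that report `avx512f`, so one binary still runs on hardware without AVX-512, which is the concern raised below.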
Regardless, I don't recall a rash of software from (for example) the Rocket Lake or Raphael generation onward using AVX-512 exclusively for all vector lengths. That's probably because so much hardware out there still doesn't support AVX-512 at all, including multiple Intel CPUs released after Rocket Lake.
Eventually it will make sense to target AVX-512 or AVX10 once enough recent generations of consumer hardware support one of those standards.
Your understanding here is still incorrect. Hybrid architectures aren't any easier to handle with AVX10 than with AVX-512.
Could you elaborate on that?
