FMA doesn't even exist on SSE ...
What does emulation have to do with it? Do you mean the semantics of one FMA operation on SSE are not the same as on AVX? Because they are the same on SVE and NEON, so you can use the same multiplier.
It's actually not that simple. The vast majority of SVE's data processing instructions are destructive, unlike NEON's, so you'd have to implement new wiring in hardware to support this. That's assuming no other asymmetry between SVE and NEON, such as different precision behaviour, which would involve even more modifications.
What didn't you understand about the word datapath? I thought its meaning was clear enough: it's the part that does the computation, not the one doing the control. I insist that datapaths can be shared between NEON and SVE2.
AVX-512 does have predication, but its overhead is small compared to a vector-length-agnostic model like SVE. Predication is beneficial if you want to avoid large branch misprediction penalties, but this cost has to be amortized across larger vectors, and introducing predication with a 128-bit vector length makes no sense ...
Didn't AVX-512 introduce predication? Even if it's in the encoding, it has an impact on control logic and on datapaths. So is AVX-512 unable to reuse any AVX(2) block?
There's more overhead involved in switching between variable-length vectors than in moving between a fixed number of lanes, and making the hardware do it for you instead of changing the code is bad news ...
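To make the predication point concrete, here is a minimal scalar sketch in portable C of what a merge-predicated vector add does. It is only a model, not real AVX-512 or SVE code: `LANES`, `masked_add_i32` and `tail_mask` are hypothetical names chosen for illustration, and the 16-lane width is an assumption (16 x 32-bit lanes, as in a 512-bit vector).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed width for this model: 16 lanes of 32 bits each. */
enum { LANES = 16 };

/* Hypothetical scalar model of a merge-predicated vector add.
 * Lane i is written only if bit i of `mask` is set; inactive lanes
 * keep whatever value the destination already held. */
void masked_add_i32(int32_t dst[LANES], const int32_t a[LANES],
                    const int32_t b[LANES], uint16_t mask)
{
    for (size_t i = 0; i < LANES; ++i) {
        if ((mask >> i) & 1u) {
            dst[i] = a[i] + b[i];
        }
    }
}

/* Build a mask that enables only the first n lanes, which is how a
 * loop tail can be handled without a separate scalar remainder loop. */
uint16_t tail_mask(size_t n)
{
    assert(n <= LANES);
    return (uint16_t)((1u << n) - 1u);
}
```

In real hardware the per-lane `if` is not a branch: the mask is just another operand, which is why predication sidesteps misprediction penalties. The fewer lanes a vector has, the less work each mask covers, which is the amortization argument made above.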
For compatibility? Is this not obvious?
What's the point of mixing SSE and AVX? Is AVX lacking instructions that SSE has?
It wouldn't matter, because you'd have to support SVE as well: SVE2 is actually a superset of SVE and builds upon its instruction encoding.
ARM never said that SVE2 replaces NEON. Supporting both at the same time is not that costly as long as you share datapaths.
It's a massive issue, because ARM Holdings had to make a trade-off between having predication and having constructive forms, where the destination register is distinct from its source operands. SVE's instruction set consists of destructive predicated instructions and constructive unpredicated instructions, while on AVX-512 nearly all of its instructions are both predicated and constructive. Instruction encoding has a direct effect on the features of the instruction set.
Encoding is not an issue. I wonder why you think it matters that much, especially when x86 is known to just stink in that department.
Life is much more complicated for a compiler with SVE than with AVX-512. Having a destructive syntax means that the compiler needs to copy the data in some of the registers, leading to increased register pressure. Not having predication makes auto-vectorization more difficult. SVE's instruction encoding could easily run out of space for new features soon.
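The copy that a destructive form forces on the compiler can be illustrated with a toy register machine in portable C. This is a sketch under stated assumptions, not a model of any real ISA: the `Machine` struct, the instruction helpers `add3`, `add2` and `mov`, and the register assignment (x in r0, y in r1, result in r2) are all made up for illustration.

```c
#include <assert.h>

enum { NREGS = 8 };  /* arbitrary size for this toy register file */

typedef struct {
    int reg[NREGS];
    int issued;      /* number of instructions issued so far */
} Machine;

/* Constructive (three-operand) form: rd = ra + rb, sources preserved. */
void add3(Machine *m, int rd, int ra, int rb)
{
    assert(rd >= 0 && rd < NREGS && ra >= 0 && ra < NREGS && rb >= 0 && rb < NREGS);
    m->reg[rd] = m->reg[ra] + m->reg[rb];
    m->issued++;
}

/* Destructive (two-operand) form: rdn = rdn + rm, first source overwritten. */
void add2(Machine *m, int rdn, int rm)
{
    assert(rdn >= 0 && rdn < NREGS && rm >= 0 && rm < NREGS);
    m->reg[rdn] += m->reg[rm];
    m->issued++;
}

/* Plain register copy: rd = rs. */
void mov(Machine *m, int rd, int rs)
{
    assert(rd >= 0 && rd < NREGS && rs >= 0 && rs < NREGS);
    m->reg[rd] = m->reg[rs];
    m->issued++;
}

/* Compute x + y into r2 while x (r0) must stay live afterwards. */
void sum_keep_x_constructive(Machine *m)
{
    add3(m, 2, 0, 1);   /* one instruction, r0 untouched */
}

void sum_keep_x_destructive(Machine *m)
{
    mov(m, 2, 0);       /* extra copy so r0 survives */
    add2(m, 2, 1);      /* r2 = r2 + r1 */
}
```

Both routines leave the same result in r2 and keep r0 intact, but the destructive version issues two instructions instead of one. That extra copy is what the compiler has to schedule and allocate around whenever a source value is still needed.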