Why do you think single-cycle throughput is mandatory? Or what if, for instance, your 256-bit SVE units can be split into two 128-bit AdvSIMD units?
Of course, they would not just add an SVE decoder using the existing FP units. If you want single-cycle throughput you need to match the vector length. I wonder if they will go for 256-bit or 512-bit SVE?
My point is just that SVE doesn't imply more FP performance. They can improve FP performance without going to SVE (and they didn't do it), or they can add SVE without increasing peak FP performance.
The real question is: do they need more FP performance? In their current market I'd say no. After all, they are already at 384 bits per cycle vs. 512 bits for Intel's AVX2 chips. OTOH, if they do want to move into laptops, I think they will have to increase FP perf.
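
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The unit counts (three 128-bit units, two 256-bit units) are my assumption, picked only because they reproduce the 384-bit and 512-bit totals; the function names and the "SVE on the same units" case are purely hypothetical.

```python
def peak_bits_per_cycle(units: int, datapath_bits: int) -> int:
    """Peak SIMD bits processed per cycle: unit count times datapath width."""
    return units * datapath_bits


def cycles_per_vector_op(vector_bits: int, datapath_bits: int) -> int:
    """Cycles one unit needs to push a full vector through (ceiling division)."""
    return -(-vector_bits // datapath_bits)


# Assumed configurations, chosen to match the totals quoted in the post.
advsimd_peak = peak_bits_per_cycle(units=3, datapath_bits=128)
avx2_peak = peak_bits_per_cycle(units=2, datapath_bits=256)

# Hypothetical: 256-bit SVE executed on the same three 128-bit units.
# Each vector op takes two passes, so the peak does not move.
sve_same_units_peak = peak_bits_per_cycle(units=3, datapath_bits=128)
sve_cycles_per_op = cycles_per_vector_op(vector_bits=256, datapath_bits=128)

print(advsimd_peak, avx2_peak, sve_same_units_peak, sve_cycles_per_op)
```

The point of the last case: the ISA vector length and the datapath width are separate knobs, so adding SVE changes the peak only if the hardware underneath gets wider or more numerous.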