Originally Posted by jones377
It's still amazing that AMD managed to get TWO chips with FMA out before Intel when Intel was the one controlling the instruction specification.
There's nothing amazing about that. AMD's implementation is pretty horrible. They have two 128-bit vector units per module, while Intel has two 256-bit vector units per core. AMD compensated by making each unit capable of executing a multiplication, an addition, or a fused multiplication and addition per cycle. But they compromised on latency.
Intel didn't add FMA earlier simply because two 256-bit vector units (one for multiplication and one for addition) are already plenty to exhaust the available load/store and cache bandwidth. With Haswell, Intel will double that bandwidth, so dual 256-bit FMA becomes useful. What's more, they're not worsening the latencies.
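To put rough numbers on that, here's a back-of-the-envelope sketch of peak single-precision throughput per cycle, using only the unit counts and widths mentioned above. The function and variable names are mine, and it assumes 32-bit floats, one operation issued per unit per cycle, and an FMA counted as two floating-point operations. Keep in mind the AMD figure is per module while the Intel figures are per core.

```python
# Peak single-precision FLOPs per cycle, derived from unit count and width.
# Illustrative only: names are made up, and real throughput also depends on
# load/store bandwidth and latency, which this ignores.

FLOAT_BITS = 32  # assumption: single precision


def peak_flops_per_cycle(units, width_bits, flops_per_lane):
    """units * lanes per unit * floating-point ops per lane per cycle."""
    lanes = width_bits // FLOAT_BITS
    return units * lanes * flops_per_lane


# AMD module: two 128-bit units, each able to issue an FMA (2 flops per lane).
amd_module = peak_flops_per_cycle(units=2, width_bits=128, flops_per_lane=2)

# Intel before Haswell: one 256-bit multiplier plus one 256-bit adder
# (1 flop per lane each).
intel_core = peak_flops_per_cycle(units=2, width_bits=256, flops_per_lane=1)

# Haswell: two 256-bit FMA units.
haswell_core = peak_flops_per_cycle(units=2, width_bits=256, flops_per_lane=2)

print("AMD module:  ", amd_module)
print("Intel core:  ", intel_core)
print("Haswell core:", haswell_core)
```

Under those assumptions an AMD module and a pre-Haswell Intel core come out even on paper, and Haswell doubles the per-core figure, which is why the 128-bit units only keep pace until then.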
AMD will have to double the width of its vector units to keep up, but none of their roadmaps mention it. Nor have they announced AVX2 support yet. They're betting the farm on HSA, but HSA is in deep trouble.