Why did Intel decide to make AVX2 use FMA3 while AMD went with FMA4?
Is FMA3 better for general purpose while FMA4 is better for games?
I think Intel will support both eventually as well.
"Why did Intel decide to make AVX2 use FMA3 while AMD went with FMA4?"
Intel's first AVX specification actually used the FMA4 instruction format, and AMD's SSE5 specification used FMA3. Intel then decided FMA3 was the better idea, while practically simultaneously AMD dropped SSE5 and implemented the original AVX specification, so the two effectively swapped formats.
"Is FMA3 better for general purpose while FMA4 is better for games?"
No. They're both specifications for the same fused multiply-add operation and they have exactly the same uses. The difference is negligible to the end user. FMA3 is slightly more efficient to implement in hardware, but because its destination must also be one of its three source registers, there's a small chance that every now and then an extra register-copy instruction is required to work around that limitation (and on modern processors that copy typically takes no execution time).
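To make the "extra instruction" point concrete, here is a rough toy model in Python. It is only a sketch: the register names and the helper functions (`fma4`, `fma3`, `mov`, `fma3_preserving`) are made up for illustration, and plain `a * b + c` in Python is not actually fused. The point is just to show that the three-operand form overwrites one input, so keeping all three inputs alive costs one extra copy.

```python
# Toy model of the two operand formats (illustrative assumption, not real ISA code).
# Registers are a dict; every helper returns how many instructions it "issued".

def fma4(regs, dst, a, b, c):
    """Four-operand form: dst = a * b + c, no source register is overwritten."""
    regs[dst] = regs[a] * regs[b] + regs[c]
    return 1

def fma3(regs, dst_src, b, c):
    """Three-operand form: dst_src = dst_src * b + c, one source is overwritten."""
    regs[dst_src] = regs[dst_src] * regs[b] + regs[c]
    return 1

def mov(regs, dst, src):
    """Plain register copy."""
    regs[dst] = regs[src]
    return 1

def fma3_preserving(regs, dst, a, b, c):
    """Get FMA4 behaviour out of FMA3 when all three inputs must survive."""
    count = mov(regs, dst, a)       # the extra copy the reply refers to
    count += fma3(regs, dst, b, c)  # dst = dst * b + c
    return count

regs_a = {"r0": 0, "r1": 2, "r2": 3, "r3": 4}
regs_b = dict(regs_a)
print(fma4(regs_a, "r0", "r1", "r2", "r3"), regs_a["r0"])             # 1 instruction, r0 = 10
print(fma3_preserving(regs_b, "r0", "r1", "r2", "r3"), regs_b["r0"])  # 2 instructions, r0 = 10
```

When one of the inputs is dead after the operation anyway (the usual case in an accumulation loop), `fma3` alone is enough and the copy never appears. The real FMA3 instructions also come in 132/213/231 variants, which let the compiler choose which of the three operands gets overwritten.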
"It's still amazing that AMD managed to get TWO chips with FMA out before Intel when Intel was the one controlling the instruction specification."
There's nothing amazing about that. AMD's implementation is pretty horrible. They have two 128-bit vector units per module, while Intel has two 256-bit vector units per core. AMD compensated by making each unit capable of executing a multiplication, an addition, or a fused multiplication and addition per cycle, but they compromised on latency.
"FMA3 /probably/ has a shorter instruction than FMA4, since FMA4 requires you to specify 4 registers while FMA3 requires you to specify 3. IF this is true, FMA3 MIGHT be better in that it's smaller and takes less space (e.g., more instructions fit in cache). I haven't read the specs, so don't quote me."
The length of the macro-instruction encoding is not so critical. It's the length of the micro-instruction encoding that's the issue. With FMA4, the uop cache would require extra bits for the fourth operand, while no other instruction would use them.
"My guess is that the other main reason for FMA3 vs. FMA4 is load capacitance: 256-bit registers are wide and carry a lot of capacitance, and with 4 operands instead of 3 that is 33% extra capacitance. The extra drive current might be better used to increase clock speed, or it would increase power consumption too much (and everything is about power these days)."
Regardless of the encoding format, three input operands have to be read and one result is written. Besides, 256-bit registers aren't wide at all. GPUs use registers of up to 4096 bits, built on less advanced semiconductor process technology. That said, AVX can be extended to 1024 bits, and possibly beyond.