AnandTech Forums > Hardware and Technology > CPUs and Overclocking

11-09-2012, 02:11 PM #1
Anarchist420
Diamond Member
Join Date: Feb 2010
Posts: 8,564
When is FMA3 better than FMA4?

Why did Intel decide to make AVX2 use FMA3 while AMD went with FMA4?

Is FMA3 better for general purpose while FMA4 is better for games?

11-09-2012, 02:27 PM #2
Edrick
Golden Member
Join Date: Feb 2010
Location: Boston MA
Posts: 1,525

Neither one is "better".

Each has its own way of working. The difference comes into play when coding the application.

Both should offer about the same performance gain.

Also, AMD supports both now with Piledriver. I think Intel will support both eventually as well.

11-09-2012, 03:39 PM #3
Exophase
Platinum Member
Join Date: Apr 2012
Posts: 2,263

FMA4 allows for completely independent source and destination operands, while FMA3 requires that one of the source operands is overwritten with the destination. So FMA4 is more flexible, but since FMA3 offers all permutations for selecting which register you want overwritten, it's only rarely that FMA4 is actually more useful.
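
A toy Python model of the operand forms described above. Plain floats stand in for YMM registers, and `fma3`, `fma4` and `compute_abc` are hypothetical helpers, not a real API. The 132/213/231 suffixes follow Intel's naming, where the first operand is always the destination. The sketch computes a*b + c (2*3 + 4) and shows that FMA3 can overwrite whichever input is no longer needed, while FMA4 can write a fourth register and keep all three inputs.

```python
# Toy model: scalar floats stand in for vector registers; names are illustrative.

def fma4(regs, dst, a, b, c):
    # FMA4, e.g. VFMADDPS dst, a, b, c: dst = a*b + c in a separate register
    regs[dst] = regs[a] * regs[b] + regs[c]

def fma3(regs, form, op1, op2, op3):
    # FMA3: op1 is always the destination, so one source is overwritten.
    x1, x2, x3 = regs[op1], regs[op2], regs[op3]
    if form == 132:    # VFMADD132PS op1, op2, op3: op1 = op1*op3 + op2
        regs[op1] = x1 * x3 + x2
    elif form == 213:  # VFMADD213PS op1, op2, op3: op1 = op2*op1 + op3
        regs[op1] = x2 * x1 + x3
    elif form == 231:  # VFMADD231PS op1, op2, op3: op1 = op2*op3 + op1
        regs[op1] = x2 * x3 + x1
    else:
        raise ValueError(f"unknown FMA3 form: {form}")

def compute_abc(victim):
    """Compute a*b + c with a, b, c in ymm0, ymm1, ymm2, writing to `victim`."""
    regs = {"ymm0": 2.0, "ymm1": 3.0, "ymm2": 4.0}
    if victim == "ymm0":
        fma3(regs, 132, "ymm0", "ymm2", "ymm1")  # a = a*b + c
    elif victim == "ymm1":
        fma3(regs, 213, "ymm1", "ymm0", "ymm2")  # b = a*b + c
    elif victim == "ymm2":
        fma3(regs, 231, "ymm2", "ymm0", "ymm1")  # c = a*b + c
    else:
        fma4(regs, victim, "ymm0", "ymm1", "ymm2")  # all three inputs survive
    return regs

for v in ("ymm0", "ymm1", "ymm2", "ymm3"):
    print(v, compute_abc(v))
```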

My guess is that Intel wanted to go with FMA3 because it was more efficient to implement in hardware. If their uop format only has three operands, then a single four-operand instruction presents a problem.

Quote:
Originally Posted by Edrick
I think Intel will support both eventually as well.
I strongly doubt that.

11-09-2012, 04:13 PM #4
jones377
Senior Member
Join Date: May 2004
Posts: 431

AMD's support for FMA4 comes from the fact that it was Intel's own specification for FMA before they changed their mind and the specs to FMA3. It's still amazing that AMD managed to get TWO chips with FMA out before Intel when Intel was the one controlling the instruction specification.

11-09-2012, 04:32 PM #5
BenchPress
Senior Member
Join Date: Nov 2011
Posts: 392

Quote:
Originally Posted by Anarchist420
Why did intel decide to make AVX2 use FMA3 while AMD went with FMA4?
Intel's first AVX specification actually used the FMA4 instruction format. AMD's SSE5 specification used FMA3. Intel thought FMA3 was a good idea, while practically simultaneously AMD decided to drop SSE5 and implement the original AVX specification...
Quote:
Is FMA3 better for general purpose while FMA4 is better for games?
No. They're both specifications for the FMA instruction and they have the exact same uses. The difference is negligible to the end user. FMA3 is slightly more efficient to implement in hardware, but there's also a tiny chance that every now and then an extra instruction is required to work around the limitation it imposes (but on modern processors that instruction takes no execution time).
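
A minimal sketch of the workaround mentioned above, assuming a hypothetical code generator that knows which inputs are still live after the FMA. `fma3_sequence` and `fma4_sequence` are illustrative helpers that only return Intel-syntax mnemonics as strings; nothing is assembled or executed. The extra `vmovaps` copy is needed only when all three inputs must survive; on cores that eliminate register moves at rename, that copy need not occupy an execution unit.

```python
def fma3_sequence(a, b, c, live_after, scratch):
    """Return (result_reg, instructions) computing a*b + c with FMA3.
    Overwrites an input that is dead afterwards; if all three are still
    live, copies c into `scratch` first (the one extra instruction)."""
    if c not in live_after:
        return c, [f"vfmadd231ps {c}, {a}, {b}"]   # c = a*b + c
    if a not in live_after:
        return a, [f"vfmadd132ps {a}, {c}, {b}"]   # a = a*b + c
    if b not in live_after:
        return b, [f"vfmadd213ps {b}, {a}, {c}"]   # b = a*b + c
    return scratch, [f"vmovaps {scratch}, {c}",
                     f"vfmadd231ps {scratch}, {a}, {b}"]

def fma4_sequence(a, b, c, dst):
    """FMA4 names a separate destination, so no copy is ever needed."""
    return dst, [f"vfmaddps {dst}, {a}, {b}, {c}"]

print(fma3_sequence("ymm0", "ymm1", "ymm2", {"ymm0", "ymm1"}, "ymm3"))
print(fma3_sequence("ymm0", "ymm1", "ymm2", {"ymm0", "ymm1", "ymm2"}, "ymm3"))
print(fma4_sequence("ymm0", "ymm1", "ymm2", "ymm3"))
```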

For AMD the support of FMA4 is dead weight more than anything else, now that they also support FMA3. They might get rid of FMA4 at some point. It shouldn't be of any concern to consumers.

11-09-2012, 10:50 PM #6
BenchPress
Senior Member
Join Date: Nov 2011
Posts: 392

Quote:
Originally Posted by jones377
It's still amazing that AMD managed to get TWO chips with FMA out before Intel when Intel was the one controlling the instruction specification.
There's nothing amazing about that. AMD's implementation is pretty horrible. They have two 128-bit vector units per module, while Intel has two 256-bit vector units per core. AMD compensated by making each unit capable of executing a multiplication, an addition, or a fused multiplication and addition per cycle. But they compromised on latency.

Intel hasn't added FMA until now simply because having two 256-bit vector units (one for multiplication and one for addition) is plenty to exhaust the available load/store and cache bandwidth. With Haswell, Intel will double that bandwidth, so dual 256-bit FMA becomes useful. What's more, they're not worsening the latencies.
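
A back-of-the-envelope check of the throughput comparison above. The unit counts come from the post; the only other assumptions are 32-bit single-precision lanes and counting an FMA as two floating-point operations. The AMD figure is per module (two cores) while the Intel figures are per core, and these are peaks that assume operands are already in registers, so the load/store bandwidth limit discussed above is not modeled.

```python
def peak_sp_flops(units: int, width_bits: int, flops_per_lane: int) -> int:
    """Peak single-precision FLOPs per cycle for `units` identical vector units."""
    lanes = width_bits // 32  # 32-bit floats per vector
    return units * lanes * flops_per_lane

# AMD module: two 128-bit FMA-capable units, shared by the module's two cores
amd_module = peak_sp_flops(units=2, width_bits=128, flops_per_lane=2)
# Intel pre-Haswell core: one 256-bit multiply unit plus one 256-bit add unit
# (this peak needs a balanced mix of multiplies and adds)
intel_mul_add = peak_sp_flops(1, 256, 1) + peak_sp_flops(1, 256, 1)
# Intel Haswell core as described in the post: two 256-bit FMA units
intel_dual_fma = peak_sp_flops(units=2, width_bits=256, flops_per_lane=2)

print(amd_module, intel_mul_add, intel_dual_fma)  # 16 16 32
```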

AMD will have to double the width of its vector units to keep up. But none of their roadmaps make any mention of it. Nor have they announced AVX2 support yet. They're betting the farm on HSA, but it's in deep trouble.

11-09-2012, 10:58 PM #7
lambchops511
Senior Member
Join Date: Apr 2005
Posts: 659

FMA3 /probably/ has a shorter encoding than FMA4, since FMA4 requires you to specify 4 registers while FMA3 requires only 3. IF this is true, FMA3 MIGHT be better in that it's smaller and takes less space (e.g., more instructions fit in the cache). I haven't read the specs, so don't quote me.
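
A byte count for register-only forms, based on my reading of the VEX encoding in the Intel and AMD manuals (treat the exact numbers as an assumption to verify there). Both formats need the three-byte VEX prefix because their opcodes live outside the 0F map; FMA4 then spends one extra immediate byte whose upper four bits name the fourth register. Memory operands, which add SIB and displacement bytes to both, are ignored, and the 64-byte line is just for scale.

```python
# Byte counts for register-to-register forms only (no SIB byte or displacement).
VEX3_PREFIX = 3  # C4 + 2 payload bytes; the 2-byte C5 form only reaches the 0F map
OPCODE = 1
MODRM = 1        # ModRM names two registers; VEX.vvvv names a third
IS4_IMM8 = 1     # FMA4 only: bits 7:4 of a trailing immediate name the 4th register

def fma3_length() -> int:
    # e.g. vfmadd231ps ymm1, ymm2, ymm3
    return VEX3_PREFIX + OPCODE + MODRM

def fma4_length() -> int:
    # e.g. vfmaddps ymm1, ymm2, ymm3, ymm4: one more register, one more byte
    return VEX3_PREFIX + OPCODE + MODRM + IS4_IMM8

LINE = 64  # bytes per cache line
print(fma3_length(), fma4_length())                  # 5 6
print(LINE // fma3_length(), LINE // fma4_length())  # 12 10
```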

Performance-wise, there probably isn't that big of a difference because of register aliasing. Actually, my guess is that Intel resorted to FMA3 to make branch prediction easier (related to register aliasing and scoreboarding).
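
Reading "register aliasing" as register renaming, here is a textbook-style sketch of why a destructive FMA3 write need not cost much. `RenameTable` is a hypothetical toy, not a model of any particular Intel or AMD core, and the branch-prediction guess is not modeled. Each write to an architectural register gets a fresh physical register, so the old value of the overwritten source survives for any copy still pointing at it, and an eliminated move is only a mapping update.

```python
class RenameTable:
    """Toy register renaming: architectural register -> physical register.
    Freeing physical registers at retirement is omitted."""

    def __init__(self, arch_regs, num_phys):
        self.free = list(range(num_phys))
        self.map = {r: self.free.pop(0) for r in arch_regs}

    def rename(self, dst, srcs):
        # Read source mappings before remapping dst, so a destructive FMA3
        # still reads the old value of its destination register.
        phys_srcs = [self.map[s] for s in srcs]
        self.map[dst] = self.free.pop(0)
        return self.map[dst], phys_srcs

    def eliminate_move(self, dst, src):
        # Move elimination: the copy just aliases dst to src's physical register.
        self.map[dst] = self.map[src]

rt = RenameTable(["ymm0", "ymm1", "ymm2", "ymm3"], num_phys=16)
rt.eliminate_move("ymm3", "ymm2")                        # vmovaps ymm3, ymm2
new, srcs = rt.rename("ymm2", ["ymm0", "ymm1", "ymm2"])  # vfmadd231ps ymm2, ymm0, ymm1
print(rt.map, new, srcs)
```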

My other guess at the main reason for FMA3 over FMA4 is load capacitance: 256-bit registers are wide and carry a lot of capacitance, and going from 3 to 4 operands means 33% extra capacitance. My guess is that this extra drive current might be better spent on increasing clock speed, or that it would increase power consumption too much (and everything is about power these days).

11-09-2012, 11:16 PM #8
BenchPress
Senior Member
Join Date: Nov 2011
Posts: 392

Quote:
Originally Posted by lambchops511
FMA3 /probably/ has a shorter encoding than FMA4, since FMA4 requires you to specify 4 registers while FMA3 requires only 3. IF this is true, FMA3 MIGHT be better in that it's smaller and takes less space (e.g., more instructions fit in the cache). I haven't read the specs, so don't quote me.
The length of the macro-instruction encoding is not so critical. It's the length of the micro-instruction encoding that's the issue. With FMA4, the uop cache would require extra bits for the fourth operand, while no other instruction would use it.
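
A rough estimate of the uop-cache cost described above, assuming a hypothetical flat uop format where each entry carries one field per register operand (real uop formats are not publicly documented). The 16 YMM registers of 64-bit mode need 4 bits per field; the 1536-entry capacity is an assumption, roughly the size Intel gives for Sandy Bridge's decoded uop cache.

```python
def specifier_bits(num_regs: int) -> int:
    # Bits needed to name one of `num_regs` registers
    return (num_regs - 1).bit_length()

def extra_uop_cache_bits(num_regs: int, uop_entries: int) -> int:
    # Hypothetical: every uop cache entry grows by one register-specifier field
    return specifier_bits(num_regs) * uop_entries

YMM_REGS = 16       # architectural YMM registers in 64-bit mode
UOP_ENTRIES = 1536  # assumed capacity, roughly Sandy Bridge's decoded uop cache

bits = extra_uop_cache_bits(YMM_REGS, UOP_ENTRIES)
print(specifier_bits(YMM_REGS), bits, bits // 8)  # 4 6144 768
```
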
Quote:
My other guess at the main reason for FMA3 over FMA4 is load capacitance: 256-bit registers are wide and carry a lot of capacitance, and going from 3 to 4 operands means 33% extra capacitance. My guess is that this extra drive current might be better spent on increasing clock speed, or that it would increase power consumption too much (and everything is about power these days).
Regardless of the encoding format, three input operands have to be read and one result is written. Besides, 256-bit registers aren't wide at all. GPUs use registers of up to 4096 bits, built on less advanced semiconductor process technology. That said, AVX can be extended to 1024-bit, and possibly beyond...
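
A toy tally separating the two quantities argued about in this exchange: how many register fields the encoding carries (the "4 vs. 3" behind the 33% figure) versus how many register reads and writes the datapath performs. The `FORMATS` table and helper functions are hypothetical illustrations and ignore memory operands.

```python
# Each encoded register field and the role(s) it plays.
FORMATS = {
    "FMA3": [("dst", "src"), ("src",), ("src",)],  # op1 is read, then overwritten
    "FMA4": [("dst",), ("src",), ("src",), ("src",)],
}

def encoded_fields(fmt: str) -> int:
    return len(FORMATS[fmt])

def register_reads(fmt: str) -> int:
    return sum("src" in roles for roles in FORMATS[fmt])

def register_writes(fmt: str) -> int:
    return sum("dst" in roles for roles in FORMATS[fmt])

WIDTH = 256  # bits per YMM register
for fmt in FORMATS:
    print(fmt, encoded_fields(fmt),
          register_reads(fmt) * WIDTH, register_writes(fmt) * WIDTH)
# FMA3 3 768 256
# FMA4 4 768 256
```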

All times are GMT -5.