
Cascade Lake beats Rome in the race for 2019 TACC Supercomputer

Page 3

IEC

Elite Member
Super Moderator
Jun 10, 2004
13,927
3,599
136
Should be interesting to see if those hardware mitigations produce better performance than a fully-patched Skylake-X server running the same clocks.
They should produce better numbers than the software+firmware mitigations in place now. However, some of those result in rather extreme (>30%) double-digit performance drops in real-world use cases, so it'd have to be a huge performance uplift to convince affected customers.
 

Abwx

Diamond Member
Apr 2, 2011
9,117
902
126
I was wrong to say rewriting, but it will need compiler intrinsics to account for the way Zen does AVX, since it cannot do 2 FMA/cycle in a straightforward manner.
In case you didn't notice, FMA is totally unrelated to AVX; the proof is that SB has AVX without FMA. The latter is a specific instruction set.
 

Nothingness

Platinum Member
Jul 3, 2013
2,153
398
126
In case you didn't notice, FMA is totally unrelated to AVX; the proof is that SB has AVX without FMA. The latter is a specific instruction set.
Except that if you want 256-bit FMA you need AVX. And as far as I know no CPU has FMA without AVX.
 

Abwx

Diamond Member
Apr 2, 2011
9,117
902
126
Except that if you want 256-bit FMA you need AVX.
Not at all; all you need is an FMAC-capable execution unit. The instructions themselves are not part of the AVX subset.

And as far as I know no CPU has FMA without AVX.
That doesn't invalidate what I'm pointing out. The fact is that Intel implemented FMA along with the AVX2 subset (which is mainly for integer ops), hence the confusion in some minds (not speaking of yours), but there's a general belief that FMA is part of AVX/AVX2...
 

Nothingness

Platinum Member
Jul 3, 2013
2,153
398
126
Not at all; all you need is an FMAC-capable execution unit. The instructions themselves are not part of the AVX subset.
And you get your 256-bit registers from where exactly? Ah yes, that comes with AVX.

EDIT: I indeed know that FMA is kind of separate from AVX :)
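
For what it's worth, the "kind of separate" point shows up in how the features are reported: AVX, FMA3, AVX2 and FMA4 each have their own CPUID bit. A minimal Python sketch, assuming the bit positions documented in the Intel and AMD manuals; the `decode` helper and the sample register values are hypothetical, built by hand from those bits rather than dumped from real chips:

```python
# Documented CPUID feature bits; the helper and sample values below are hypothetical.
AVX_BIT = 28    # CPUID leaf 1, ECX bit 28
FMA3_BIT = 12   # CPUID leaf 1, ECX bit 12
AVX2_BIT = 5    # CPUID leaf 7 (subleaf 0), EBX bit 5
FMA4_BIT = 16   # CPUID leaf 0x80000001, ECX bit 16 (AMD)


def decode(leaf1_ecx, leaf7_ebx=0, ext1_ecx=0):
    """Return which of the four features the given register values advertise."""
    def bit(reg, n):
        return bool((reg >> n) & 1)
    return {
        "avx": bit(leaf1_ecx, AVX_BIT),
        "fma3": bit(leaf1_ecx, FMA3_BIT),
        "avx2": bit(leaf7_ebx, AVX2_BIT),
        "fma4": bit(ext1_ecx, FMA4_BIT),
    }


# Hand-built masks mirroring the cases named in the thread (not real CPUID dumps).
avx_only = decode(1 << AVX_BIT)                                       # AVX without FMA
avx_fma4 = decode(1 << AVX_BIT, ext1_ecx=1 << FMA4_BIT)               # AVX plus FMA4
avx2_fma3 = decode((1 << AVX_BIT) | (1 << FMA3_BIT), 1 << AVX2_BIT)   # AVX2 plus FMA3

for name, flags in [("avx_only", avx_only), ("avx_fma4", avx_fma4), ("avx2_fma3", avx2_fma3)]:
    print(name, flags)
```

Software that wants 256-bit FMA has to see both the AVX bit and an FMA bit (plus OS support for the wider register state), which is a separate question from which execution units the core has.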
 

Abwx

Diamond Member
Apr 2, 2011
9,117
902
126
And you get your 256-bit registers from where exactly? Ah yes, that comes with AVX.

EDIT: I indeed know that FMA is kind of separate from AVX :)
256b, or rather 4x 64b / 8x 32b, registers are for AVX2....

Bulldozer/Piledriver has FMA and no 256-bit registers, although it supports AVX128. They could have supported the former without implementing the latter, and the CPU would still execute FMA ops the same way...
 

Nothingness

Platinum Member
Jul 3, 2013
2,153
398
126
I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?
 

lightmanek

Senior member
Feb 19, 2017
241
436
106
I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?
Yes, he is.
AMD originally planned Bulldozer to support SSE5 and FMA without AVX. Obviously that never happened, but AMD even published some SSE5 documentation to get software developers' opinions on their implementation.
 

Abwx

Diamond Member
Apr 2, 2011
9,117
902
126
I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?
That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.

Intel implemented AVX before FMA, but they could have made SB FMA-compatible without implementing AVX. It's just that they were wavering over whether to implement FMA4 or FMA3; as AMD chose the former, they decided to implement FMA3 instead.

FMA4 needs one more register to keep the operation non-destructive, since the result doesn't overwrite one of the registers holding the three variables that are combined in a multiply-accumulate op.
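
The destructive versus non-destructive difference can be sketched with a toy register file. This is only a Python model, assuming the `vfmadd231`-style operand order for FMA3; the function names and the dict-based register file are made up for illustration, and integers are used so rounding does not come into it:

```python
def fma3_231(regs, dst, a, b):
    """FMA3, 231 form: dst = a * b + dst. The accumulator is overwritten."""
    regs[dst] = regs[a] * regs[b] + regs[dst]


def fma4(regs, dst, a, b, c):
    """FMA4: dst = a * b + c. dst can be a fourth register, so a, b and c survive."""
    regs[dst] = regs[a] * regs[b] + regs[c]


start = {"r0": 2, "r1": 3, "r2": 10, "r3": 0}

three = dict(start)
fma3_231(three, "r2", "r0", "r1")     # r2 = 2*3 + 10, the old r2 is gone

four = dict(start)
fma4(four, "r3", "r0", "r1", "r2")    # r3 = 2*3 + 10, r2 still holds 10

print(three)
print(four)
```

With the three-operand form, code that still needs the old accumulator has to copy it first; the four-operand form avoids the copy at the cost of encoding one more register.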
 

JoeRambo

Senior member
Jun 13, 2013
913
649
136
That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.
Actually, just a 128-bit wide register is needed, as AVX has instructions for 128-bit wide vectors (and 256, and now 512).

This whole discussion about what is needed and what is not is completely esoteric for x86 CPUs, as AMD was at the time irrelevant in both market penetration and performance. What matters in this discussion is that all FMA operations are encoded as VEX or EVEX instructions, which were designed for the AVX project.
 

Nothingness

Platinum Member
Jul 3, 2013
2,153
398
126
That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.
Ok I know you are not utterly stupid so I guess I wasn't clear: how do I FMA 3 or 4 256-bit wide vectors when I don't have a 256-bit wide reg file (which comes with AVX)?

For 128-bit wide vectors SSE already provided 128-bit regs so it's obviously not an issue, you don't need AVX.
 

Nothingness

Platinum Member
Jul 3, 2013
2,153
398
126
Yes, he is.
AMD originally planned Bulldozer to support SSE5 and FMA without AVX. Obviously that never happened, but AMD even published some SSE5 documentation to get software developers' opinions on their implementation.
I'm talking about 256-bit wide FMA. Did SSE5 add 256-bit regs and ops?
 

gdansk

Senior member
Feb 8, 2011
525
212
116
Given the importance of vector units in scientific computing, I'd reckon it's mainly AVX-512 that drove the decision. I'm still shocked Intel sees no reason to develop a variable-length vector extension. From a compiler writer's perspective it seems simpler. There would be no need to modify auto-vectorization every few years.
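
A rough illustration of the variable-length idea, in the spirit of vector-length-agnostic ISAs such as Arm's SVE: the loop takes the lane count as a run-time value and masks off the tail, so the same code gives the same answer at any width. This is a plain Python model; `axpy_vla` and `hw_lanes` are hypothetical names, not any real API:

```python
def axpy_vla(a, x, y, hw_lanes):
    """Compute a*x[i] + y[i], stepping by whatever lane count the 'hardware' reports."""
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        # Predicate: only lanes that fall inside the array are active on the last step.
        active = min(hw_lanes, n - i)
        for lane in range(active):
            out[i + lane] = a * x[i + lane] + y[i + lane]
        i += hw_lanes
    return out


x = list(range(7))
y = [1] * 7

# The same loop, unchanged, on "machines" with 2, 4, 8 and 16 lanes.
results = {lanes: axpy_vla(2, x, y, lanes) for lanes in (2, 4, 8, 16)}
for lanes, values in results.items():
    print(lanes, values)
```

Nothing in the loop body names a fixed width, which is the property that would spare the auto-vectorizer a rewrite each time the registers get wider.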
 

moinmoin

Platinum Member
Jun 1, 2017
2,101
2,510
106
I'm still shocked Intel sees no reason to develop a variable-length vector extension.
But that wouldn't allow them to sell you 1024-bit etc. upgrades later on.

AMD using something like SVE instead while going wide would be a nice alternative indeed.
 
