Cascade Lake beats Rome in the race for 2019 TACC Supercomputer


IEC

Should be interesting to see if those hardware mitigations produce better performance than a fully-patched Skylake-X server running the same clocks.

They should produce better numbers than the software+firmware mitigations in place now. However, some of those mitigations cause rather extreme (>30%) performance drops in real-world use cases, so it would take a huge performance uplift to convince affected customers.
 

Abwx

I was wrong to say rewriting, but it will need compiler intrinsics to account for the way Zen does AVX, since it cannot do 2 FMA/cycle in a straightforward manner.

In case you didn't notice, FMA is totally unrelated to AVX. The proof is that SB (Sandy Bridge) has AVX without FMA; the latter is its own instruction set.
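To make the "separate flags" point concrete, here is a minimal sketch, assuming you already have the raw ECX value returned by CPUID leaf 1. FMA is reported in bit 12 and AVX in bit 28, so either can be set without the other. The two sample register values are built by hand for illustration and are not dumps from real chips.

```python
# Bit positions in ECX after executing CPUID with EAX=1.
FMA_BIT = 12
AVX_BIT = 28

def decode_leaf1_ecx(ecx):
    """Return which of the two features a raw ECX value advertises."""
    return {
        "FMA": bool((ecx >> FMA_BIT) & 1),
        "AVX": bool((ecx >> AVX_BIT) & 1),
    }

# Hypothetical register values, constructed by hand for illustration only.
avx_only_ecx = 1 << AVX_BIT                         # Sandy Bridge-like: AVX, no FMA
avx_and_fma_ecx = (1 << AVX_BIT) | (1 << FMA_BIT)   # Haswell-like: both flags set

print(decode_leaf1_ecx(avx_only_ecx))
print(decode_leaf1_ecx(avx_and_fma_ecx))
```

Software is expected to test each flag on its own rather than infer one from the other.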
 

Nothingness

In case you didn't notice, FMA is totally unrelated to AVX. The proof is that SB (Sandy Bridge) has AVX without FMA; the latter is its own instruction set.
Except that if you want 256-bit FMA you need AVX. And as far as I know no CPU has FMA without AVX.
 

Abwx

Except that if you want 256-bit FMA you need AVX.

Not at all; all you need is an FMAC-capable execution unit. The FMA instructions themselves are not part of the AVX subset.

And as far as I know no CPU has FMA without AVX.

That doesn't invalidate my point. The fact is that Intel implemented FMA along with the AVX2 subset (which is mainly for integer ops), hence the confusion in some minds. I'm not speaking of yours, but there's a general belief that FMA is part of AVX/AVX2...
 

Nothingness

Not at all; all you need is an FMAC-capable execution unit. The FMA instructions themselves are not part of the AVX subset.
And you get your 256-bit registers from where exactly? Ah yes, that comes with AVX.

EDIT: I indeed know that FMA is kind of separate from AVX :)
 

Abwx

And you get your 256-bit registers from where exactly? Ah yes, that comes with AVX.

EDIT: I indeed know that FMA is kind of separate from AVX :)

256b, or rather 4x 64b/8x 32b registers are for AVX2....

Bulldozer/Piledriver have FMA and no 256-bit registers, although they support AVX128. They could have supported the former without implementing the latter, and the CPU would still execute FMA ops the same way...
 

Nothingness

I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?
 

lightmanek

I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?

Yes, he is.
AMD originally planned Bulldozer to support SSE5 and FMA without AVX. Obviously that never happened, but AMD even published some SSE5 documentation to get software developers' opinions on their implementation.
 

Abwx

I explicitly wrote that if you want 256b wide FMA then you'll need AVX. Are you disputing this?

That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.

Intel implemented AVX before FMA, but they could have made SB FMA-compatible without implementing AVX. It's just that they were hesitating over whether to implement FMA4 or FMA3; as AMD chose the former, they decided to implement FMA3 instead.

FMA4 needs one more register operand to keep the operation non-destructive: the result goes to a fourth register instead of overwriting one of the registers holding the three values combined in a multiply-accumulate op.
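A rough way to picture the destructive vs non-destructive difference is the toy model below, written in Python with a dict standing in for the register file. The function and register names are made up for illustration, and plain Python arithmetic rounds twice rather than once, so this only models operand handling, not the fused rounding of real hardware.

```python
def fma3(regs, dst, src2, src3):
    """Three-operand form (vfmadd231-style): dst = src2 * src3 + dst.

    dst is both an input (the addend) and the output, so its old value is lost.
    """
    regs = dict(regs)  # work on a copy so the caller's register file is untouched
    regs[dst] = regs[src2] * regs[src3] + regs[dst]
    return regs

def fma4(regs, dst, a, b, c):
    """Four-operand form: dst = a * b + c, with dst separate from all inputs."""
    regs = dict(regs)
    regs[dst] = regs[a] * regs[b] + regs[c]
    return regs

# Hypothetical register file: r0 * r1 + r2, with r3 free for the result.
regs = {"r0": 2.0, "r1": 3.0, "r2": 4.0, "r3": 0.0}

after_fma3 = fma3(regs, "r2", "r0", "r1")        # r2 is overwritten with 10.0
after_fma4 = fma4(regs, "r3", "r0", "r1", "r2")  # r3 gets 10.0, r2 keeps 4.0

print(after_fma3)
print(after_fma4)
```

With the three-operand form, code that still needs the old addend has to copy it somewhere first; the four-operand form avoids that copy at the cost of encoding one more register.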
 

JoeRambo

That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.

Actually, just a 128-bit wide register is needed, as AVX has instructions for 128-bit wide vectors (and 256, and now 512).
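For reference, the lane arithmetic behind those widths is just register width divided by element size. A small sketch (the helper name `lanes` is made up for illustration):

```python
def lanes(register_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits

for width in (128, 256, 512):
    print(width, "bit:", lanes(width, 64), "x 64b,", lanes(width, 32), "x 32b")
```

So a 128-bit register holds two 64-bit values per operand, and a 256-bit register holds four.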

This whole discussion about what is needed and what is not is completely esoteric for x86 CPUs, as AMD was at the time irrelevant in both market penetration and performance. What matters in this discussion is that all FMA operations are presented as VEX or EVEX instructions, which were designed for the AVX project.
 

Nothingness

That's it: if you want three 64-bit operands to be FMAed, all you need is a 256b-wide register and an FMAC execution unit.
Ok I know you are not utterly stupid so I guess I wasn't clear: how do I FMA 3 or 4 256-bit wide vectors when I don't have a 256-bit wide reg file (which comes with AVX)?

For 128-bit wide vectors SSE already provided 128-bit regs so it's obviously not an issue, you don't need AVX.
 

Nothingness

Yes, he is.
AMD originally planned Bulldozer to support SSE5 and FMA without AVX. Obviously that never happened, but AMD even published some SSE5 documentation to get software developers' opinions on their implementation.
I'm talking about 256-bit wide FMA. Did SSE5 add 256-bit regs and ops?
 

gdansk

Given the importance of vector units in scientific computing, I'd reckon it's mainly AVX-512 that drove the decision. I'm still shocked Intel sees no reason to develop a variable-length vector extension. From a compiler writer's perspective it seems simpler. There would be no need to modify auto-vectorization every few years.
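As a rough illustration of why variable-length vectors look simpler from the compiler side, here is a sketch of a strip-mined loop in plain Python. The `lanes` parameter stands in for a vector length that would be queried at run time; the function name and data are made up, and this models the idea only, not any real ISA.

```python
def axpy_strip_mined(a, x, y, lanes):
    """Compute a*x[i] + y[i], consuming `lanes` elements per loop iteration.

    The loop body never hard-codes the vector length, and the last
    iteration simply handles a shorter chunk.
    """
    out = []
    for start in range(0, len(x), lanes):
        chunk_x = x[start:start + lanes]
        chunk_y = y[start:start + lanes]
        out.extend(a * xi + yi for xi, yi in zip(chunk_x, chunk_y))
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]

# The same code gives the same answer whatever the hardware width is.
results = [axpy_strip_mined(2.0, x, y, lanes) for lanes in (2, 4, 8, 16)]
print(results[0])
```

With a fixed-width extension, each new register width tends to mean a new code path; here only the value of `lanes` changes.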
 

moinmoin

I'm still shocked Intel sees no reason to develop a variable-length vector extension.
But that wouldn't allow them to sell you 1024bit etc. upgrades later on.

AMD using something like SVE instead while going wide would be a nice alternative indeed.