Knight's Landing, Skylake to unify instruction sets?

NTMBK · Jul 3, 2013

Knight's Landing is listed as having AVX3.1, Skylake as having AVX3.2- does this mean we will see SSE & AVX support in the next Phi, and Phi instruction set support in Skylake?

BallaTheFeared · Jul 3, 2013

I don't know what AVX 3.2 is, but I'm upgrading to it!

ShintaiDK · Jul 3, 2013

Might just be the AVX instructions and not any legacy.

NTMBK · Jul 3, 2013

ShintaiDK said:
Might just be the AVX instructions and not any legacy.

So long as it still has nice scatter/gather and vector masking, I don't mind. Full compatibility with their existing vector instructions for CPUs would be more important than compatibility with the single generation of Phis before it, in my eyes.
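For anyone unfamiliar with the terms, here is a rough sketch of what masked gather and scatter mean at the lane level. It is a plain Python model, not real intrinsics: the function names `masked_gather` and `masked_scatter` are made up for illustration, and the "keep the old value in masked-off lanes" behaviour is an assumption about how merge-style masking works.

```python
def masked_gather(memory, indices, mask, dest):
    """Return a new vector: lane i is memory[indices[i]] where mask[i] is set,
    otherwise the old dest[i] is kept (merge-style masking)."""
    return [memory[idx] if m else old
            for idx, m, old in zip(indices, mask, dest)]


def masked_scatter(memory, indices, mask, src):
    """Store src[i] to memory[indices[i]] for every lane whose mask bit is set.
    Lanes are processed in order, so a later lane overwrites an earlier one
    that targets the same index (an assumption of this sketch)."""
    out = list(memory)  # copy so the caller's list is left untouched
    for idx, m, value in zip(indices, mask, src):
        if m:
            out[idx] = value
    return out


memory = [10, 20, 30, 40, 50, 60, 70, 80]
indices = [7, 0, 3, 3]
mask = [True, False, True, True]

gathered = masked_gather(memory, indices, mask, dest=[0, 0, 0, 0])
scattered = masked_scatter(memory, indices, mask, src=[1, 2, 3, 4])
print(gathered)   # [80, 0, 40, 40]
print(scattered)  # [10, 20, 30, 4, 50, 60, 70, 1]
```

The point of the mask is that lane 1 is skipped entirely: nothing is loaded or stored for it, which is what lets a vectorised loop handle per-element conditions without branching.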

ShintaiDK · Jul 3, 2013

NTMBK said:
So long as it still has nice scatter/gather and vector masking, I don't mind. Full compatibility with their existing vector instructions for CPUs would be more important than compatibility with the single generation of Phis before it, in my eyes.

One could also conclude that, since both are getting (the option of) a socket and DDR4 (no GDDR5?), compatibility will be very high on the list.

NostaSeronx · Jul 3, 2013

Homogenous vs Heterogeneous

It is about to begin!

cytg111 · Jul 3, 2013

Might even get exciting again... and give Moore some much-needed CPR

inf64 · Jul 3, 2013

AVX3.2 is in Skylake and the core will be able to do 16Flops per core per cycle (2x Haswell). This is presumably with the FMA part of the ISA.

Arachnotronic · Jul 3, 2013

inf64 said:
AVX3.2 is in Skylake and the core will be able to do 16Flops per core per cycle (2x Haswell). This is presumably with the FMA part of the ISA.

Yup.

2is · Jul 3, 2013

inf64 said:
AVX3.2 is in Skylake and the core will be able to do 16Flops per core per cycle (2x Haswell). This is presumably with the FMA part of the ISA.

*Not available on K series processors

(if Intel's current business model is any indication)

NTMBK · Jul 3, 2013

inf64 said:
AVX3.2 is in Skylake and the core will be able to do 16Flops per core per cycle (2x Haswell). This is presumably with the FMA part of the ISA.

Double vector width then, not just executing a 512-bit ISA on 256-bit vectors? Yeesh- I wonder how the core counts and clock speeds of the Phi and Skylake will match up. Haswell EP is meant to have up to 15 cores, and will likely be close to 3GHz (if previous Xeon evidence is anything to go by)- not that far off Xeon Phi territory. Although the Phi does have 4-way SMT on its side.
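To make the "double vector width" arithmetic concrete, here is a back-of-envelope sketch. It assumes two FMA units per core (publicly documented for Haswell, purely an assumption for Skylake) and double-precision elements; `flops_per_cycle` is just a helper name invented for this post. Whether an FMA counts as one operation or two changes the absolute numbers but not the 2x ratio.

```python
def flops_per_cycle(vector_bits, element_bits, fma_units, ops_per_fma=2):
    """Peak floating-point operations per core per cycle.

    ops_per_fma=2 counts the multiply and the add separately;
    ops_per_fma=1 counts each fused multiply-add as a single operation.
    """
    lanes = vector_bits // element_bits
    return lanes * fma_units * ops_per_fma


# Assumption: two FMA units per core in both cases, 64-bit (double) elements.
dp_256 = flops_per_cycle(256, 64, fma_units=2)
dp_512 = flops_per_cycle(512, 64, fma_units=2)
print(dp_256, dp_512)  # 16 32

# Counting an FMA as one operation instead:
print(flops_per_cycle(256, 64, 2, ops_per_fma=1),
      flops_per_cycle(512, 64, 2, ops_per_fma=1))  # 8 16
```

Either way, the doubling only falls out if the execution units themselves are 512 bits wide. Running a 512-bit ISA on 256-bit hardware would leave the per-cycle figure where it is.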

jpiniero · Jul 3, 2013

Would it really be practical for Intel to, say, remove the MMX/SSE/AVX units from the x86 core and throw some Phi cores on the die instead?

TuxDave · Jul 3, 2013

jpiniero said:
Would it really be practical for Intel to, say, remove the MMX/SSE/AVX units from the x86 core and throw some Phi cores on the die instead?

At that point, you may as well throw away the entire core and stick with 100% Xeon Phi.

NTMBK · Jul 3, 2013

NostaSeronx said:
Homogenous vs Heterogeneous

It is about to begin!

NTMBK · Jul 3, 2013

jpiniero said:
Would it really be practical for Intel to, say, remove the MMX/SSE/AVX units from the x86 core and throw some Phi cores on the die instead?

Oh good gracious no. The #1 selling point of x86 (at least on Windows) is backwards compatibility. If you can't run MMX, SSE or AVX, the vast majority of applications from the last 10 years flat out won't run on your core. (Not to mention, AMD64 mandates a minimum of SSE2.)

jpiniero · Jul 3, 2013

NTMBK said:
Oh good gracious no. The #1 selling point of x86 (at least on Windows) is backwards compatibility. If you can't run MMX, SSE or AVX, the vast majority of applications from the last 10 years flat out won't run on your core. (Not to mention, AMD64 mandates a minimum of SSE2.)

Call it "I can't believe it's not x86" then. Include software emulation to provide compatibility. "Force" people to upgrade their software to take advantage, and of course they would then buy new hardware. Lock AMD out :twisted:

NTMBK · Jul 3, 2013

jpiniero said:
Call it "I can't believe it's not x86" then. Include software emulation to provide compatibility. "Force" people to upgrade their software to take advantage, and of course they would then buy new hardware. Lock AMD out :twisted:

The only thing which is even starting to go that way is x87.

People bought into SSE to improve performance in apps that needed it; if Intel started tanking it intentionally they'd get a bit annoyed.

However, implementing SSE on the lower 128 bits of a 512-bit pipeline can't be that tricky, surely (I say naively)- they already do it for 128-on-256 with Haswell.

TuxDave · Jul 3, 2013

NTMBK said:
The only thing which is even starting to go that way is x87. People bought into SSE to improve performance in apps that needed it; if Intel started tanking it intentionally they'd get a bit annoyed.

However, implementing SSE on the lower 128 bits of a 512-bit pipeline can't be that tricky, surely (I say naively)- they already do it for 128-on-256 with Haswell.

You're mostly right. It's not hard or expensive but there are some stupid mechanisms you still have to deal with to keep it architecturally correct.

http://software.intel.com/en-us/articles/avoiding-avx-sse-transition-penalties/

128-bit Intel® AVX instructions operate on the lower 128 bits of the YMM registers and zero the upper 128 bits. However, legacy Intel® SSE instructions operate on the XMM registers and have no knowledge of the upper 128 bits of the YMM registers. Because of this, the hardware saves the contents of the upper 128 bits of the YMM registers when transitioning from 256-bit Intel® AVX to legacy Intel® SSE, and then restores these values when transitioning back from Intel® SSE to Intel® AVX (256-bit or 128-bit).
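The rule in that quote can be modelled in a few lines. This is only a toy model of the architectural behaviour described above, treating a YMM register as a 256-bit Python integer; the function names are hypothetical and nothing here touches real hardware.

```python
MASK128 = (1 << 128) - 1  # selects bits 0-127 (the XMM part)


def legacy_sse_write(ymm, low128):
    """Legacy SSE write: replace bits 0-127, leave bits 128-255 untouched."""
    return (ymm & ~MASK128) | (low128 & MASK128)


def vex128_write(ymm, low128):
    """128-bit AVX (VEX-encoded) write: replace bits 0-127, zero bits 128-255."""
    return low128 & MASK128


# A register whose upper half holds live data from earlier 256-bit AVX code.
ymm0 = (0xAAAA << 128) | 0x1111

after_sse = legacy_sse_write(ymm0, 0x2222)
after_vex = vex128_write(ymm0, 0x2222)

print(hex(after_sse >> 128), hex(after_sse & MASK128))  # 0xaaaa 0x2222
print(hex(after_vex >> 128), hex(after_vex & MASK128))  # 0x0 0x2222
```

Because the legacy SSE write has to keep the upper half intact, the result depends on the old register contents. That dependency is the "stupid mechanism" that costs cycles when the two encodings are mixed, and it is why the linked article recommends clearing the upper halves before switching to SSE code.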

Ayah · Jul 3, 2013

I'm kinda curious how a socketed knight's landing would be priced..

NTMBK · Jul 4, 2013

Ayah said:
I'm kinda curious how a socketed knight's landing would be priced..

Hopefully not too much more than the regular socketed Xeon E5s- not including the cost of GDDR5 and the PCB should help a lot.

mrmt · Jul 4, 2013

NTMBK said:
Hopefully not too much more than the regular socketed Xeon E5s- not including the cost of GDDR5 and the PCB should help a lot.

Currently a Xeon Phi with 8GB of GDDR5 has a lower list price than most top-end EP Xeons.

sushiwarrior · Jul 4, 2013

I wouldn't expect Xeon Phi to have many instruction sets. Adding just a single set means you have 50+ cores all adding the hardware for that set, which makes any additional sets extremely expensive to implement. In addition, Phi is about large numbers of SIMPLE cores - don't expect full featured cores, expect more cut down cores.

BenchPress · Jul 4, 2013

NTMBK said:
Knight's Landing is listed as having AVX3.1, Skylake as having AVX3.2- does this mean we will see SSE & AVX support in the next Phi...

Not likely. Xeon Phi is targeted exclusively at the HPC market, and runs software by and for that market. So it doesn't have to be binary compatible with legacy CPU extensions.

You may not even want that. Xeon Phi is an in-order execution architecture with hundreds of threads, while desktop CPUs are out-of-order execution architectures with a modest number of threads. This requires a somewhat different programming approach. Code meant for one isn't going to run well on the other without at least recompiling. And if you have to recompile anyway, it might as well be binary incompatible to keep the hardware lean. Xeon Phi doesn't support unaligned vector operands, for starters. Adding support for that just to accommodate smaller vectors makes very little sense.
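To illustrate the alignment point: a 512-bit vector is 64 bytes, so an aligned operand has to start at a multiple of 64, while an unaligned one straddles two aligned blocks. A minimal sketch, with hypothetical helper names and byte addresses as plain integers:

```python
VECTOR_BYTES = 64  # 512 bits


def is_aligned(address, width=VECTOR_BYTES):
    """True if a vector starting at this byte address is naturally aligned."""
    return address % width == 0


def aligned_blocks_touched(address, width=VECTOR_BYTES):
    """Start addresses of the aligned blocks covered by one vector access."""
    first = address - address % width
    last_byte = address + width - 1
    last = last_byte - last_byte % width
    return [first] if first == last else [first, last]


print(is_aligned(128), aligned_blocks_touched(128))  # True [128]
print(is_aligned(130), aligned_blocks_touched(130))  # False [128, 192]
```

The second case is the one that costs hardware: the access has to be split across two blocks and stitched back together.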

It might just be a marketing decision to name them similarly. It stresses that CPUs can be equally useful for high throughput computing. It's just not their only focus, like it is with Xeon Phi.

...and Phi instruction set support in Skylake

That's a little more likely. AVX 3.2 suggests backward compatibility with Phi's AVX 3.1.

That would mean that AVX 3.2 is a significant departure from AVX2 and not just a widening of it. Phi has mask registers, for instance. It's arguable whether that's relevant. AVX was also a departure from SSE but the new encoding format supports all the old operations.

ShintaiDK · Jul 4, 2013

I wonder if Broadwell will support AVX 3.1, or if Skylake simply jumps in with both 3.1 and 3.2.

NTMBK · Jul 4, 2013

ShintaiDK said:
I wonder if Broadwell will support AVX 3.1, or if Skylake simply jumps in with both 3.1 and 3.2.

Probably not- Phi is a very, very long way away from AVX2 (as BenchPress rightly points out), so adding support to match Phi would be pretty far outside the usual scope of just shrinking Haswell.
