Knight's Landing, Skylake to unify instruction sets?


NTMBK

Lifer
Nov 14, 2011
10,419
5,712
136
Not likely. Xeon Phi is targeted exclusively at the HPC market, and runs software by and for that market. So it doesn't have to be binary compatible with legacy CPU extensions.

You may not even want that. Xeon Phi is an in-order execution architecture with hundreds of threads, while desktop CPUs are out-of-order execution architectures with a modest number of threads. This requires a somewhat different programming approach. Code meant for one isn't going to run well on the other without at least recompiling. And if you have to recompile anyway, it might as well be binary incompatible to keep the hardware lean. Xeon Phi doesn't support unaligned vector operands, for starters. Adding support for that just to handle smaller vectors makes very little sense.

It might just be a marketing decision to name them similarly. It stresses that CPUs can be equally useful for high throughput computing. It's just not their only focus, like it is with Xeon Phi.
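To make the alignment point above concrete, here is a minimal C sketch, assuming 512-bit vectors (64 bytes, i.e. 16 floats) and a C11 toolchain. The names `alloc_vec_aligned`, `is_vec_aligned` and `scale_aligned` are made up for illustration and are not part of any Intel API. It only shows the sort of bookkeeping that code has to do when the hardware wants every vector operand on a 64-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VEC_BYTES 64  /* assumed vector width: 512 bits = 16 floats */

/* Returns nonzero if p sits on a VEC_BYTES boundary. */
int is_vec_aligned(const void *p)
{
    return ((uintptr_t)p % VEC_BYTES) == 0;
}

/* Allocates room for n floats on a 64-byte boundary; release with free(). */
float *alloc_vec_aligned(size_t n)
{
    size_t bytes = n * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t padded = (bytes + VEC_BYTES - 1) / VEC_BYTES * VEC_BYTES;
    if (padded == 0)
        padded = VEC_BYTES;
    return aligned_alloc(VEC_BYTES, padded);
}

/* Scales v in place; the kernel refuses buffers that are not aligned. */
void scale_aligned(float *v, size_t n, float a)
{
    assert(is_vec_aligned(v));
    for (size_t i = 0; i < n; ++i)
        v[i] *= a;
}
```

A buffer from `alloc_vec_aligned(100)` can be handed to `scale_aligned` directly, while a pointer offset by a single float would trip the assertion. That is the extra care recompiled code has to take on hardware with this restriction.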

Oh, I'm well aware that it won't run most legacy code especially well- I'm not looking to run Crysis on the thing. ;) But the ability to install Windows (or a standard Linux distro like RHEL) on the thing would make working with it a lot simpler. At the moment it has a specially hacked branch of the Linux kernel to run on it, but that's all.

The syntax for "offloading" tasks to the Phi is appallingly bad- try looking up some examples some time. It's #pragmas 4 lines long, and that sort of nonsense. Having it in a socket as the native CPU, driving the OS, makes it a much more straightforward target platform.
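For anyone who hasn't seen it, this is roughly what the offload style looks like. It is a minimal sketch, assuming the Intel compiler's `#pragma offload` extension with its `target`, `in`, `inout` and `length` clauses; the function name `saxpy_offload` is invented for the example. Real code tends to pile on more clauses (allocation and reuse control, signals for asynchronous transfers), which is where the multi-line pragmas come from.

```c
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy_offload(float a, const float *x, float *y, int n)
{
    assert(x != NULL && y != NULL && n >= 0);

    /* Intel-compiler-specific: copy x to the card, copy y there and back.
     * Compilers without this extension ignore the unknown pragma (possibly
     * with a warning) and simply run the loop on the host. */
    #pragma offload target(mic:0) in(x : length(n)) inout(y : length(n))
    {
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

Every pointer that crosses to the card needs its direction and length spelled out by hand, so the annotations grow with each array the kernel touches.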
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
I wonder if Broadwell will support AVX 3.1, or if Skylake simply jumps in with both 3.1 and 3.2.
Fat chance. 14 nm is a node and a half smaller, and they'll probably bring 6 or 8-core to the mainstream market. They'll have enough on their plate to not want to be bothered with a new architecture at the same time. The tick-tock model has worked really well so far. There could be a handful of new instructions, and some old ones being made faster, but that's minor in comparison to an entirely new architecture.

AVX 3.2 may just be AVX2 (256-bit) + AVX 3.1 (512-bit), whereas Phi will only support AVX 3.1 (an evolution from KNI). So it's not as if Skylake would support two new extensions. Just one new one, while retaining the old one(s).
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Oh, I'm well aware that it won't run most legacy code especially well- I'm not looking to run Crysis on the thing. ;) But the ability to install Windows (or a standard Linux distro like RHEL) on the thing would make working with it a lot simpler. At the moment it has a specially hacked branch of the Linux kernel to run on it, but that's all.
Why would you want to run Windows on it? Windows is targeted at consumers and servers, not at HPC systems that have barely any need for an OS. Even so, Microsoft could easily create a Phi version of Windows, if there was enough demand. No need for Intel to bend over backwards to support Windows as-is.

That said, I guess the scalar FPU could be (ab)used to support everything up to AVX2, but that would cost a lot of microcode (for all of the 60+ cores). So I'm sure it's feasible one way or the other but the cost doesn't seem to be worth the gain.
The syntax for "offloading" tasks to the Phi is appallingly bad- try looking up some examples some time. It's #pragmas 4 lines long, and that sort of nonsense. Having it in a socket as the native CPU, driving the OS, makes it a much more straightforward target platform.
I doubt that's going to change anything. You still need fast sequential cores for dealing with command processing in a big supercomputer. You may even need them for JIT compilation. So you still offload tasks from those cores to Phi. The fact that it's an ugly process is just inherent to heterogeneous computing. Skylake seems to be your best chance of changing that around.

Unless you're in the market for exascale computing, I'd forget all about Xeon Phi.
 

NTMBK

Lifer
Nov 14, 2011
10,419
5,712
136
Why would you want to run Windows on it? Windows is targeted at consumers and servers, not at HPC systems that have barely any need for an OS. Even so, Microsoft could easily create a Phi version of Windows, if there was enough demand. No need for Intel to bend over backwards to support Windows as-is.

...

Unless you're in the market for exascale computing, I'd forget all about Xeon Phi.

Phi isn't just for exascale HPC. There are plenty, plenty of workstation applications which can benefit from a massively parallel coprocessor (see also: GPGPU). The ideal outcome for me (obviously not from this generation, but maybe from the next) would be to have a Phi in one socket and a Xeon in the other, with coherent memory and matching instruction sets. It would require an overhauled OS scheduler smart enough to load up the powerful cores on the Xeon first, then start loading tasks which request hundreds of threads onto the Phi cores as well, but it's not beyond the realms of possibility.
 

sushiwarrior

Senior member
Mar 17, 2010
738
0
71
Phi isn't just for exascale HPC. There are plenty, plenty of workstation applications which can benefit from a massively parallel coprocessor (see also: GPGPU). The ideal outcome for me (obviously not from this generation, but maybe from the next) would be to have a Phi in one socket and a Xeon in the other, with coherent memory and matching instruction sets. It would require an overhauled OS scheduler smart enough to load up the powerful cores on the Xeon first, then start loading tasks which request hundreds of threads onto the Phi cores as well, but it's not beyond the realms of possibility.

I think you misunderstand how Phi works now. It works just like you're describing - you run the OS on the Xeon, but when you have a task that needs doing you send it to Phi to get it done. Coherent memory isn't very impressive when you have much more capable GDDR5 dedicated memory on the add-in card. You can't seamlessly swap between a fully-capable Xeon and the extremely gimped Phi. It's just not built to run things like Windows, it's meant to take specialized and parallel instructions and get them done fast. Each core by itself is weak and outdated, but together they form a much faster machine.

Phi will never have the same instruction set as a full Xeon/desktop CPU. Doing so goes against the very principle of Phi.
 

NTMBK

Lifer
Nov 14, 2011
10,419
5,712
136
I think you misunderstand how Phi works now. It works just like you're describing - you run the OS on the Xeon, but when you have a task that needs doing you send it to Phi to get it done. Coherent memory isn't very impressive when you have much more capable GDDR5 dedicated memory on the add-in card. You can't seamlessly swap between a fully-capable Xeon and the extremely gimped Phi. It's just not built to run things like Windows, it's meant to take specialized and parallel instructions and get them done fast. Each core by itself is weak and outdated, but together they form a much faster machine.

Phi will never have the same instruction set as a full Xeon/desktop CPU. Doing so goes against the very principle of Phi.

I understand how it works right now, all right, I just don't especially like it. :) The Phi was Intel's way of selling people on "x86 everywhere", but given that a) it doesn't support the majority of widely used x86 extensions and b) it has a separate memory pool which requires explicit management, in practice it is no different from working with a GPU. (If anything, slightly worse- CUDA memory management is much nicer to write than the horrible stuff needed for the Phi.)

Yes, I know that each core is weak- it's a Pentium clocked at 1GHz, for goodness' sake. I don't expect the "main load" on it just to be an application written for typical x86 cores. It's got a massive number of cores with very wide vector units, of course the "heavy lifting" you are going to do on the card needs to be optimised for that. But I want it to just be able to support the older things for compatibility reasons- just to make application development that little bit less painful, and make the Phi an easier platform to work with.

Given the way that Phi is right now, I would frankly rather get my hands on the PS4 APU in a PC- 8GB GDDR5 coherent memory pool, with a handful of lightweight x86 cores to run the show and a nice big GPU to do the real work.