Originally posted by: chizow
Originally posted by: SickBeast
He points out that Larrabee's x86 cores are wasteful in terms of die size. This may hold true for graphics performance, but he fails to mention the benefit of having that many x86 cores in your computer. Any video encoding app would benefit without having to be patched or re-coded, for example.
For most people, it's better to have more general-purpose processing power than a ton of graphics power. Most laptop computers illustrate this quite clearly.
Originally posted by: Extelleron
What Larrabee will do is redefine GPGPU. I think it will lead to widespread adoption of GPGPU thanks to the x86 architecture, which will enable developers to support it without any significant change to their programs. The kind of power that will be available with Larrabee - likely 2 TFLOPS+ peak FP, 32 cores / 128 threads - is going to be very impressive.
You guys are placing a lot of faith in Intel's Larrabee compiler to make single- or few-threaded applications run efficiently on Larrabee's vector execution units without any additional help, especially given that many of these apps don't scale particularly well on existing x86 architectures. You really don't need to look any further than a current example of scalar vs. vector design when comparing Nvidia vs. ATI stream processing units, where ATI's 5-wide vector design depends heavily on compiler optimization for scaling and efficiency.
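A rough C sketch of why that matters (plain C, not ATI or Larrabee code): a wide unit only earns its keep when the compiler can find enough independent operations to pack together.

/* Toy illustration of why VLIW/vector-style units lean on the compiler.
 * A dependent chain leaves most of a 5-wide unit idle; independent
 * operations give the scheduler something to pack side by side. */
#include <stddef.h>

/* Dependent chain: each step needs the previous result, so only one
 * of the five slots can be kept busy per step. */
float dependent_chain(const float *x, size_t n) {
    float acc = 1.0f;
    for (size_t i = 0; i < n; i++)
        acc = acc * x[i] + 1.0f;    /* serial dependency on acc */
    return acc;
}

/* Independent work: these five statements don't depend on each other,
 * so a compiler can schedule them into one wide instruction. */
void independent_lanes(const float *x, float *out, size_t n) {
    for (size_t i = 0; i + 4 < n; i += 5) {
        out[i]     = x[i]     * 2.0f;
        out[i + 1] = x[i + 1] * 2.0f;
        out[i + 2] = x[i + 2] * 2.0f;
        out[i + 3] = x[i + 3] * 2.0f;
        out[i + 4] = x[i + 4] * 2.0f;
    }
}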
Larrabee's design will be even more dependent on application or compiler optimizations with 16 vector execution units per core. I'm also not sure where you get the impression current apps will automatically accelerate on Larrabee without being recompiled or without any application optimization, just because they share the same base x86 ISA. I think the main concern about Larrabee is that not only do you potentially have less efficient vector units per Larrabee core with a 16-wide design, you also have all this additional redundant x86 overhead before you can even access those execution units.
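To put the recompile point in concrete terms, here's plain C next to today's 4-wide SSE. The vector width is baked into the compiled code; a 16-wide path for Larrabee would have to be generated the same way, by a compiler that knows about LRBni (this is just an SSE illustration, not Larrabee code).

/* Scalar x86 vs. explicitly vectorized 4-wide SSE. An existing binary
 * compiled without the wider instructions simply never touches them. */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stddef.h>

/* What a typical existing x86 binary does: one float per instruction. */
void add_scalar(const float *a, const float *b, float *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Rewritten/recompiled for 4-wide SSE; the width is part of the code. */
void add_sse(const float *a, const float *b, float *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)   /* scalar tail for leftover elements */
        c[i] = a[i] + b[i];
}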
- AT's Larrabee Preview by Anand and Derek
NVIDIA's SPs work on a single operation, AMD's can work on five, and Larrabee's vector unit can work on sixteen. NVIDIA has a couple hundred of these SPs in its high end GPUs, AMD has 160 and Intel is expected to have anywhere from 16 - 32 of these cores in Larrabee. If NVIDIA is on the tons-of-simple-hardware end of the spectrum, Intel is on the exact opposite end of the scale.
We've already shown that AMD's architecture requires a lot of help from the compiler to properly schedule and maximize the utilization of its execution resources within one of its 5-wide SPs, with Larrabee the importance of the compiler is tremendous. Luckily for Larrabee, some of the best (if not the best) compilers are made by Intel. If anyone could get away with this sort of an architecture, it's Intel.
At the same time, while we don't have a full understanding of the details yet, we get the idea that Larrabee's vector unit is sort of a chameleon. From the information we have, these vector units could execute atomic 16-wide ops for a single thread of a running program and can handle register swizzling across all 16 execution units. This implies something very AMD-like and wide. But it also looks like each of the 16 vector execution units, using the mask registers, can branch independently (looking much more like NVIDIA's solution).
We've already seen how AMD and NVIDIA architectural differences show distinct advantages and disadvantages against each other in different games. If Intel is able to adapt the way the vector unit is used to suit specific situations, they could have something huge on their hands. Again, we don't have enough detail to tell what's going to happen, but things do look very interesting.
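The mask-register behavior they describe is basically predication: both sides of a branch run across all 16 lanes and the mask decides which lanes keep which result. A rough emulation in plain C (hypothetical illustration only, not actual Larrabee/LRBni code):

/* Emulating per-lane "branching" on a 16-wide unit with a mask. */
#define LANES 16

void masked_branch(const float *x, float *out) {
    float then_result[LANES], else_result[LANES];
    int   mask[LANES];                  /* stand-in for a mask register */

    for (int i = 0; i < LANES; i++)
        mask[i] = (x[i] > 0.0f);        /* per-lane condition */

    for (int i = 0; i < LANES; i++)     /* "then" side runs on all lanes */
        then_result[i] = x[i] * 2.0f;

    for (int i = 0; i < LANES; i++)     /* "else" side runs on all lanes */
        else_result[i] = -x[i];

    for (int i = 0; i < LANES; i++)     /* mask selects per lane */
        out[i] = mask[i] ? then_result[i] : else_result[i];
}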
Here's a good indication that existing code compiled for x86 isn't going to leverage Larrabee's extra vector unit functionality and new extensions:
Intel outlines CT parallel programming language at IDF. I'm sure it comes with its own highly specialized x86 compiler needed to fully extract performance out of those vector units and additional LRBni extensions.
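The broader shift that kind of tool asks for looks something like this: instead of one big serial loop, you express per-element work and let a parallel compiler/runtime decide how to spread it across cores and vector lanes. A generic C sketch of the model (not actual Ct syntax):

/* Per-element kernel: no loop, no index bookkeeping, no dependencies
 * between elements, so a runtime is free to split the work across
 * cores and 16-wide vector units. */
#include <stddef.h>

static float kernel(float a, float b) {
    return a * b + 1.0f;
}

/* A "map" over the data; shown here as a plain serial stand-in. */
void map_kernel(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = kernel(a[i], b[i]);
}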
The fact that they're now pushing their own parallel computing language certainly flies in the face of some comments they made about GPGPU and CUDA becoming a footnote in the annals of history. I'd say Larrabee's existence alone would lend credence to some of Nvidia's predictions about the future of computing, but Intel's focus on using Larrabee for highly parallel computing rather than their existing multicore desktop architectures certainly solidifies those claims.
At this point it seems the inclusion of x86 on Larrabee was more of an attempt by Intel to ensure x86 doesn't become "a footnote in the annals of computing history." Because we all know how much Intel covets and protects that x86 license, don't we?