BSN: Intel Larrabee finally hits 1TFLOPS - 2.7x faster than nVidia GT200!

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Intel Larrabee finally hits 1TFLOPS - 2.7x faster than nVidia GT200!

The keynote continued while the engineers scrambled at the back to try to beat the 1TFLOPS barrier. A couple of minutes before the end of the keynote, Justin added the infamous "And one more thing…" Initial overclocked performance was 913 GFLOPS, moved slowly past 919 GFLOPS, bounced up to 997 GFLOPS and ultimately passed the 1TFLOPS barrier with 1006 GFLOPS. Now, we can debate the numbers all we want, but the fact of the matter is that nVidia Tesla C1060 delivers only 370 GFLOPS in an identical SGEMM 4Kx4K calculation. Thus, Larrabee today comes in at 2.7x the math performance of the GT200 chip.

http://www.brightsideofnews.com/new...-1tflops---27x-faster-than-nvidia-gt200!.aspx

It's actually an interesting read; worth hitting the link, IMO.
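In case anyone wonders how a figure like 1006 GFLOPS gets derived from an SGEMM run: a 4Kx4K single-precision matrix multiply performs roughly 2*N^3 floating-point operations, and you divide that by wall-clock time. Minimal sketch of the arithmetic (the timings below are my own back-calculated assumptions, not from the article):

Code:
/* Rough sketch: how an SGEMM GFLOPS figure is typically derived.
   A 4Kx4K single-precision matrix multiply performs ~2*N^3 flops;
   divide by wall-clock time to get sustained GFLOPS. */
#include <stdio.h>

int main(void)
{
    const double n    = 4096.0;                  /* 4K x 4K SGEMM        */
    const double flop = 2.0 * n * n * n;         /* ~137.4 GFLOP per run */

    /* assumed timings, back-calculated just to illustrate the math */
    const double t_larrabee = 0.1366;            /* s -> ~1006 GFLOPS    */
    const double t_c1060    = 0.3715;            /* s -> ~370 GFLOPS     */

    printf("Larrabee: %4.0f GFLOPS\n", flop / t_larrabee / 1e9);
    printf("C1060:    %4.0f GFLOPS\n", flop / t_c1060    / 1e9);
    return 0;
}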

What they don't tell us in the article is whether or not the 1TFLOPS result was accomplished with the 2006 specs they quote in their intro:

Back in 2006, when we first got the first details about Larrabee, the performance goal was "1 TFLOPS @ 16 cores, 2.0 GHz clock, 150W TDP".
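FWIW, that 2006 target is at least self-consistent if you assume Larrabee's 512-bit vector units (16 single-precision lanes per core) and a fused multiply-add counting as 2 flops per lane per clock. Rough sketch of that arithmetic (the SIMD-width and FMA assumptions are mine, not the article's):

Code:
/* Sanity check of the 2006 goal "1TFLOPS @ 16 cores, 2.0 GHz":
   assumes 16-wide single-precision SIMD and FMA (2 flops/lane/clock). */
#include <stdio.h>

int main(void)
{
    int    cores      = 16;
    double clock_ghz  = 2.0;
    int    simd_lanes = 16;     /* 512-bit vectors / 32-bit floats   */
    int    flops_fma  = 2;      /* multiply + add per lane per clock */

    printf("theoretical peak: %.0f GFLOPS\n",
           cores * clock_ghz * simd_lanes * flops_fma);   /* 1024 GFLOPS */
    return 0;
}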
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
Thanks for the info :). I don't know if this is too much of a newbie question, but what exactly does this bring to the table? Is the 1 TFLOPS computed by Larrabee any more useful than that of an NVIDIA GPU (as in, is it a question of x86 vs. CUDA)? Isn't Fermi supposed to have ~1 TFLOPS output? I know the 5870 has ~2.7 TFLOPS output, but there's not much coded to use it, IIRC.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Yeah, you have to make the distinction between "theoretical peak GFLOPS" and "actual GFLOPS".

The 2.7TFlop number for AMD is their theoretical max based on architectural details, but depending on the inefficiencies of the architecture you and I might only be able to extract 1TFlop out of the chip.

The 1TFlop number being reported by BSN is actual Flops extracted from Larrabee, as is the 370 GFlop number reported for the GT200.

And best I can tell these are single-precision flops, not double-precision.

Fermi is supposed to deliver 1TFlop actual (peak theoretical will be higher, but who cares).
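To put some numbers on the theoretical-vs-actual distinction, here's a rough sketch using published specs (HD 5870: 1600 SPs @ 850 MHz; Tesla C1060: 240 SPs @ ~1.3 GHz; 2 flops per SP per clock for the multiply-add). These figures are mine, not BSN's:

Code:
/* Theoretical peak vs. measured SGEMM, from published specs.
   HD 5870: 1600 SPs x 0.85 GHz x 2 flops/clock = ~2720 GFLOPS peak
   C1060:    240 SPs x 1.30 GHz x 2 flops/clock = ~624 GFLOPS peak (MAD only) */
#include <stdio.h>

int main(void)
{
    double hd5870_peak = 1600 * 0.85 * 2;   /* ~2720 GFLOPS theoretical */
    double c1060_peak  =  240 * 1.30 * 2;   /* ~624 GFLOPS theoretical  */
    double c1060_sgemm = 370.0;             /* actual, per the BSN test */

    printf("HD 5870 peak : %.0f GFLOPS\n", hd5870_peak);
    printf("C1060 peak   : %.0f GFLOPS\n", c1060_peak);
    printf("C1060 SGEMM  : %.0f GFLOPS (%.0f%% of MAD peak)\n",
           c1060_sgemm, 100.0 * c1060_sgemm / c1060_peak);
    return 0;
}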

For the HPC crowd that wants to use GPGPU for their applications these are all good numbers. We don't know the configuration of Larrabee though... was that 1TFlops for 16 cores @ 2GHz and 150W, or was it for their 48-core model OC'ed to 3GHz and 400W?

I'm personally interested in Larrabee for its HPC/GPGPU attributes. Compiling programs for Larrabee's x86-based cores has got to be a lot more straightforward than compiling for AMD's APUs or Nvidia's Fermi if Intel really does go forward with their shared virtual memory plans as outlined in this Intel marketing slide:

[Intel marketing slide: kaigai-02.jpg]
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
I thought Nvidia's compiler bolt-ons were going to alleviate some of the pain of programming on Fermi?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
I thought Nvidia's compiler bolt-ons were going to alleviate some of the pain of programming on Fermi?

Yes, absolutely they will alleviate some of the pain. And in the absence of a superior option the Fermi option will be the preferred path.

I'm merely musing on the prospects of Larrabee becoming (1) an option, and (2) potentially a superior option.

I'm not about to rule out Fermi/Nvidia though, not by a long shot. Nvidia needs GPGPU to work out for them; Intel is more in a "nice to have" position.

Kind of like OCZ vs. Intel in the SSD marketspace, or IBM/SUN vs. Intel in the big-iron marketspace.

And who knows, by the time Intel gets around to letting Larrabee be bought thru Newegg we could all very well be looking at Llano chips with APUs and Fermi 2 (NF200?).

But one cannot argue against the ease and simplicity that will come with Larrabee if Intel incorporates it into a shared virtual memory model and its own compilers can assist in creating binaries with codepaths that detect Larrabee on the fly. That barrier to entry could not be lower.
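Just to make the "detect Larrabee on the fly" idea concrete, here's a purely hypothetical sketch of what such a dispatching codepath might look like. larrabee_present() and the accelerated saxpy are made-up names, not any real Intel API:

Code:
/* Hypothetical sketch: one binary, two codepaths, chosen at runtime.
   With shared virtual memory the accelerated path could take the very
   same pointers as the host path, no explicit copies needed. */
#include <stdbool.h>
#include <stdio.h>

/* made up: would ask the driver/OS whether a Larrabee part is present */
static bool larrabee_present(void) { return false; }

static void saxpy_host(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* made-up accelerated version; placeholder body */
static void saxpy_larrabee(int n, float a, const float *x, float *y)
{
    saxpy_host(n, a, x, y);
}

static void saxpy(int n, float a, const float *x, float *y)
{
    if (larrabee_present())
        saxpy_larrabee(n, a, x, y);   /* dispatched on the fly */
    else
        saxpy_host(n, a, x, y);
}

int main(void)
{
    float x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
    saxpy(4, 2.0f, x, y);
    printf("%g %g %g %g\n", y[0], y[1], y[2], y[3]);
    return 0;
}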
 

yh125d

Diamond Member
Dec 23, 2006
6,907
0
76
Looks good, but I'm doubtful it was done with a "real world" configuration, and I'm still very skeptical about the whole ordeal.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Yes, absolutely they will alleviate some of the pain. And in the absence of a superior option the Fermi option will be the preferred path.

I'm merely musing on the prospects of Larrabee becoming (1) an option, and (2) potentially a superior option.

I'm not about to rule out Fermi/Nvidia though, not by a long shot. Nvidia needs GPGPU to work out for them; Intel is more in a "nice to have" position.

Kind of like OCZ vs. Intel in the SSD marketspace, or IBM/SUN vs. Intel in the big-iron marketspace.

And who knows, by the time Intel gets around to letting Larrabee be bought thru Newegg we could all very well be looking at Llano chips with APUs and Fermi 2 (NF200?).

But one cannot argue against the ease and simplicity that will come with Larrabee if Intel incorporates it into a shared virtual memory model and its own compilers can assist in creating binaries with codepaths that detect Larrabee on the fly. That barrier to entry could not be lower.

Yes, it's turning out almost exactly the way we discussed it a year or two back (and earlier for me). It's going to be a hell of a 2010.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
I just don't find the news that Intel has managed to catch up to a 2 year old architecture all that interesting, especially considering that Larrabee's release is not imminent.
 

dflynchimp

Senior member
Apr 11, 2007
468
0
71
I just don't find the news that Intel has managed to catch up to a 2 year old architecture all that interesting, especially considering that Larrabee's release is not imminent.

It's an impressive feat in that:

1. This is a new approach to graphics computing.
2. Their graphics department has never turned out anything worth mentioning up until now (all their IGPs amounting to silicocrap).
3. The rate at which they're catching up to ATI/Nvidia, the two specialized GPU giants in the industry.
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
Yeah, you have to make the distinction between "theoretical peak GFLOPS" and "actual GFLOPS".

The 2.7TFlop number for AMD is their theoretical max based on architectural details, but depending on the inefficiencies of the architecture you and I might only be able to extract 1TFlop out of the chip.

The 1TFlop number being reported by BSN is actual Flops extracted from Larrabee, as is the 370 GFlop number reported for the GT200.

And best I can tell these are single-precision flops, not double-precision.

Fermi is supposed to deliver 1TFlop actual (peak theoretical will be higher, but who cares).

For the HPC crowd that wants to use GPGPU for their applications these are all good numbers. We don't know the configuration of Larrabee though... was that 1TFlops for 16 cores @ 2GHz and 150W, or was it for their 48-core model OC'ed to 3GHz and 400W?

I'm personally interested in Larrabee for its HPC/GPGPU attributes. Compiling programs for Larrabee's x86-based cores has got to be a lot more straightforward than compiling for AMD's APUs or Nvidia's Fermi if Intel really does go forward with their shared virtual memory plans as outlined in this Intel marketing slide:

[Intel marketing slide: kaigai-02.jpg]
Ah, so it's actual; I can definitely see the merits in that :p. Seeing how much of that makes it to market will be interesting, but wow, this does put quite the twist on computing, never mind opening a ton of doors.
It's an impressive feat in that:

1. This is a new approach to graphics computing.
2. Their graphics department has never turned out anything worth mentioning up until now (all their IGPs amounting to silicocrap).
3. The rate at which they're catching up to ATI/Nvidia, the two specialized GPU giants in the industry.
Seriously, #3 is what gets me too. Talk about an unstoppable machine.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
This is the same thing I reported back in this thread: http://forums.anandtech.com/showthread.php?t=2025428

Apparently the 80 cores was a typo and it's 32.

It looks like Intel succeeded in demoing Larrabee before the much hyped (and faked :) ) Fermi!

Borealis7

Platinum Member
Oct 19, 2006
2,914
205
106
*EDITED OUT*
forum is acting up...edit caused a double post?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
This is the same thing I reported back in this thread: http://forums.anandtech.com/showthread.php?t=2025428

Apparently the 80 cores was a typo and it's 32.

It looks like Intel succeeded in demoing Larrabee before the much hyped (and faked :) ) Fermi!

Wait so BSN is only just now getting around to reporting the stuff you already posted about weeks ago!?

I assumed it was additional public benchmarking, not a rehash of the same old benchmark run.

My bad :oops:

Do you know if this 48-core "Single-chip Cloud Computer (SCC)" is a Larrabee derivative or if it is a wholly different design/architecture?

http://news.cnet.com/8301-1001_3-10407818-92.html
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Ouch, sounds like Larrabee is far worse off in relative terms than the i740 was.

Remember, when talking about Larrabee you are talking about the entire chip's performance, not just the shader hardware. Intel should be around 50x-100x higher than 1TFLOP if they are honestly expecting to ray trace current games (and no, that isn't a joke). Intel doesn't need their entire chip to be able to compete with one section of everyone else's; they need to be faster, much faster, than everyone else, as they are emulating what ATi and nV do in hardware. If this is an honest analysis of where Larrabee is today, they should probably junk the entire project, at least as it stands from a GPU perspective.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Other than that embarrassment of a YouTube video showing some chubby German benching 20 lbs, followed by the weakest of ray-tracing demos, what other "gaming environment" demos has Intel done with Larrabee? Pretty much all the PR has been tilted towards HPC/GPGPU stuff from what I've noticed so far. That says it all (re: Larrabee and gaming) if you ask me. Still, though, if you can sell me an x86 ISA-compatible processor with compiler support that hits 1TFlops for <$500, you will get my attention.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
TBH, nv is doing the same thing. They're excited about general-processing co-processor boards, not GPUs as we know them. Some of the analysis even guesstimates that Fermi may not have much more gaming power than a 285; it's all about GPGPU.

The only curveball with designing for scientific computing first and your main market second is that volume is required to hit even the $3000-ish price points profitably. Yes, the gross margins on workstation cards are amazing. But would they remain that way if you ordered, say, 100 wafers from TSMC instead of 10,000?

Larrabee could be very interesting because of the shared memory pool + x86 ISA. As mentioned before, OS runtime libraries could be optimized to run existing code on the 'GPU' with absolutely no work required by the end user. No porting, no recompiling. Current code would just WORK and be 'GPU' accelerated when that makes sense (e.g., sort() is a great candidate for a library call to run on the 'GPU'; suddenly your existing database and spreadsheet run faster!).
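Something along the lines of this hypothetical sketch of a runtime-library sort; accel_present() and accel_sort_floats() are made-up names, not a real API:

Code:
/* Hypothetical sketch: a drop-in library sort that offloads big arrays
   to an accelerator when one is present, else falls back to qsort().
   Application code just keeps calling sort_floats() and never knows. */
#include <stdbool.h>
#include <stdlib.h>

static bool accel_present(void) { return false; }                 /* made up */
static void accel_sort_floats(float *a, size_t n) { (void)a; (void)n; }

static int cmp_float(const void *pa, const void *pb)
{
    float a = *(const float *)pa, b = *(const float *)pb;
    return (a > b) - (a < b);
}

void sort_floats(float *a, size_t n)
{
    if (accel_present() && n > 100000)   /* offload only when it pays off */
        accel_sort_floats(a, n);
    else
        qsort(a, n, sizeof a[0], cmp_float);
}

int main(void)
{
    float v[5] = {3.5f, 1.0f, 4.25f, 1.5f, 9.0f};
    sort_floats(v, 5);
    return 0;
}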

The same approach would be harder with Fermi -- existing x86 instructions would have to be converted to whatever Fermi likes on the fly. It's been done, but is definitely harder.
 

MrK6

Diamond Member
Aug 9, 2004
4,458
4
81
I hope I'm not dumbing it down too much (simply due to my lack of experience and knowledge in this field), but if Larrabee is that far behind in ray-tracing capabilities, is Intel just trying to sell mini-supercomputers (that hopefully scale very well)? What exactly would be the benefit to consumers? Encoding/compiling boosts? Or are they trying to go for broke and capture the scientific sector as well (are there any reports of double-precision performance)?