
nVidia scientist on Larrabee


alyarb

Platinum Member
Jan 25, 2009
2,425
0
76
Then why is Larrabee "going up against" OpenCL? Does Intel really expect people to not use OpenCL and write only for Larrabee?
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Originally posted by: aka1nas
Intel's on the member list for OpenCL, what else are they going to use it with? :)
Intel's on the member list for OpenCL, but most likely only as a stop-gap solution for Havok support to play along with AMD's "open standards" initiative.

Originally posted by: alyarb
Then why is Larrabee "going up against" OpenCL? Does Intel really expect people to not use OpenCL and write only for Larrabee?
Yes, they do. Will Larrabee be compatible with OpenCL with a generic x86 compiler? Maybe, but I doubt it'll be able to fully take advantage of Larrabee's complicated multi-core + vector design and additional LRBni extensions.

From what I've read, one of the main reasons Microsoft isn't backing OpenCL is because it's limited to C and not C++, which is something to look out for with regard to Intel's future support of it. Plus the term "open standard" is typically synonymous with "free", something Intel has never really been comfortable with.

Again, alluding to some of the links above, Intel is pushing Ct, their own proprietary HPC C++ solution, which will undoubtedly include a super duper charged x86 compiler for Larrabee worthy of the Intel brand and price tag. There's little doubt their compilers are the best in the business, but no one ever said they were free. ;)
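
For anyone who hasn't looked at it, this is roughly what an OpenCL 1.0 kernel looks like: a restricted C dialect (no C++ templates, classes or operator overloading in kernel code), which is the C-only limitation mentioned above. Just an illustrative vector-add sketch with made-up names, nothing Larrabee- or Ct-specific, and the host-side setup (platform/context/queue) is omitted:

Code:
// Minimal OpenCL 1.0 kernel: a plain-C dialect, no C++ features allowed.
__kernel void vec_add(__global const float *a,
                      __global const float *b,
                      __global float *out,
                      const unsigned int n)
{
    size_t i = get_global_id(0);   // one work-item per element
    if (i < n)
        out[i] = a[i] + b[i];
}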
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
This is a nice summation of the branches in the timeline tree.

I would add that it needs to be taken within the context of Intel's pocketbook and ever-growing process technology advantage - meaning that even if Scenario 3 were the time-zero reality, it would simply be a matter of time for Intel to iterate Larrabee through die shrinks and ISA/architecture improvements until the situation evolved into (2) and eventually (1).

Itanium serves as an example of such an evolution. As does the P4 -> Core -> i7 evolution. Persistence and money have a habit of doing this. Competitive advantage is rarely maintained by hope and morale alone.

The only way to argue that scenario (1) will not come to pass (eventually) is to require that Intel abandon the effort in some capacity (always a possibility) and pull engineers off future iterations of the product, or to make the questionable argument that NV's engineers and management are somehow more crafty and clever in critically defining ways than Intel's engineers and management (i.e. NV and Intel do an AMD/Intel A64/Prescott thing).

In a capex-intensive industry such as semiconductors, it is essentially a consequence of the math that tomorrow's victor will be whoever has the money (and desire) to invest today towards owning a marketspace tomorrow. There are exceptions to the rule (A64/Prescott), but in an evolving marketspace time smooths out the exceptions and the end conclusions become just about inevitable.

The USSR eventually went bankrupt and the West eventually won the Cold War; with the money and technology advantages that Intel has over NV and AMD, it takes a rather peculiar set of boundary conditions to argue in good faith that the outcome isn't sort of a foregone conclusion.

It is interesting that your entire post ignores the history of the graphics market and how companies have managed to succeed and fail no matter the relative size of their competition. ATi, Matrox and 3Dfx all utterly dwarfed nVidia not all that long ago. As of right now, nVidia has close to enough liquid assets to pay cash outright for AMD in its entirety. While it is nice to assume that money alone will ultimately give you a major advantage, the real limitations are going to come down to how much die space you are dealing with. Are you going to end up with a better processor spending $5 Trillion on R&D at 90nm or spending $100 Million on R&D at 32nm? It is true that having larger cash reserves will help you expand your transistor budget, but not enough to overcome staggering mistakes in architecture.

Intel spent billions on IA64, and while some in this thread seem to ignore the fact - IA64 was supposed to replace x86, and Intel made this abundantly clear publicly - it had x86 emulation up and running, spent billions, and failed in no uncertain terms. Intel has also previously attempted to enter the high-end graphics market with the i740; while it did much better than Itanium, at best it managed a short life of being moderately competitive with mid-tier offerings and resigned itself to integrated status fairly quickly. Not close to the huge financial failure that Itanium was, but very far removed from being their entry into the booming graphics market.

Despite their enormous financial advantages, Intel fails with rather shocking frequency any time they step outside of the x86 market.

The very dangerous reality that Intel faces right now is that AMD and nVidia are both nigh fillrate-complete for rasterizing needs, so the amount of die space they will be able to dedicate to shader cores is going to grow significantly faster in relative terms than anyone's build process. Their cores are, by a staggering amount, more powerful than the most optimistic estimates for Larry on a transistor-for-transistor basis. This isn't going to change unless Intel completely abandons their notion that x86 and software rasterization is the way to go. It is a failed concept in every way from a market perspective.

Say what you will about the brilliance of Intel; simply point to a huge success of theirs outside of a vice grip on a commodity ISA market. It doesn't exist. The only thing history has proven to us about Intel more than that they find a way to triumph in the x86 market is that they will find a way to fail when they step outside of it.

Intel's ignorance of the market they are attempting to enter is another major factor that shouldn't be overlooked. People point to the fact that Intel has hired people with extensive industry experience such as those from 3DLabs. AMD and nVidia both offer integrated graphics that are more powerful than anything 3DLabs ever created. Intel does not have talent, experience, or skill in the market they are trying to enter in any fashion that should convince anyone they are capable of taking the market over.

For all of their resources, for all of their money, for all of the money they have already spent on Larry, the only thing we know as of now is that it is supposed to come out at some point in the future and that Intel themselves are already stating it will be a failure both in performance as a graphics card and as a GPGPU platform compared to the alternatives. What makes the situation more dire for them is that both AMD and nVidia are accelerating performance far beyond the curve indicated by build processes, and Intel's stated design philosophy will not ever allow them that same luxury.

I am not saying that Intel couldn't end up being a major force in the GPU market, but an honest analysis would indicate to me their best bet would be to spend the $6.5 billion to buy nVidia in order to do it (I limit that to nVidia as antitrust would prevent them from buying AMD for the far more wallet-friendly $2.4 billion). Interesting side note about that current situation: AMD had far more resources than nVidia and was at the point of considering purchasing them not all that long ago. They had far more advanced build processes, more resources, their own fabs, and all for what? They purchased ATi only to have nVidia dwarf not only ATi but all of AMD on the financial end of the spectrum in short order. Superior resources in this industry do not come close to assuring victory. They certainly help, but it is very, very far removed from a certain thing.
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Larrabee may not even ship this year (or next). By then not only will DX11 cards be out but CUDA will have a head start measuring several years. Plus over 100 million CUDA capable machines.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: Wreckage
Larrabee may not even ship this year (or next). By then not only will DX11 cards be out but CUDA will have a head start measuring several years. Plus over 100 million CUDA capable machines.

And each and every CUDA-capable machine is going to be DX11 and OpenCL capable, plus every AMD machine will also support DX11 and OpenCL, so counting only Nvidia's 100 million DX10+ units understates the enormity of the competition Intel faces.

BenSkywalker, this is an EXCELLENT analysis all around; I would say you are right on every one of your points...

I give Intel SOME benefit of the doubt by assuming they are making the colossal mistake of wasting 40+% of the die on x86 decoders because they are: 1) underestimating their competition, and 2) willing to make a "slightly inferior" product to what they could otherwise make in order to get x86 to control that market space, thus ensuring themselves a legal monopoly and outright destroying the competition via patent laws. Heck, they are getting VERY aggressive with patent suits right now.

While the execs might have sold it as a "low-risk gamble with high rewards", the reality of the matter is that it practically guarantees their failure in the graphics space.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Ben is slowly convincing me.

I guess if Larrabee has to take a 40% die hit due to the x86 registers, then take another hit due to emulation, there's not much they can do to compete. They may wind up less than half as efficient per transistor count (and probably per watt as well).

I still say that it will have a use for some people; it just probably won't be for gaming.

The PS4 rumours are intriguing if anything.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: aka1nas
Originally posted by: SickBeast


The problem is that CUDA requires a lot of support from the industry in order to implement it. If you look at the control that intel and microsoft have had of things over the past 20+ years, I don't think NV has much of a chance of their standard taking over.

I'm not saying this is good; I'm simply being realistic.

I would imagine that Larrabee will be going up against OpenCL more than CUDA by the time that it is available. CUDA's market penetration is still low enough that many of those developers will likely just rewrite with OpenCL to pick up the other hardware platforms (I'm mostly thinking about HPC here).

There really doesn't seem to be a lot of reason to hand-code an application just for Larrabee in a world where OpenCL has already been adopted.

OpenCL is still in its infancy, enough so that it could die off. Larrabee will probably be rushed out for this reason. I'm certain that Intel would love to kill that standard more than anything else right now.

When you think about it, 100 million CUDA-capable graphics cards is not all that much considering that it represents a small percentage of the overall market. Only once all graphics cards sold have OpenCL support will we see it really take off as a standard, with most games supporting it. We may even need to see IGPs with it before it can take off. It gets tricky.
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Larrabee Wafer Pictured in Detail - Pat Gelsinger at IDF

If Dally's and Ben's comments weren't enough to convince you about the wasted die space, that's Larrabee at 45nm with over 600mm^2 per die and an estimated 85 dice per 300mm wafer. To put that into perspective, the original 65nm GT200 was ~570mm^2.

With a die size that large at 45nm, that pretty much rules out additional transistors on this process, meaning additional performance will most likely only be possible with a die shrink and 32nm at the earliest.
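
As a rough sanity check on that 85-dice figure, the standard first-order dice-per-wafer estimate lands in the same ballpark before any yield or edge-exclusion losses. The 600mm^2 and 300mm inputs below are just the estimates quoted above, not official Intel numbers:

Code:
#include <math.h>
#include <stdio.h>

/* First-order gross dice-per-wafer estimate:
 *   DPW ~= pi*r^2 / A  -  pi*d / sqrt(2*A)
 * where d = wafer diameter, r = d/2, A = die area.
 * Inputs are this thread's estimates for Larrabee at 45nm. */
int main(void)
{
    const double pi = 3.14159265358979;
    const double d  = 300.0;   /* wafer diameter, mm */
    const double A  = 600.0;   /* estimated die area, mm^2 */
    const double r  = d / 2.0;

    double dpw = (pi * r * r) / A - (pi * d) / sqrt(2.0 * A);
    printf("~%.0f gross dice per 300mm wafer\n", dpw);  /* prints ~91 */
    return 0;
}

About 91 gross candidates per wafer, so the ~85 estimate is plausible once edge exclusion and test structures are accounted for.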
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
The thing won't be out for at least 1.5 years. At 32nm the chip won't be nearly so big. If NV can have something manufactured that is 570mm^2, then you damn well know that Intel can do considerably better than that if they have to.

TBH I really wish that intel would release something giving us an idea as to how this chip will perform in games.
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Originally posted by: SickBeast
The thing won't be out for at least 1.5 years. At 32nm the chip won't be nearly so big. If NV can have something manufactured that is 570mm^2, then you damn well know that Intel can do considerably better than that if they have to.

TBH I really wish that intel would release something giving us an idea as to how this chip will perform in games.
You completely missed the point. The point is that Nvidia's 65nm behemoth already weighed in at ~600mm^2 with 1.4 billion transistors. Larrabee is already larger than that on a full node shrink to 45nm with who knows how many more transistors... And from early indications, it's still not competitive with current-gen parts from Nvidia and AMD. Now do you see where Dally's "x86 tax" and design comments come from?

Also, Larrabee is scheduled to launch late this year or early next, so I'm not sure what your 1.5 year comments are referring to. Actually I'm pretty sure Larrabee was already delayed once, with samples originally scheduled for Q4 2008 and pushed back to around now. While it's true Intel has typically enjoyed a considerable fab and process edge compared to the competition, it's also becoming increasingly evident those advantages aren't going to be able to overcome and compensate for the die space dedicated to x86.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: chizow
Originally posted by: SickBeast
The thing won't be out for at least 1.5 years. At 32nm the chip won't be nearly so big. If NV can have something manufactured that is 570mm^2, then you damn well know that Intel can do considerably better than that if they have to.

TBH I really wish that intel would release something giving us an idea as to how this chip will perform in games.
You completely missed the point. The point is that Nvidia's 65nm behemoth already weighed in at ~600mm^2 with 1.4 billion transistors. Larrabee is already larger than that on a full node shrink to 45nm with who knows how many more transistors... And from early indications, it's still not competitive with current-gen parts from Nvidia and AMD. Now do you see where Dally's "x86 tax" and design comments come from?

Also, Larrabee is scheduled to launch late this year or early next, so I'm not sure what your 1.5 year comments are referring to. Actually I'm pretty sure Larrabee was already delayed once, with samples originally scheduled for Q4 2008 and pushed back to around now. While it's true Intel has typically enjoyed a considerable fab and process edge compared to the competition, it's also becoming increasingly evident those advantages aren't going to be able to overcome and compensate for the die space dedicated to x86.

My bad on the 1.5 year comment. It's late here.

I agree that intel's manufacturing can't make up for a 50% shortfall in efficiency. They're screwed.

I'm not a big fan of emulation personally. I'm not sure what intel is trying to achieve with Larrabee.

Like I said, I think it will be a rendering beast. People who do 3D Max for a living are going to love Larrabee to no end.

I think we'll see some impressive tech demos of what can be done if a game is coded specifically for Larrabee. The problem is that I doubt any developer would be that stupid. Nemesis1 pointed out a game but I find him difficult to understand and he's often neither here nor there. He told me I was cupping one time. :Q
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Originally posted by: chizow
You completely missed the point. The point is that Nvidia's 65nm behemoth already weighed in at ~600mm^2 with 1.4 billion transistors. Larrabee is already larger than that on a full node shrink to 45nm with who knows how many more transistors... And from early indications, it's still not competitive with current-gen parts from Nvidia and AMD. Now do you see where Dally's "x86 tax" and design comments come from?

That does sound crazy as hell. If graphics cards can use any design they want and Intel is choosing to use x86, they probably have a reason. They've made bigger and better processors before, so this x86 thing is probably tied to some kind of scheme they're cooking up. At least I hope it is.
 

thilanliyan

Lifer
Jun 21, 2005
12,062
2,275
126
Originally posted by: chizow
With a die size that large at 45nm, that pretty much rules out additional transistors on this process, meaning additional performance will most likely only be possible with a die shrink and 32nm at the earliest.

I don't remember where, but I read that the pic of that wafer is actually a test/prototype (whatever you want to call it) Larrabee on 65nm, while the actual production would be on 45nm.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
Originally posted by: thilan29
Originally posted by: chizow
With a die size that large at 45nm, that pretty much rules out additional transistors on this process, meaning additional performance will most likely only be possible with a die shrink and 32nm at the earliest.

I don't remember where, but I read that the pic of that wafer is actually a test/prototype (whatever you want to call it) Larrabee on 65nm, while the actual production would be on 45nm.

Why would they create a prototype on 65nm? A design isn't just process-shrunk from 65nm to 45nm; it needs to be reworked for 45nm, and it would seem a waste of time to create a prototype on 65nm, especially when Intel has had 45nm for quite a while now and is heading toward 32nm.

While it certainly is possible for that wafer to be 65nm, I don't think it's plausible at this time when 45nm has been out so long and 32nm not too far off.
 

jiffylube1024

Diamond Member
Feb 17, 2002
7,430
0
71
I don't know too much about Larrabee myself, but using x86 CPU cores for GPU tasks is a huge waste of die space, as others have said.

Still, x86 is Intel's bread and butter, and if anyone can keep this from being a colossal failure, it's Intel.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: chizow
Larrabee Wafer Pictured in Detail - Pat Gelsinger at IDF

If Dally and Ben's comments weren't enough to convince about wasted die space, that's Larrabee at 45nm with over 600mm^2 per die with an estimated 85 dice per 300mm wafer. To put that into perspective, the original 65nm GT200 was ~570mm^2.

With a die size that large at 45nm, that pretty much rules out additional transistors on this process, meaning additional performance will most likely only be possible with a die shrink and 32nm at the earliest.

WTH! After all those slides showing how dozens of dies scale almost linearly, after getting Hydra, all that, and they are making a monolith? And one significantly larger than anything ever attempted by Intel?

Although, maybe what we "know" about Larrabee is misdirection and it's a more traditional GPU (in which case it needs to have a huge die), and Intel is the world's foremost expert on manufacturing tech...
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: taltamir
WTH! After all those slides showing how dozens of dies scale almost linearly, after getting Hydra, all that, and they are making a monolith? And one significantly larger than anything ever attempted by Intel?

Intel has oodles of experience yielding >600mm^2 chips...Itanium. Beckton and Dunnington both ring in at nearly that weight as well.

Presumably die harvesting will be involved, no different for GT200 or Cell.

Originally posted by: taltamir
Although, maybe what we "know" about Larrabee is misdirection and it's a more traditional GPU (in which case it needs to have a huge die), and Intel is the world's foremost expert on manufacturing tech...

I'm still trying to figure out how so many experts in this thread came to know the launch clockspeeds for Larrabee...a necessary item for making any prognostication regarding performance envelope.
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Originally posted by: Idontcare
Intel has oodles of experience yielding >600mm^2 chips...Itanium. Beckton and Dunnington both ring in at nearly that weight as well.

Presumably die harvesting will be involved, no different for GT200 or Cell.
They're very different in the sense that a CPU die is often composed mainly of cache, which uses the simplest transistors and is always the first thing used to validate a new process node. That's very different from GPUs, which are already dedicating upwards of 50% of the die to execution units, a ratio that is only growing with every new iteration.

Originally posted by: thilan29
I don't remember where, but I read that the pic of that wafer is actually a test/prototype (whatever you want to call it) Larrabee on 65nm, while the actual production would be on 45nm.

Originally posted by: Idontcare
I'm still trying to figure out how so many experts in this thread came to know the launch clockspeeds for Larrabee...a necessary item for making any prognostication regarding performance envelope.
Kind of related, but these assumptions are based on what's already been published about Larrabee. You can see in the AT review they made some guesses on clockspeed, performance (in TFLOPs), die size, process, and number of cores. The big question they were unsure of was TDP, which would certainly limit clockspeeds. You can see they're pretty accurate too, and they actually take into account the 65nm-to-45nm shrink, arriving at 64 cores, a number that also pops up on those IDF slides as the high end.

Performance estimates and theoreticals @ AT
Die size estimates at AT


 

alyarb

Platinum Member
Jan 25, 2009
2,425
0
76
If Anand is right, 160 FLOPs per clock doesn't sound too good. At 2 GHz that 10-core is only doing 320 GFLOPS. Pretty good for a CPU, but for a GPU? Isn't the production version supposed to have 32 cores? Or is it 10? Are gamers supposed to buy a 320 GFLOPS card?
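
For what it's worth, the arithmetic checks out under the assumptions floating around this thread: a 16-wide (512-bit LRBni) vector unit per core at 1 single-precision FLOP per lane per clock gives 160 FLOPs/clock for 10 cores, and counting a fused multiply-add as 2 FLOPs doubles it. A quick back-of-envelope sketch (the 2 GHz clock and the core counts are just the estimates being discussed, not confirmed specs):

Code:
#include <stdio.h>

/* Peak-throughput estimate: cores x SIMD lanes x FLOPs/lane/clock x GHz.
 * 16 lanes = 512-bit vectors in single precision; clock is assumed. */
int main(void)
{
    const int    lanes   = 16;
    const double ghz     = 2.0;               /* assumed clock */
    const int    cores[] = { 10, 32, 64 };

    for (int i = 0; i < 3; i++) {
        double gflops     = cores[i] * lanes * 1 * ghz;  /* 1 FLOP/lane/clock */
        double gflops_fma = cores[i] * lanes * 2 * ghz;  /* FMA counted as 2  */
        printf("%2d cores: %4.0f GFLOPS (%.0f with FMA)\n",
               cores[i], gflops, gflops_fma);
    }
    return 0;
}

So 10 cores lands at the 320 GFLOPS figure above, while 32 and 64 cores scale to roughly 1 and 2 TFLOPS at the same clock (double those if FMA is counted).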
 

konakona

Diamond Member
May 6, 2004
6,285
1
0
Might be a bit of an irrelevant tidbit, but Sony is probably not the best example for your argument. The GS in the PS2 and the RSX in the PS3 were both underperforming relative to the competition and certainly not the best choices at the time.
 

thilanliyan

Lifer
Jun 21, 2005
12,062
2,275
126
Originally posted by: Keysplayr
Why would they create a prototype on 65nm? A design isn't just process-shrunk from 65nm to 45nm; it needs to be reworked for 45nm, and it would seem a waste of time to create a prototype on 65nm, especially when Intel has had 45nm for quite a while now and is heading toward 32nm.

While it certainly is possible for that wafer to be 65nm, I don't think it's plausible at this time when 45nm has been out so long and 32nm not too far off.

I don't know either. I'm pretty sure I read it somewhere (can't remember where anymore) so take it with a grain of salt.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: taltamir
Originally posted by: BenSkywalker
Any video encoding app would benefit without having to be patched or re-coded, for example.

That would ignore what a non-x86-based alternative with a comparable layout could do. Some here may demand ignorance to enter into a discussion, but in all seriousness x86 is about the poorest architecture you could imagine for this type of design. Spending transistors on decode hardware when tiny die space per core is the essence of your design goal is rather foolish. What makes this worse, far worse, is that applications will still require a recompile in order to run on Larrabee; it isn't an OoO architecture, and default x86 code would roll over and die running on it (it wouldn't be surprising to see a normal processor be faster on anything with decent amounts of branching).

With several hundred thousand transistors per core wasted on decode hardware, more transistors used to give full I/O functionality to each core, and a memory setup that is considerably more complex than any of the other vector-style processor choices available, Larrabee is making an awful lot of compromises to potential performance in order to be more Intel-like than it needs to be.

Everyone seems to be taking the stance that Larrabee must have a lot going for it because of how much Intel is putting into it. Itanium anyone? Everyone with so much as an extremely small dose of understanding knew that Itanium was going to be a huge failure in the timeframe it hit. Sadly, a VLIW setup for something like Larrabee would end up being a much better option than where they are headed.

I guess the best way to think of it is that Intel, like everyone else, clearly sees a major movement in computing power. The problem is, Intel wants to take as much lousy, outdated, broken-down crap with them as they can. We already have x86 as our main CPUs to handle that garbage; why do we need more of the same wasted die space on our GPUs? To make it so that lousy existing x86 code that isn't well suited for extreme levels of parallelization can be recompiled in an easier fashion? So let's prop up our outdated, poorly structured code base for a short-term gain and hold back everything else in the long term? Just doesn't make sense to me.

First accurate post of the thread... Larrabee is set to waste over 40% of its total space on redundant x86 decode hardware (one decoder for each core), and for what? No SSE, no out-of-order... NOTHING is going to run on it without a serious recompile and recode... so why bother with it in the first place? It's wasting space on a gimmick.
The way I see it, Intel is banking on taking a loss (by wasting nearly half the die on NOTHING) for the chance to get x86 to become the standard; if that happens, then they are granted legal monopoly status and no one may compete with them. It seems as clear as day that this could be their only course of action; Intel engineers are not stupid.

I think this is why the professor in question is joining the fight... Nvidia is the one company that stands a chance at breaking the x86 stranglehold and potentially getting us heterogeneous computing, although I wouldn't be surprised if they would just opt to displace Intel as the only legal monopoly backed by stupid, misapplied patent laws.
Your first bold: so true, x86 must be recompiled. BUT all SSE2 code can be recompiled by simply adding the (vec) prefix - that's easy. You would also do well to find out what kind of recompile this is. Not all compiles are EQUAL.


Your second bold: we all know it wasn't Intel who kept us in x86 hell now, was it?

x86 on Larrabee has to be recompiled. What does that mean? I heard some talking here like they know what Intel has done on the x86 side of things. You said 40% of the x86 Larrabee die is a waste; I will take issue with that. Since the compiler has been brought up - and it damn well better be - that compiler is native C/C++. Now, I really don't know what that means other than that Larrabee runs on a software layer natively. That's what I don't understand: how can it run on a software layer natively? But that's what they're saying, or I'm just not getting it. The compiler Intel has can do some interesting things, as we're all going to find out. Intel only said Larrabee is an x86 CPU. It never said how those SSE instructions would run, only that they will - WITH a recompile. Since SSE2 is so common, the simple port with the (vec) prefix makes it a one-time recompile, then run and play.

But we're stuck here on x86, when it's the vector unit, the ring bus, the compiler, and more than anything else the scatter/gather and the use of a memory mask (and OpenCL) that matter. Yet we're talking about x86, when none here knows the x86 hardware involved in this unit.

Screw Larrabee. I am excited for the software render tech - the sooner the better. I know why MS has been bad-mouthing Apple. I told you guys you'd be happy with the ATI 4000s. This has nothing to do with Larrabee yet. But wait till you see Nehalem on Snow Leopard. LOL!