
nVidia scientist on Larrabee


Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: alyarb
Then why is Larrabee "going up against" OpenCL? Does Intel really expect people to not use OpenCL and write only for Larrabee?

OpenCL plays perfectly with Intel's Larrabee native. Larrabee runs on a software layer.
Its compiler is C/C++. On the front end we know it's Larrabee native plus the C/C++ compiler; this includes all C languages. In fact Intel says they can run all code, even VLIW. But what Larrabee native really is, we'll find out soon. Programs can write directly to the cores in Larrabee native binary ("LBin" <- spelling?), or they can go C/C++ directly to the core. That means any C language. Intel could even do CUDA, but NV might not like that. Intel can do Brook also. Basically Intel can do whatever they want, better. Why? Because Intel has the compilers. Hardly any talk about compilers. After Larrabee starts working with Nehalem, compilers will be the BIG talk.

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: SickBeast
Ben is slowly convincing me.

I guess if Larrabee has to take a 40% die hit due to the x86 registers, then take another hit due to emulation, there's not much they can do to compete. They may wind up less than half as efficient per transistor count (and probably per watt as well).

I still say that it will have a use for some people; it just probably won't be for gaming.

The PS4 rumours are intriguing if anything.

It doesn't. Don't look at Larrabee's x86 like it's a beast you know, because it's not. They're recompiling for a reason. Don't forget this isn't just a GPU; this is a GPU and CPU working together. The rest seems to be coming with Haswell.

Just remember this is a modified baby. Lean and mean.

OK, somebody mentioned x86 baggage. So I'm supposed to assume Intel's engineers didn't take this into consideration? You know better than that.

For you, it's hard when people try to make you think wrong, because they are wrong.

We know, because it's been said in this thread many times, that SSE has to be recompiled to run on Larrabee. So the front end of Larrabee, the compiler, produces code for the vertex unit, even for the x86 code that was recompiled. That's correct so far, right?

Well, if the Larrabee compiler front end did that, where is all this x86 baggage? Explain what the x86 decoders are good for if the code has been recompiled.

In compilers, the front-end translates a computer programming source language into an intermediate representation, and the back-end works with the internal representation to produce code in a computer output language. The back-end usually optimizes to produce code that runs faster. The front-end/back-end distinction can separate the parser section that deals with source code and the back-end that does code generation and optimization; some designs (such as GCC) offer the choice between multiple front-ends (parsing different source languages) and/or multiple back-ends (generating code for different target processors).
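
As a toy illustration of that front-end/back-end split, the sketch below builds a tiny intermediate representation and lowers it with two interchangeable back-ends. It is only a minimal sketch of the concept, not Intel's compiler or any real toolchain.

```cpp
// Toy front-end/back-end split: one front-end produces a tiny IR,
// two back-ends lower the same IR to different hypothetical targets.
#include <cstdio>
#include <string>
#include <vector>

// "Intermediate representation": a flat list of three-address ops.
struct IrOp { std::string op, dst, a, b; };

// Front-end: turns the (hard-coded) source d = (a + b) * c into IR.
// A real front-end would parse arbitrary source text; that part is elided.
std::vector<IrOp> front_end() {
    return { {"add", "t0", "a", "b"},
             {"mul", "d",  "t0", "c"} };
}

// Back-end #1: emit pseudo-assembly for a register machine.
void backend_asm(const std::vector<IrOp>& ir) {
    for (const auto& i : ir)
        std::printf("%s %s, %s, %s\n", i.op.c_str(), i.dst.c_str(), i.a.c_str(), i.b.c_str());
}

// Back-end #2: emit C-like source, i.e. the same IR lowered to another "target".
void backend_c(const std::vector<IrOp>& ir) {
    for (const auto& i : ir)
        std::printf("%s = %s %s %s;\n", i.dst.c_str(), i.a.c_str(),
                    i.op == "add" ? "+" : "*", i.b.c_str());
}

int main() {
    auto ir = front_end();   // one front-end...
    backend_asm(ir);         // ...multiple back-ends, as in the GCC example above
    backend_c(ir);
}
```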

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Here's some stuff from a thread at Extreme where we were speculating. At the time we knew nothing about Larrabee, but this was written based on tech Intel had available. I wrote it myself in 2006. It's nothing but speculation about Intel CPUs.

But look at what we know about Larrabee now. Looks like I was doing fairly well; VLIW seems out of place, or does it?

However, there was one company which took a more radical approach, and while its processor wasn't exactly blazing fast it was faster than those using the stripped-back approach; what's more, it didn't include the x86 instruction decoder. That company was Transmeta, and its line of processors weren't x86 at all, they were VLIW (Very Long Instruction Word) processors which used "code morphing" software to translate the x86 instructions into their own VLIW instruction set.

Transmeta, however, made mistakes. During execution, its code morphing software would have to keep jumping in to translate the x86 instructions into their VLIW instruction set. The translation code had to be loaded into the CPU from memory, and this took up considerable processor time, lowering the CPU's potential performance. It could have solved this with additional cache or even a second core, but keeping costs down was evidently more important. The important thing is Transmeta proved it could be done; the technique just needs perfecting.
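
For anyone unfamiliar with the code-morphing idea described above, here is a rough sketch of how it works: translated blocks live in a translation cache keyed by guest address, so the expensive translation step is only paid on a miss. This is an illustration of the concept, not Transmeta's actual software.

```cpp
// Sketch of "code morphing": x86 blocks are translated on demand into the host's
// native instructions and kept in a translation cache, so the costly translation
// step is only paid on a cache miss.
#include <cstdint>
#include <functional>
#include <unordered_map>

using GuestPC   = uint32_t;                     // address of an x86 basic block
using HostBlock = std::function<GuestPC()>;     // "native" code: runs, returns next PC

// Expensive step: decode the guest x86 block and emit host (e.g. VLIW) code.
// Here it just fabricates a stub; a real morpher would generate machine code.
HostBlock translate_block(GuestPC pc) {
    return [pc]() { return pc + 16; };          // pretend each block is 16 bytes long
}

struct CodeMorpher {
    std::unordered_map<GuestPC, HostBlock> cache;   // translation cache

    GuestPC run_block(GuestPC pc) {
        auto it = cache.find(pc);
        if (it == cache.end())                       // miss: pay the translation cost
            it = cache.emplace(pc, translate_block(pc)).first;
        return it->second();                         // hit: execute native code directly
    }
};

int main() {
    CodeMorpher vm;
    // Re-running the same three blocks translates each one only once; cold code
    // that keeps missing the cache is where Transmeta paid the penalty described above.
    for (int pass = 0; pass < 3; ++pass)
        for (GuestPC pc : {0x1000u, 0x1010u, 0x1020u})
            vm.run_block(pc);
}
```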

Intel on the other hand can and does build multicore processors, and has no hesitation in throwing on huge dollops of cache. The Itanium line, also VLIW, includes processors with a whopping 9MB of cache. Intel can solve the performance problems Transmeta had, because this new processor is designed to have multiple cores and, while it may not have 9MB, it certainly will have several megabytes of cache.

Most interesting, though, is the E2K compiler technology which allows it to run x86 software. This is exactly the sort of technology Intel needs, and since last year they have had access to it and employ many of its designers.

You can of course expect all these cores to support 64-bit processing and SSE3, and you can also expect there to be lots of them. Intel's current Dothan cores are already tiny, but a VLIW core without out-of-order execution or the large, complex x86 decoders would be a very small, very low-power core. Intel will be able to make processors stuffed to the gills with cores like this.

Intel will now be free to do as it pleases: with x86 decoding done in software, Intel can change the hardware at will. If the processor is weak in a specific area, the next generation can be modified without worrying about backwards compatibility. Apart from the speedup, nobody will notice the difference. It could even use different types of cores on the same chip for different types of problems.

The New Architecture
To reduce power you need to reduce the number of transistors, especially ones which don't provide a large performance boost. Switching to VLIW means they can immediately cut out the hefty x86 decoders.

The out-of-order hardware will go with them, as it is huge, consumes masses of power, and in VLIW designs is completely unnecessary. The branch predictors may also go on a diet or even be removed completely, as the Elbrus compiler can handle even complex branches.

For those who are unaware: VLIW works really, really well with vertex units; almost all of them use VLIW. Maybe Intel's is different, maybe it's not. Intel only said it was an in-order x86 core. You decide what that means. Does it have the out-of-order hardware? No.

Does it have the x86 decoders? Did it get rid of branch prediction and use the compiler for that? The compilers are the real TALK here.


 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: alyarb
If Anand is right, 160 flops per clock doesn't sound too good. At 2GHz that 10-core is only doing 320 GFLOPS. Pretty good for a CPU, but for a GPU? Isn't the production version supposed to have 32 cores? Or is it 10? Are gamers supposed to buy a 320 GFLOPS card?

320 gflops < nvidia G92 or ati HD2800 series.
The 9800GTX is 432 GFlops
The GTX280 is almost 1tflop and the HD4870 is 1.2 teraflops.
So 1/4th the GPU power of a 4870...
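
A quick sanity check of the arithmetic in the two posts above; the 160 flops/clock, 2GHz, and 10-core figures are this thread's speculation, not confirmed Larrabee specs.

```cpp
// Peak GFLOPS is just flops-per-clock times clock speed (in GHz).
#include <cstdio>

int main() {
    const double flops_per_clock = 160.0;   // speculated figure for the 10-core part
    const double clock_ghz       = 2.0;     // speculated clock
    const double larrabee_gflops = flops_per_clock * clock_ghz;        // = 320 GFLOPS

    const double hd4870_gflops = 1200.0;    // HD4870 peak, per the post above
    std::printf("Larrabee (speculated): %.0f GFLOPS\n", larrabee_gflops);
    std::printf("Fraction of an HD4870: %.2f\n", larrabee_gflops / hd4870_gflops);  // ~0.27
}
```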
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Let us not forget Intel paid Transmeta $230 million (I didn't look it up), and Intel has rights to all of Transmeta's IP, as I read it. So take that into consideration, along with what E2K brought with it, and combine it with whatever Intel is working on. The x86 in Larrabee won't be like any other x86 processor ever. Intel will use bits and pieces here and there and invent what they need to fill in the gaps. No, Larrabee may be able to run SSE code after it's been ported, but this is no ordinary x86 CPU. To say that it is, with a vertex unit on die that takes up 2/3 of the die space (or was it 1/3?), is a little bit of a reach. In fact it's a bigger reach than my speculating.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: taltamir
Originally posted by: alyarb
If Anand is right, 160 flops per clock doesn't sound too good. At 2GHz that 10-core is only doing 320 GFLOPS. Pretty good for a CPU, but for a GPU? Isn't the production version supposed to have 32 cores? Or is it 10? Are gamers supposed to buy a 320 GFLOPS card?

320 gflops = nvidia G92 or ati HD2800 series.
The GTX280 is... i think 900 gflops and the HD4870 is 1.2 teraflops.
So 1/4th the GPU power of a 4870...

It's nice having all those flops, but when using software rendering you have freedom. You have CHOICE. What no one mentions is that you get a higher degree of efficiency from those FLOPS. That's not debatable. What Intel has stated all along is 1 teraflop DP and 2 teraflops SP, as a guide. I look for double that amount from Intel.

The point is, software rendering is cheaper than hardware rendering, and that's a fact. You're in the business; cost of compute is no small matter. The only weakness I see in Larrabee is the texture unit. That is only a short-term concern though, as even that unit will go the way of the dodo by third-generation Larrabee. But it does stop Intel from being 100% versatile.

 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
Originally posted by: taltamir
Originally posted by: alyarb
If Anand is right, 160 flops per clock doesn't sound too good. At 2GHz that 10-core is only doing 320 GFLOPS. Pretty good for a CPU, but for a GPU? Isn't the production version supposed to have 32 cores? Or is it 10? Are gamers supposed to buy a 320 GFLOPS card?

320 gflops < nvidia G92 or ati HD2800 series.
The 9800GTX is 432 GFlops
The GTX280 is almost 1tflop and the HD4870 is 1.2 teraflops.
So 1/4th the GPU power of a 4870...

Don't forget that these FLOP figures are not very accurate when it comes to real-world output. The FLOP figures that present-day GPUs boast can only be reached under certain conditions and workloads, i.e. constraining the workload severely so the GPU can really flex its muscle. Even if the i7, for example, might produce a minuscule number of FLOPs compared to a modern-day GPU, it can reach its theoretical maximum under many different workloads.

That's why, when Larrabee is claimed to have 320 GFLOPS, it will be quite the GPGPU monster, seeing as it won't be restricted the way modern-day GPUs are in that area of GPU computing, even if it only boasts 1/4 the FLOPS.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
FLOPS = floating-point operations per second. There is no exact conversion from that to FRAMERATE, but it IS a pretty accurate measure of compute power... now if Intel makes a more efficient driver they can make fewer flops go further...
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
You guys are way too hung up on the drivers. This is software rendering. If you want to know the potential for saving compute power in games, the engine they are using will tell you more. Just go read it; it seems very, very efficient. Intel will supply tools. They can't guarantee the programmers are worth the $$ paid. Intel will supply the programmers with powerful, efficient tools; from there it's up to the developers. But this drivers thing is getting old.

So many times we are told not to compare apples and oranges, yet you guys are insisting on comparing hardware-render drivers to software-render drivers. This is not apples to apples at all. Not long to wait now; let's wait and see. I will be selling crying towels for those who need them. Am I sure? YEP.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
Originally posted by: Nemesis 1
You guys are way too hung up on the drivers. This is software rendering.
Which makes what difference? Still needs to produce the same end result no matter how it is done.


If you want to know the potential for saving compute power in games, the engine they are using will tell you more.
I need my decoder ring for this one. What does this even mean?


Just go read it; it seems very, very efficient. Intel will supply tools. They can't guarantee the programmers are worth the $$ paid. Intel will supply the programmers with powerful, efficient tools; from there it's up to the developers. But this drivers thing is getting old.

Sounds like an argument for CUDA/Stream and G-series/HD-series GPGPU's.


So many times we are told not to compare apples and oranges, yet you guys are insisting on comparing hardware-render drivers to software-render drivers. This is not apples to apples at all.

Again, how does this matter when the end result has to be the same? You're speaking as if the rendered output of hardware is somehow going to be different than software.


Not long to wait now; let's wait and see. I will be selling crying towels for those who need them. Am I sure? YEP.

No, you're not sure. You haven't any idea. I hope you can get your money back for those towels because I think it's going to be a tough sell IMHO.

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: chizow
Originally posted by: SickBeast
He points out that Larrabee's x86 cores are wasteful in terms of die size. This may hold true for graphics performance, but he fails to mention the benefit of having that many x86 cores in your computer. Any video encoding app would benefit without having to be patched or re-coded, for example.

For most people, it's better to have more general-processing power than it is to have a ton of graphics power. If you look at most laptop computers, it illustrates this quite clearly.

Originally posted by: Extelleron
What Larrabee will do is redefine the GPGPU. I think that it will lead to widespread adoption of GPGPU thanks to the x86 architecture which will enable developers to support it without any significant change to their programs. The kind of power that is going to be available with Larrabee - likely 2TFLOPs+ peak FP, 32 cores / 128 threads..... is going to be very impressive.
You guys are placing a lot of faith in Intel's Larrabee compiler to effectively make single- or few-threaded applications run efficiently on Larrabee's vector execution units without any additional help, especially given many of these apps don't scale particularly well on existing x86 architectures. You really don't need to look any further than a current example of scalar vs. vector design when comparing Nvidia vs. ATI stream processing units, where ATI's 5-wide vector design depends heavily on optimization for scaling and efficiency.

Larrabee's design will be even more dependent on application or compiler optimizations with 16 vector execution units per core. I'm also not sure where you get the impression current apps will automatically accelerate on Larrabee without being recompiled or without any application optimization, just because they share the same base x86 ISA. I think the main concern about Larrabee is that not only do you have potentially less efficient vector units per Larrabee core with a 16-wide design, you have all this additional redundant x86 overhead before you can even access those execution units.

  • AT's Larrabee Preview by Anand and Derek

    NVIDIA's SPs work on a single operation, AMD's can work on five, and Larrabee's vector unit can work on sixteen. NVIDIA has a couple hundred of these SPs in its high end GPUs, AMD has 160 and Intel is expected to have anywhere from 16 - 32 of these cores in Larrabee. If NVIDIA is on the tons-of-simple-hardware end of the spectrum, Intel is on the exact opposite end of the scale.

    We've already shown that AMD's architecture requires a lot of help from the compiler to properly schedule and maximize the utilization of its execution resources within one of its 5-wide SPs; with Larrabee, the importance of the compiler is tremendous. Luckily for Larrabee, some of the best (if not the best) compilers are made by Intel. If anyone could get away with this sort of an architecture, it's Intel.

    At the same time, while we don't have a full understanding of the details yet, we get the idea that Larrabee's vector unit is sort of a chameleon. From the information we have, these vector units could execute atomic 16-wide ops for a single thread of a running program and can handle register swizzling across all 16 execution units. This implies something very AMD-like and wide. But it also looks like each of the 16 vector execution units, using the mask registers, can branch independently (looking very much more like NVIDIA's solution).

    We've already seen how AMD and NVIDIA architectural differences show distinct advantages and disadvantages against each other in different games. If Intel is able to adapt the way the vector unit is used to suit specific situations, they could have something huge on their hands. Again, we don't have enough detail to tell what's going to happen, but things do look very interesting.
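
To make the mask-register point in the quote above concrete, here is a rough scalar emulation of how a 16-wide masked vector operation behaves: both sides of a branch are computed for all lanes, and a per-lane mask picks which result each lane keeps. This is an illustrative sketch, not Intel's LRBni code.

```cpp
// Scalar emulation of a 16-wide masked vector op: compute both branch outcomes
// for every lane, then let a per-lane mask decide which result each lane keeps.
#include <array>
#include <cstdint>
#include <cstdio>

constexpr int W = 16;                       // vector width (16 lanes, per the quote)
using Vec  = std::array<float, W>;
using Mask = uint16_t;                      // one predicate bit per lane

// vcmp-style op: build a mask of the lanes where a[i] > b[i].
Mask cmp_gt(const Vec& a, const Vec& b) {
    Mask m = 0;
    for (int i = 0; i < W; ++i)
        if (a[i] > b[i]) m |= Mask(1u << i);
    return m;
}

// Masked blend: lanes whose mask bit is set take x, the rest keep y.
Vec blend(Mask m, const Vec& x, const Vec& y) {
    Vec r{};
    for (int i = 0; i < W; ++i)
        r[i] = ((m >> i) & 1u) ? x[i] : y[i];
    return r;
}

int main() {
    Vec a{}, b{};
    for (int i = 0; i < W; ++i) { a[i] = float(i); b[i] = 7.5f; }

    // Per-lane equivalent of: if (a > b) r = a - b; else r = b - a;
    Vec hi{}, lo{};
    for (int i = 0; i < W; ++i) { hi[i] = a[i] - b[i]; lo[i] = b[i] - a[i]; }
    Vec r = blend(cmp_gt(a, b), hi, lo);

    for (float v : r) std::printf("%.1f ", v);   // |a - b| per lane, chosen by the mask
    std::printf("\n");
}
```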

Here's a good indication that existing code compiled for x86 isn't going to leverage Larrabee's additional vector unit functionality and additional extensions: Intel outlines CT parallel programming language at IDF. I'm sure it comes with its own highly specialized x86 compiler needed to fully extract performance out of those vector units and additional LRBni extensions.

The fact they're now pushing their own parallel computing language certainly flies in the face of some comments they made about GPGPU and CUDA becoming a footnote in the annals of history. I'd say Larrabee's existence alone would lend credence to some of Nvidia's predictions about the future of computing, but Intel's focus on using Larrabee for highly parallel computing rather than their existing multicore desktop architectures certainly solidifies those claims.

At this point it seems the inclusion of x86 on Larrabee was more of an attempt by Intel to ensure x86 doesn't become "a footnote in the annals of computing history." Because we all know how much Intel covets and protects that x86 license don't we? ;)

Not bad till the end.

Here's how it should read: we all know how Intel wanted to guide us away from x86, but AMD, working together with MS, stopped Intel COLD. IT HURT! EVERYONE!

Intel, I have to say, has done wonderfully here. Not sure when we'll see CT; I would guess Haswell, with an on-die vector unit. I prefer "vertex unit" because of the 8 vertex branches off each vector unit; each one of those also has 8 branch units. So yes, Intel's compilers have a lot of work to do, but it's more than just compilers. I can't wait till we get the full core description. You want to know what Intel has? Find out what Transmeta had; Intel has full use of that IP forever, plus any new additions. Or find out what E2K was capable of. Then realize Intel has had all this since '04 (E2K) and Transmeta since, I believe, '06. So until you know these two techs, you can't say what x86 Larrabee is like. We KNOW that the vector (vertex) unit is, I believe, 2/3 of the die space. Maybe it's 1/3, but I recall 2/3. As for languages, Intel flat out told MS, "We don't care what you do, because we can run it." They also told MS that if they didn't put OpenCL in DX11 they would get eaten alive by Apple/Intel in performance. Intel is in control here, make no mistake. They all want Intel compilers. Even IBM was going after Sun till they found out the E2K tech doesn't come with the deal.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: Keysplayr
Originally posted by: Nemesis 1
You guys are way too hung up on the drivers. This is software rendering.
Which makes what difference? Still needs to produce the same end result no matter how it is done.


If you want to know the potential for saving compute power in games, the engine they are using will tell you more.
I need my decoder ring for this one. What does this even mean?


Just go read it; it seems very, very efficient. Intel will supply tools. They can't guarantee the programmers are worth the $$ paid. Intel will supply the programmers with powerful, efficient tools; from there it's up to the developers. But this drivers thing is getting old.

Sounds like an argument for CUDA/Stream and G-series/HD-series GPGPU's.


So many times we are told not to compare apples and oranges, yet you guys are insisting on comparing hardware-render drivers to software-render drivers. This is not apples to apples at all.

Again, how does this matter when the end result has to be the same? You're speaking as if the rendered output of hardware is somehow going to be different than software.


Not long to wait now; let's wait and see. I will be selling crying towels for those who need them. Am I sure? YEP.

No, you're not sure. You haven't any idea. I hope you can get your money back for those towels because I think it's going to be a tough sell IMHO.

Your first question being, what's the difference between drivers for software vs. drivers for hardware. LOL. Google is your friend.

Hey Keys, I am not debating CUDA or CL or CT. All I can tell you is Intel tried to get us out of x86 hell. YOU ALL WANTED x86 hell. Refer to Zinn2b, BANNED!

Intel is not NV. NV is Intel with Itanic: nobody to write code. Intel spent HUGE money putting together software compiler IP. HUGE. I mean freaking huge.

Intel is not in the same position that it was in with Itanic. NV is in that position.

When you see Snow Leopard/Nehalem you're going to freak. Add in an NV card or ATI or Intel, I don't know, but MS should have gone to a native 64-bit kernel. They're going to get their ass beat.

As for Intel muscling, yes they are. You know what? Nothing can stop it. CT is coming, but not for a while; I would say Haswell. Until then it's Intel playing with C code. That's it. It's not my fault, Keys. I'm only hoping Intel pulls this off. If it doesn't, I just look like I was over-enthused about Larrabee. If you other guys are wrong, what does that make you look like? Think about that. Only one answer applies.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
55
91
Originally posted by: Nemesis 1


Your first question being, what's the difference between drivers for software vs. drivers for hardware. LOL. Google is your friend.

And even friendlier is reading and comprehending what you read. I didn't ask you what the difference was between drivers for software and drivers for hardware. What I asked is, what is the difference when the END RESULT has to be the same?


Hey Keys, I am not debating CUDA or CL or CT. All I can tell you is Intel tried to get us out of x86 hell. YOU ALL WANTED x86 hell. Refer to Zinn2b, BANNED!

Neither am I. I just noted the similarity when you described features of Larrabee.
And, you ARE Zinn2b.


Intel is not NV. NV is Intel with Itanic: nobody to write code. Intel spent HUGE money putting together software compiler IP. HUGE. I mean freaking huge.

Oh really... CUDA Applications

Intel is not in the same position that it was in with Itanic. NV is in that position.

Uh huh. I'm going to regret asking this, but please explain how Itanic relates to NV, or Larrabee.

When you see Snow Leopard/Nehalem you're going to freak. Add in an NV card or ATI or Intel, I don't know, but MS should have gone to a native 64-bit kernel. They're going to get their ass beat.

Yes, we are all going to freak over Snow Leopard on Nehalem. :roll:

As for Intel muscling, yes they are. You know what? Nothing can stop it. CT is coming, but not for a while; I would say Haswell. Until then it's Intel playing with C code. That's it. It's not my fault, Keys. I'm only hoping Intel pulls this off. If it doesn't, I just look like I was over-enthused about Larrabee. If you other guys are wrong, what does that make you look like? Think about that. Only one answer applies.

Of course something can stop it. It could suck wind. Or it may not. It may be excellent in certain areas and abysmal in others, just like Pentium4. And you're doing much much more than just HOPING Intel pulls it off and speaking as if they already have.
And you have it in your head that I WANT to see it fail. Untrue to the extreme. I have massive doubts about it. The last time Intel strayed from their norm, as Ben Skywalker also commented on, in the form of Pentium Pro, i740 & Pentium4, Intel erred. Still decent chips, but fell way short. You have to realize the possibility of Larrabee continuing this legacy. Intel straying from the norm, or what they're good at.

As far as not leaving x86 because that's what everyone wanted and it "hurt us all", I don't think so. Everyone using an x86 based CPU is pretty much happily churning along with pretty much anything they want to do. It may have "hurt" if Intel actually abandoned x86 more than you can imagine. It was the industry that didn't allow for this to happen, not the end users like us. We have little say in these matters. It's all up to Microsoft, and other massive devs.

 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Originally posted by: taltamir

320 gflops < nvidia G92 or ati HD2800 series.
The 9800GTX is 432 GFlops
The GTX280 is almost 1tflop and the HD4870 is 1.2 teraflops.
So 1/4th the GPU power of a 4870...

The HD 2800 never existed, but the HD 2900 did; it can do 475 GFLOPS, and the 8800GT can do 336 GFLOPS.

Originally posted by: Cookie Monster

Don't forget that these FLOP figures are not very accurate when it comes to real-world output. The FLOP figures that present-day GPUs boast can only be reached under certain conditions and workloads, i.e. constraining the workload severely so the GPU can really flex its muscle. Even if the i7, for example, might produce a minuscule number of FLOPs compared to a modern-day GPU, it can reach its theoretical maximum under many different workloads.

That's why, when Larrabee is claimed to have 320 GFLOPS, it will be quite the GPGPU monster, seeing as it won't be restricted the way modern-day GPUs are in that area of GPU computing, even if it only boasts 1/4 the FLOPS.

That's true; usually those figures can only be obtained in the best-case scenario, something that doesn't happen very often, especially on ATi hardware due to its superscalar nature.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Even if the i7, for example, might produce a minuscule number of FLOPs compared to a modern-day GPU, it can reach its theoretical maximum under many different workloads.

That's why, when Larrabee is claimed to have 320 GFLOPS, it will be quite the GPGPU monster, seeing as it won't be restricted the way modern-day GPUs are in that area of GPU computing, even if it only boasts 1/4 the FLOPS.

Larry is in-order and extremely reliant on SIMD to come close to its theoretical peak. Without code being very friendly to both extreme levels of parallelism and extreme degrees of vectorization, Larry will not come anywhere near its theoretical peak. It isn't anything at all like an i7. It is hard to think of any situation where Larry would come out ahead: if the code base doesn't lend itself to heavy parallelism but does lend itself well to heavy vectorization, then Larry may be able to come close to matching the traditional GPUs, but that level will be significantly below its peak theoretical throughput. If the code is friendly to both optimal paths, then the traditional GPUs are going to easily distance themselves. Code that needs out-of-order execution would completely cripple either design; the i7 would still handily decimate them there.

I'm still trying to figure out how so many experts in this thread came to know the launch clockspeeds for Larrabee...a necessary item for making any prognostication regarding performance envelope.

Intel itself has managed to get a chip at Larry's size to 1.6GHZ, at a $4K price point. That gives us a pretty good guideline of the upper limits that we can expect, and they are a fraction of what it needs to be.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: Keysplayr
Originally posted by: Nemesis 1
You guys are way too hung up on the drivers. This is software rendering.
Which makes what difference? Still needs to produce the same end result no matter how it is done.


If you want to know the potential for saving compute power in games, the engine they are using will tell you more.
I need my decoder ring for this one. What does this even mean?



Just go read it; it seems very, very efficient. Intel will supply tools. They can't guarantee the programmers are worth the $$ paid. Intel will supply the programmers with powerful, efficient tools; from there it's up to the developers. But this drivers thing is getting old.

Sounds like an argument for CUDA/Stream and G-series/HD-series GPGPU's.


So many times we are told not to compare apples and oranges, yet you guys are insisting on comparing hardware-render drivers to software-render drivers. This is not apples to apples at all.

Again, how does this matter when the end result has to be the same? You're speaking as if the rendered output of hardware is somehow going to be different than software.


Not long to wait now; let's wait and see. I will be selling crying towels for those who need them. Am I sure? YEP.

No, you're not sure. You haven't any idea. I hope you can get your money back for those towels because I think it's going to be a tough sell IMHO.
I need my decoder ring for this one. What does this even mean?

Ya, we all slip, sorry. From the CT link:

It often is convenient for the programmer to think of the computing resources provided by a multi-core CPU as an engine for data-parallel computation. The basic idea is that applications exhibit a lot of parallelism through operations over collections of data. Abstracting the underlying hardware threads, cores, and vector ISA as computation over collections of data greatly simplifies the task of expressing parallelism in an architecture-independent fashion.
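
As a stand-in for that "operations over collections" style, the sketch below expresses the same idea with standard C++17 parallel algorithms. It is my own example, assuming a toolchain that implements the execution policies; it is not CT itself.

```cpp
// Data-parallel "operations over collections": no explicit threads or SIMD in the
// source; the library/runtime maps the per-element work onto cores and vector lanes.
#include <algorithm>
#include <cstdio>
#include <execution>
#include <numeric>
#include <vector>

int main() {
    std::vector<float> xs(1 << 20);
    std::iota(xs.begin(), xs.end(), 0.0f);          // 0, 1, 2, ...

    // Parallel map over the collection.
    std::transform(std::execution::par_unseq, xs.begin(), xs.end(), xs.begin(),
                   [](float x) { return x * x + 1.0f; });

    // Parallel reduction over the same collection.
    float sum = std::reduce(std::execution::par_unseq, xs.begin(), xs.end(), 0.0f);
    std::printf("sum = %g\n", sum);
}
```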

 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0

From Keys.


Just go read it; it seems very, very efficient. Intel will supply tools. They can't guarantee the programmers are worth the $$ paid. Intel will supply the programmers with powerful, efficient tools; from there it's up to the developers. But this drivers thing is getting old.

Sounds like an argument for CUDA/Stream and G-series/HD-series GPGPU's.

Dammit Keys, you have always done this board proud. Stay with it.

Your statement above is all wrong. I was kind enough to get the Intel CT paper so all could read what it brings in the way of easy programming.

Rather than just saying it sounds like this or that, get the white paper on CUDA and let's do a comparison of who has the more open model, who has the best programming model.

Talk is talk. Let's see the white papers. Then we as a community can examine them and make sensible evaluations based on real info, not pie-in-the-sky BS. We can compare side by side what each offers. Now that's debating. On the facts!
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: BenSkywalker
I'm still trying to figure out how so many experts in this thread came to know the launch clockspeeds for Larrabee...a necessary item for making any prognostication regarding performance envelope.

Intel itself has managed to get a chip at Larry's size to 1.6GHZ, at a $4K price point. That gives us a pretty good guideline of the upper limits that we can expect, and they are a fraction of what it needs to be.

What do clock rates and pipelines in a 65nm chip containing EPIC ISA for the IPF architecture have to do with those of a 45nm chip containing LRBni ISA and x86 architecture?

I can't tell if you really don't know much about layout and design, such that you'd make such a meaningless comparison, or if you think the rest of us don't and you are just being lazy for the sake of your own convenience in keeping the rebuttal short. Help me understand.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I can't tell if you really don't know much about layout and design, such that you'd make such a meaningless comparison, or if you think the rest of us don't and you are just being lazy for the sake of your own convenience in keeping the rebuttal short. Help me understand.

OK, I'll help you understand then. Using the most simplistic form of filtering, you need to be pushing 9 instructions per pixel (that is assuming a fully optimized TMU and a very basic filter), so to compete with even a GTX260-192 Larry is going to need to push 332,100 MIPS. If we figure on 20 cores, and are generous on the instructions it can retire per clock and give it 44 MIPS/MHz (Cell is 3.2 MIPS/MHz by way of comparison; I'm giving Larry more than an order of magnitude benefit of the doubt, perfect scaling from PPro), then it would need to be clocked at 7.54GHz. I'm being exceptionally generous to Larry on all counts in this assessment: comparing it to an outdated part that is being phased out, giving it an order of magnitude more performance than the closest current architecture, and using the simplest possible filtering.

In a more realistic sense, it would need to be closer to 30GHz to hit parts in its timeline at 20 cores, and Intel seems to be leaning towards 16 cores atm. No, I don't know exactly what kind of clock rates Intel is going to hit, but over 600mm² and 30GHz? I'd be willing to wager fairly heavily that that is a pipe dream even the most fringe lunatic of Intel fans wouldn't want to bet on.

And for the record, that is just to handle basic fillrate; the overwhelming majority of the GPU will be sitting idle.

Edit: Figured I should probably point out that the current top-tier GPU solution would require a bit over 20GHz; I realized on review that it made it look like a stretch going from 7.5GHz to 30GHz. Also, 1.6GHz does give us a guideline in the sense that the odds of anyone getting a 600mm² die to ~20-30GHz are about as close to 0 as you can get using anything resembling current technology. Also, if we removed the TMU these numbers would go up by an order of magnitude just in computation time; additional stalls from reads/writes would push that closer to two orders of magnitude.
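
For anyone who wants to check the back-of-the-envelope math above, here it is spelled out. The ~36.9 Gtexels/s GTX260-192 fill-rate input is an inference (it is the value that makes the 332,100 MIPS figure work out); the other numbers are the ones stated in the post.

```cpp
// Required clock = (instructions per pixel x pixel rate to match) / (MIPS per MHz).
#include <cstdio>

int main() {
    const double instr_per_pixel  = 9.0;       // "most simplistic form of filtering"
    const double gtx260_mpixels_s = 36900.0;   // ~36.9 Gtexels/s (inferred input)
    const double required_mips    = instr_per_pixel * gtx260_mpixels_s;   // 332,100

    const double larrabee_mips_per_mhz = 44.0; // the generous 20-core estimate above
    const double required_ghz = required_mips / larrabee_mips_per_mhz / 1000.0;

    std::printf("Required: %.0f MIPS -> %.2f GHz at %.0f MIPS/MHz\n",
                required_mips, required_ghz, larrabee_mips_per_mhz);   // ~7.55 GHz
}
```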
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
So I have heard 8-core Larrabee, 16-core Larrabee, 32-core Larrabee, and 40-core Larrabee.

What is it? Let's use 8 cores because it fits what I want it to, or 16, but god forbid we're talking 24, 32, 40, or even 48 cores. So I get to choose. Is that not my choice?

In a speculation thread where undermining Intel tech is the goal rather than useful discussion.

So I choose Larrabee as being 48 cores. Why? Because that's what I want it to be.

So that's what, 1.56 teraflops DP or 3.12 teraflops SP? In software rendering, more than enough. The efficiency of a 512-bit vector unit is surprising. Since the x86 part of the core is off during gaming I see little overhead, other than the decoders.

How does it work with that vector unit? 16 issues in 1 cycle; each issue has a branch of 8 vertices and each vertex has a branch of 8 more, so that's 32 x 16 = 512 ops in flight per core every cycle. That's impressive. The way Intel has aligned things is even more impressive. So on your 16-core setup that's 8,192 operations per cycle.

On my 48-core setup it's 24,576 ops in flight per cycle. My 48-core Larrabee is way stronger than your 16-core one though, LOL!
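
Checking the multiplication under the assumptions stated above (32 ops per issue, 16 issues per cycle per core), which are this thread's speculation rather than confirmed Larrabee specs:

```cpp
// Ops in flight per cycle = ops per issue x issues per cycle x number of cores.
#include <cstdio>

int main() {
    const int ops_per_issue    = 32;   // speculative assumption from the post above
    const int issues_per_cycle = 16;   // speculative assumption from the post above
    const int ops_per_core     = ops_per_issue * issues_per_cycle;   // 512

    for (int cores : {16, 48})
        std::printf("%d cores: %d ops in flight per cycle\n", cores, ops_per_core * cores);
    // 16 cores -> 8,192; 48 cores -> 24,576
}
```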
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
nemesis, processors should NEVER be thought of as engines, the analogy is just terrible and never works right.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: taltamir
nemesis, processors should NEVER be thought of as engines, the analogy is just terrible and never works right.

It's not something I will argue about. New thinking is often rejected. But it seems more than just myself refer to 2 or more cores as engines, as shown in the link. Boy, I really screwed that last post up. Best go reread it and get the wording correct. LOL





 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
This is NOT new thinking, this is old thinking; everyone and their grandmother has used the engine analogy. Notice how video cards keep having cars on them and showing off the "redlining" of the engine?
It is the most common analogy for processors and it is the absolute worst analogy for them, because they are NOTHING alike.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: BenSkywalker
I can't tell if you really don't know much about layout and design, such that you'd make such a meaningless comparison, or if you think the rest of us don't and you are just being lazy for the sake of your own convenience in keeping the rebuttal short. Help me understand.

OK, I'll help you understand then. Using the most simplistic form of filtering, you need to be pushing 9 instructions per pixel (that is assuming a fully optimized TMU and a very basic filter), so to compete with even a GTX260-192 Larry is going to need to push 332,100 MIPS. If we figure on 20 cores, and are generous on the instructions it can retire per clock and give it 44 MIPS/MHz (Cell is 3.2 MIPS/MHz by way of comparison; I'm giving Larry more than an order of magnitude benefit of the doubt, perfect scaling from PPro), then it would need to be clocked at 7.54GHz. I'm being exceptionally generous to Larry on all counts in this assessment: comparing it to an outdated part that is being phased out, giving it an order of magnitude more performance than the closest current architecture, and using the simplest possible filtering.

In a more realistic sense, it would need to be closer to 30GHz to hit parts in its timeline at 20 cores, and Intel seems to be leaning towards 16 cores atm. No, I don't know exactly what kind of clock rates Intel is going to hit, but over 600mm² and 30GHz? I'd be willing to wager fairly heavily that that is a pipe dream even the most fringe lunatic of Intel fans wouldn't want to bet on.

And for the record, that is just to handle basic fillrate; the overwhelming majority of the GPU will be sitting idle.

Edit: Figured I should probably point out that the current top-tier GPU solution would require a bit over 20GHz; I realized on review that it made it look like a stretch going from 7.5GHz to 30GHz. Also, 1.6GHz does give us a guideline in the sense that the odds of anyone getting a 600mm² die to ~20-30GHz are about as close to 0 as you can get using anything resembling current technology. Also, if we removed the TMU these numbers would go up by an order of magnitude just in computation time; additional stalls from reads/writes would push that closer to two orders of magnitude.

Now there is an acceptable level of justified opinion, thanks for taking the time to go into it!

It does naturally beg a degree of self-assessment though: unless we presume ourselves to be more intelligent than the decision makers at Intel, we must assume Intel knew ALL of this before they even assembled the layout team for Larrabee nearly 3 yrs ago. And they certainly knew it 2 yrs ago, and 1 year ago, and today.

So why then would Intel choose to ignore this information and develop a product with such woefully obvious inadequacies? Why would they release such a woefully inadequate product next year?

Just saying we've got to be giving ourselves a lot of credit in the grey matter department and assuming Intel's decision makers are operating with a commensurate depletion of it in order to be so confident as to assume we know what they don't and that we foresee in their competition something which they do not.

Are we so supremely confident in ourselves as to make such an assertion?