how parallel was larrabee?

Anarchist420

Diamond Member
Feb 13, 2010
I was thinking it wasn't very parallel, since it had low performance and only 64 cores, but I don't know for sure, which is why I'm asking. My guess was that GK110's CUDA architecture is actually more parallel and has some useful features compared to Larrabee (particularly bandwidth-saving and efficiency techniques, like being able to read from the hard drive), although I'm sure GK110 isn't radically parallel either.

I'm asking because everyone here knows I'm a fan of programmable color/depth (or programmable whatever), and that it could be as fast as anyone wanted it to be, or look as good as anyone wanted it to, rather than being a compromise that no one is as satisfied with as they would be otherwise (hardware is only as scalable as the hardware allows). The only things I wouldn't discard immediately are the features in current display logic, and maybe something that would let multiple dies work together without any known issues (NVIDIA has hardware that reduces microstutter with SLI, for example). Of course, software solutions don't have to replace specialized hardware, but we likely only have two discrete GPU makers because of IP, so there isn't much need for them to take any risks. One of them does do some new things sometimes, but then charges above-market prices for them, keeps their drivers mostly closed source, and still isn't really happy with the money they make anyway... this stuff could be made out of a few garages, and it would be a lot cheaper and better.

Anyway, I remember how sufficient doing most things in software was back in the day (256-color VGA graphics was hardware, but it just put the pixels on screen and did a few other minor things; the sound hardware did a lot too, but systems back then didn't use multiple CPUs), so I don't see why adding another general-purpose die or two, with an ISA optimized for graphics, wouldn't be just as good, especially if there were no IP. I'm just not convinced that hardware can be objectively measured as better than software... I don't even know that we'd have had 3D accelerators if Intel had had competition, or if it had kept up FPU performance and instruction sets appropriate for each application, or had simply offered chipsets with multiple general-purpose die sockets (slots, at the time the first 3D accelerators were released, if I'm not wrong).
 

bystander36

Diamond Member
Apr 1, 2013
Why would fewer cores give more parallelism? I thought parallel programming was special in the sense that it can utilize lots of threads at once, and more cores add up to more threads.

Efficiency is likely Larrabee's biggest fault.
 

Stuka87

Diamond Member
Dec 10, 2010
wouldn't fewer cores have been more parallel?

thank you for your kind reply:)

The more cores you have, the more parallel you get (in theory; there are other factors, of course). As you get fewer and fewer cores, you become less parallel, until you eventually get down to one core and are basically serial (looking past any context switching).
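For a concrete picture of that, here's a rough sketch in CUDA (generic illustrative code, nothing Larrabee-specific): the same launch splits its work across however many SMs the chip has, so a part with more cores simply keeps more of it in flight at once, while a single core would grind through the same blocks one after another.

[code]
#include <cstdio>
#include <cuda_runtime.h>

// Trivial data-parallel kernel: y[i] = a*x[i] + y[i].
// Every element is independent, so the work can be spread across
// however many cores/SMs the chip happens to have.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc((void **)&x, n * sizeof(float));
    cudaMalloc((void **)&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    // 4096 blocks of 256 threads: a chip with more SMs keeps more of
    // these blocks resident at once (more parallel); one core would
    // just work through them serially.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    printf("done\n");
    return 0;
}
[/code]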
 

Cerb

Elite Member
Aug 26, 2000
I did not know this.
I think I might have gotten some unwanted attention with the guffaw caused by reading that. :awe: We all know he's a fan of that, and now we have DC, GLSLang, and OpenCL getting mainstream use. That future is here, however unevenly distributed.

why doesn't Intel make a dGPU?
They did. It sucked. They pulled it. The i740 was a good implementation of a very bad idea, at the time it was made. Today, there's no need. dGPUs are not a growth market, and there are already two major players. Improving IGP is where it's at.

--

Larrabee was plenty parallel enough, though much less so than Fermi or similar-era Radeons.

Up until Fermi, GeForces could only run a few different programs at once (G80 could only run one at a time across the whole GPU), though they could have many different data paths going for the same program. This is part of why Fermi cards performed so well even while being big and hot: they could run a number of threads per SM, with several SMs, and each SM having many processors, which led to high real-world utilization. I don't recall the exact breakdown of unique processes, but each CUDA core in Fermi and later can work with multiple program counters, and the count of truly unique processes got high enough that it stopped being worth worrying about (in 2010 I could rattle off the numbers, but it's been a while! :)).
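If it helps to see that difference in code, here's a minimal sketch of the concurrent-kernel part (assuming a Fermi-or-later card and the stock CUDA runtime; no claims about how the scheduler actually places things): two unrelated kernels dropped into separate streams can be resident on the GPU at the same time, where a G80-class part would have run one program across the whole chip at a time.

[code]
#include <cuda_runtime.h>

// Two unrelated "programs" (kernels).
__global__ void kernelA(float *p) { p[threadIdx.x] *= 2.0f; }
__global__ void kernelB(int *q)   { q[threadIdx.x] += 1; }

int main()
{
    float *a;
    int *b;
    cudaMalloc((void **)&a, 256 * sizeof(float));
    cudaMalloc((void **)&b, 256 * sizeof(int));
    cudaMemset(a, 0, 256 * sizeof(float));
    cudaMemset(b, 0, 256 * sizeof(int));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // On Fermi and later, these two different programs can occupy
    // different SMs at the same time; earlier GeForces would have
    // serialized them.
    kernelA<<<1, 256, 0, s1>>>(a);
    kernelB<<<1, 256, 0, s2>>>(b);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
[/code]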

Radeons were given processor counts based on how many total data paths they had, so the hundreds of "processors" they were credited with at the time weren't really comparable, though I don't recall the specific breakdown, much as with G2xx vs. GF1xx. Total BS, though, especially as more complex game shaders weren't getting high ILP within each SIMD/VLIW block (any reasonable definition of a processor should hold up for code with ILP <= 1, not require an ILP of 4 or 5 to be true).
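To show roughly what I mean by the ILP part, here's a toy kernel (written as CUDA only to match the other snippets; the packing comments describe how a VLIW4/5 Radeon's compiler would see it, not how an NVIDIA part schedules it):

[code]
#include <cuda_runtime.h>

__global__ void ilp_example(const float *in, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float x = in[i];

    // Dependent chain: each multiply needs the previous result, so a
    // VLIW4/5 bundle could only fill one of its 4-5 slots per cycle
    // here -- most of the advertised "processors" sit idle.
    float a = x * 1.1f;
    float b = a * 1.2f;
    float c = b * 1.3f;
    float d = c * 1.4f;

    // Independent ops: these four don't depend on each other, so a
    // VLIW compiler could pack them into one bundle and actually use
    // the extra lanes that got counted as separate "processors".
    float p = x * 2.0f;
    float q = x + 3.0f;
    float r = x - 4.0f;
    float s = x * x;

    out[i] = d + p + q + r + s;
}

int main()
{
    const int n = 1024;
    float *in, *out;
    cudaMalloc((void **)&in, n * sizeof(float));
    cudaMalloc((void **)&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    ilp_example<<<n / 256, 256>>>(in, out);
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
[/code]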

Larrabee's cores and threads, however, were just what they looked like, and Intel didn't go advertising high numbers based on SIMD width or anything like that, either. But, being x86, it has to be cache-coherent, and full hardware-synchronized cache coherence is expensive one way or another, in large part because it's a feature that can't exist in a vacuum (they could make it slow, or they could make it hot). NVIDIA got around that by making each SM coherent within itself but leaving system-wide coherence to software, and AMD, with GCN, is going with a parallel in-order commit scheme (they call it OoO for marketing, but it's really a set of novel tricks that allow parallel, predictable in-order commits without much hardware coherence overhead). Neither NVIDIA nor AMD had x86 baggage to deal with, though, except in making sure their MMUs would be x86-compatible.
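And here's roughly what "leaving system-wide CC to software" ends up looking like on the CUDA side, using the standard fence-plus-atomic pattern (just a sketch with made-up names: each block publishes one value, and the last block to finish reads them all, with visibility handled explicitly rather than by coherent caches):

[code]
#include <cstdio>
#include <cuda_runtime.h>

__device__ unsigned int blocksDone = 0;

__global__ void sum_blocks(float *partials, float *result, int nblocks)
{
    if (threadIdx.x != 0)
        return;

    // Each block publishes one partial value (just its index here,
    // to keep the sketch small).
    partials[blockIdx.x] = (float)blockIdx.x;

    // The fence pushes that write out so it's visible device-wide
    // before the atomic "I'm done" signal -- coherence by hand.
    __threadfence();
    unsigned int done = atomicAdd(&blocksDone, 1u);

    // The last block to check in is guaranteed to see every other
    // block's partial, because each of them fenced before the atomic.
    if (done == (unsigned int)(nblocks - 1)) {
        float total = 0.0f;
        for (int i = 0; i < nblocks; ++i)
            total += partials[i];
        *result = total;
        blocksDone = 0;  // reset so the kernel can be launched again
    }
}

int main()
{
    const int nblocks = 64;
    float *partials, *result;
    cudaMalloc((void **)&partials, nblocks * sizeof(float));
    cudaMalloc((void **)&result, sizeof(float));

    sum_blocks<<<nblocks, 32>>>(partials, result, nblocks);

    float host = 0.0f;
    cudaMemcpy(&host, result, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %.0f (expected %d)\n", host, (nblocks - 1) * nblocks / 2);

    cudaFree(partials);
    cudaFree(result);
    return 0;
}
[/code]

That fence/atomic dance is the software side of the tradeoff; Larrabee would be paying for the equivalent guarantee in hardware, which is where the "slow or hot" part comes from.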