Holy Lord... Intel's Larrabee --- disclaimer: INQ link


SamurAchzar

Platinum Member
Feb 15, 2006
2,422
3
76
Originally posted by: kobymu
Originally posted by: BFG10K
That's my point - for a GPU to work at today's performance levels it needs fast dedicated RAM and lots of it.

If you want to move the GPU onto the CPU's die you need to deal with this fact.

The 'dedicated' part is misleading/redundant, and I'm pretty sure, although not absolutely certain, that moving the GPU onto the CPU's die will mostly just require additional bandwidth; due to the parallel nature of modern GPUs, latency can even take a slight to moderate hit, since what GPUs really need is a steady stream of high-bandwidth memory. It is also worth mentioning that modern CPUs (Core 2 Duo and especially Athlon 64), in the desktop environment, have bandwidth to spare in most applications (although nowhere near the requirements of modern GPUs). But I generally agree with your point.

What I'm mostly curious about is how the CPU cache fits into this, because letting the level 2 cache hold 3D data could have an awful effect on its effectiveness (really bad!). I don't think the 3D data from even a relatively small scene in a modern game can fit into 2/4 MB, so this is a second issue "you need to deal with". What I'm curious about is how, exactly, CPU/GPU designers will overcome this issue.

And we haven't even touched the integration issue...

IMHO, GPU work currently doesn't require caching, as the data access is streaming and predictable. If you employ good pipelining, you can effectively hide all the latency during normal operation, because you always know what to fetch in advance.
The problem starts when you don't know what your next data will be - then you have to fetch it on demand and take the latency hit.
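
A minimal sketch of that idea on the CPU side, purely my own illustration and not anything from Intel or the GPU vendors; the function and buffer names are invented. When the access pattern is known in advance, a software prefetch issued a fixed distance ahead hides the fetch latency, and non-temporal stores keep the streamed data from evicting whatever is already in the L2 cache, which is the pollution kobymu worries about.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: _mm_prefetch, _mm_stream_ps, _mm_sfence */

/*
 * Walk a large, use-once vertex buffer the way a streaming workload does.
 * The addresses are known in advance, so a prefetch issued a fixed distance
 * ahead hides the fetch latency, and non-temporal stores keep the output
 * from evicting whatever the CPU already holds in its L2 cache.
 * Assumes dst is 16-byte aligned and count is a multiple of 4.
 */
void scale_vertices(const float *src, float *dst, size_t count, float s)
{
    const __m128 scale = _mm_set1_ps(s);

    for (size_t i = 0; i < count; i += 4) {
        /* Hint the hardware to pull in data ~512 bytes ahead, marked
         * non-temporal so it will not linger in the cache. */
        _mm_prefetch((const char *)(src + i + 128), _MM_HINT_NTA);

        __m128 v = _mm_loadu_ps(src + i);             /* load 4 floats */
        _mm_stream_ps(dst + i, _mm_mul_ps(v, scale)); /* store, bypassing the cache */
    }
    _mm_sfence(); /* make the streaming stores visible before dst is read elsewhere */
}
```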
 

kobymu

Senior member
Mar 21, 2005
576
0
0
Originally posted by: SamurAchzar
IMHO, GPU work currently doesn't require caching, as the data access is streaming and predictable. If you employ good pipelining, you can effectively hide all the latency during normal operation, because you always know what to fetch in advance.
The problem starts when you don't know what your next data will be - then you have to fetch it on demand and take the latency hit.
"...and we haven?t even touched the integration issue..." ;)



 

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
The 'dedicated' part is misleading/redundant
I think not - it's important because it shows the difference between shared RAM and dedicated RAM.

A system with X bandwidth that it shares with a video card will always be slower than a system with X bandwidth that also has a video card with its own X bandwidth.

Maybe it's not how fast the ram is, but how you use it.
I really cannot see why you insist on burying your head in the sand.

Just sidestep for a moment. Look at the NetBurst architecture. It needed insane clock speeds to compete, eventually getting up to 3.8 GHz (which is actually what I have now).
Enter Conroe. It runs at roughly half the clock speed (for lack of the exact ratio) and still blows away NetBurst. Different architecture.
So if you paired a Conroe with 66 MHz SDRAM, do you think it would work well? Because that's exactly what you're implying when you claim a GPU will be fast with shared system RAM.

Maybe it won't need 3000MHz GDDR5/6.
Then it'll just be a glorified GMA.

The 9800 Pro has a good deal more memory bandwidth thanks to the 256 bit memory bus versus the 6600's 128.
Yet they both have more dedicated bandwidth than a system today has shared bandwidth.
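
To put rough numbers on the dedicated-versus-shared gap, here is the usual peak-bandwidth arithmetic. The transfer rates below are approximate figures from memory, so treat the output as illustrative only.

```c
#include <stdio.h>

/* Peak bandwidth: GB/s = (bus width in bits / 8) * mega-transfers per second / 1000.
 * The transfer rates are approximate and only meant to show the size of the gap. */
static double gb_per_s(double bus_bits, double mega_transfers)
{
    return bus_bits / 8.0 * mega_transfers / 1000.0;
}

int main(void)
{
    printf("9800 Pro, 256-bit @ ~680 MT/s  : %4.1f GB/s, dedicated\n", gb_per_s(256, 680));
    printf("6600 GT, 128-bit @ ~1000 MT/s  : %4.1f GB/s, dedicated\n", gb_per_s(128, 1000));
    printf("Dual-channel DDR2-800, 128-bit : %4.1f GB/s, shared by CPU and GPU\n", gb_per_s(128, 800));
    return 0;
}
```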
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
54
91
Originally posted by: BFG10K
The 'dedicated' part is misleading/redundant
I think not - it's important because it shows the difference between shared RAM and dedicated RAM.

A system with X bandwidth that it shares with a video card will always be slower than a system with X bandwidth that also has a video card with its own X bandwidth.

Maybe it's not how fast the ram is, but how you use it.
I really cannot see why you insist on burying your head in the sand.

I'm the one with his head buried in the sand???? I think it's the other way around brudda. I have an open mind about this, but apparently you can't fathom anything done differently than it is today. WTF?

Just sidestep for a moment. Look at the NetBurst architecture. It needed insane clock speeds to compete, eventually getting up to 3.8 GHz (which is actually what I have now).
Enter Conroe. It runs at roughly half the clock speed (for lack of the exact ratio) and still blows away NetBurst. Different architecture.
So if you paired a Conroe with 66 MHz SDRAM, do you think it would work well? Because that's exactly what you're implying when you claim a GPU will be fast with shared system RAM.

You're very thick. Talking future here and being open to possibilities.

Maybe it won't need 3000MHz GDDR5/6.
Then it'll just be a glorified GMA.

Everything you say has to do with TODAY's tech. No insight or imagination available to you? There is a thing called keeping it real, and then there is something called being stubborn as a mule. Which one are you?

The 9800 Pro has a good deal more memory bandwidth thanks to the 256 bit memory bus versus the 6600's 128.
Yet they both have more dedicated bandwidth than a system today has shared bandwidth.

Yes!!!! Today!!!! Sheesh. What about tomorrow? (And if you take me literally meaning tomorrow, February 12th 2007, I will personally send an electronic slap your way.) ;)

And let me ask you, what's to stop anybody from eventually utilizing 2000+ MHz memory on a system board? Or faster? Hell, Samsung just announced 1400 MHz GDDR4. It won't be any more expensive than if they just used that memory on a discrete vid card, would it?
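
Rough arithmetic for that question, assuming a dual-channel 128-bit system bus at 2000 MT/s and, for comparison, 1400 MHz GDDR4 (2800 MT/s) on a 256-bit card; the figures are illustrative only. Even very fast system memory is still split between the CPU and the GPU, while a discrete card keeps all of its bandwidth to itself.

```c
#include <stdio.h>

/* Same peak-bandwidth formula applied to the question above.
 * Illustrative only: 2000 MT/s on a dual-channel (128-bit) system bus,
 * versus 1400 MHz GDDR4 (2800 MT/s) on a 256-bit card. */
int main(void)
{
    double shared_system  = 128 / 8.0 * 2000 / 1000.0; /* 32 GB/s, CPU + GPU */
    double dedicated_card = 256 / 8.0 * 2800 / 1000.0; /* ~90 GB/s, GPU only */

    printf("128-bit system bus @ 2000 MT/s: %.0f GB/s, split between CPU and GPU\n", shared_system);
    printf("256-bit card bus   @ 2800 MT/s: %.0f GB/s, all for the GPU\n", dedicated_card);
    return 0;
}
```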
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
These are in-order mini x86 cores? How does this tie in with DirectX? Considering that we already have x86 cores with 9 execution units barely spitting out 1 instruction per clock, I am not terribly convinced Intel will be able to get this thing to the point of beating ATI or Nvidia.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
54
91
Ever since it was shown how an R580 core can be used for F@H on crack, I've thought: why couldn't ATI or Nvidia just whip out general-purpose CPUs that exceeded anything AMD/Intel could put out today? Or look at it the other way around: why couldn't AMD or Intel make CPUs that were more like an R580 or a G80? Not identical, but MORE like them?
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: keysplayr2003
Ever since it was shown how an R580 core can be used for F@H on crack, I've thought: why couldn't ATI or Nvidia just whip out general-purpose CPUs that exceeded anything AMD/Intel could put out today? Or look at it the other way around: why couldn't AMD or Intel make CPUs that were more like an R580 or a G80? Not identical, but MORE like them?
I don't think there's much point for most applications.

F@H just happens to be a task that the GPU is good at. If you needed to calculate some advanced mathematics, a CPU would probably be faster.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
Originally posted by: keysplayr2003
Ever since it was shown how an R580 core can be used for F@H on crack, I've thought: why couldn't ATI or Nvidia just whip out general-purpose CPUs that exceeded anything AMD/Intel could put out today? Or look at it the other way around: why couldn't AMD or Intel make CPUs that were more like an R580 or a G80? Not identical, but MORE like them?
Because single threaded performance would suck.
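
A small sketch of why, with invented loop names: the first loop's iterations are independent and map well onto a throughput part like an R580 or G80, while the second is a serial dependency chain where only single-threaded speed helps.

```c
/* Two toy loops with invented names, showing which shape of work a
 * throughput part like an R580/G80 actually helps. */

/* Independent iterations: thousands of them can run side by side, which is
 * the shape of F@H-style work that maps well onto a GPU's many ALUs. */
void scale_all(float *x, int n, float k)
{
    for (int i = 0; i < n; i++)
        x[i] *= k;               /* no iteration depends on another */
}

/* A serial dependency chain: each step needs the previous result, so extra
 * ALUs sit idle and only single-threaded speed matters. */
float filtered_sum(const float *x, int n)
{
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc = 0.5f * acc + x[i]; /* must be evaluated one step at a time */
    return acc;
}
```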

 

allies

Platinum Member
Jun 18, 2002
2,572
0
71
Originally posted by: SickBeast
Originally posted by: keysplayr2003
Ever since it was shown how an R580 core can be used for F@H on crack, I've thought: why couldn't ATI or Nvidia just whip out general-purpose CPUs that exceeded anything AMD/Intel could put out today? Or look at it the other way around: why couldn't AMD or Intel make CPUs that were more like an R580 or a G80? Not identical, but MORE like them?
I don't think there's much point for most applications.

F@H just happens to be a task that the GPU is good at. If you needed to calculate some advanced mathematics, a CPU would probably be faster.

QFT. You don't see people setting SuperPi records with GPUs... rather CPUs.
 

zephyrprime

Diamond Member
Feb 18, 2001
7,512
2
81
Regarding the bandwidth issues of combined CPU/GPUs: this is a big problem. I could see the combined units coming in multi-chip packages like the Xbox 360 GPU; doing that could keep a lot of bandwidth off the system bus. Lower-end units will probably rely on motherboard bandwidth only. Since current integrated systems rely on that already, I don't think it's a big stretch to see this happening. Of course, performance will suck, so motherboard bandwidth will probably have to be improved. They will probably have to start making smaller DIMMs and using more of them so that chips can have 256-bit buses, and chip packages will be correspondingly big. Maybe the CPU will start to be put on a slot again like it was with the early Pentium IIs, except with the DIMMs placed on it as well. That would keep the motherboard cheaper at the expense of more expensive CPUs.
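
Rough arithmetic for the 256-bit idea, assuming DDR2-800 purely for illustration: doubling the system bus width doubles peak bandwidth, which helps but is still well short of what a contemporary high-end card gets as dedicated bandwidth.

```c
#include <stdio.h>

/* Effect of only widening the system memory bus, holding DDR2-800 constant.
 * Illustrative arithmetic: GB/s = bits / 8 * MT/s / 1000. */
int main(void)
{
    const double mtps = 800.0; /* DDR2-800 */

    printf("128-bit (dual channel): %.1f GB/s\n", 128 / 8.0 * mtps / 1000.0);
    printf("256-bit (quad channel): %.1f GB/s\n", 256 / 8.0 * mtps / 1000.0);
    return 0;
}
```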
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
54
91
Originally posted by: SickBeast
Originally posted by: keysplayr2003
Ever since it was shown how an R580 core can be used for F@H on crack, I've thought: why couldn't ATI or Nvidia just whip out general-purpose CPUs that exceeded anything AMD/Intel could put out today? Or look at it the other way around: why couldn't AMD or Intel make CPUs that were more like an R580 or a G80? Not identical, but MORE like them?
I don't think there's much point for most applications.

F@H just happens to be a task that the GPU is good at. If you needed to calculate some advanced mathematics, a CPU would probably be faster.

Well, regardless, this seems to be the way things are headed. F@H is a task the R580 is good at, but what else (besides gaming, of course) is it good at? I mean, apparently someone was interested enough in this to actually find out that F@H rocked on the R580, so the exploration will surely continue into other areas where the chip excels, or doesn't. Most likely, all the downfalls and disadvantages everyone is mentioning in this thread are being raised at company meetings at Intel, AMD, and Nvidia with all their engineers present.

One person will say, "Performance will suck compared to discrete. We just don't have the bandwidth, cap'n!!" Then another will say, "But what if........." And that's all it takes. A spark of ingenuity, even a tiny one, can be grown into something huge. Fortunately there would be many sparks, just in case the first dozen or so don't pan out. That is what R&D is.

I'm not trying to be overly optimistic here, but I'm not trying to be a naysayer to any extent either. I do believe that almost anything is possible, whether probable or not.

Tired. Nite folks.

Keys
 

DeathReborn

Platinum Member
Oct 11, 2005
2,786
789
136
Originally posted by: keysplayr2003
Originally posted by: BFG10K
The 'dedicated' part is misleading/redundant
I think not - it's important because it shows the difference between shared RAM and dedicated RAM.

A system with X bandwidth that it shares with a video card will always be slower than a system with X bandwidth that also has a video card with its own X bandwidth.

Maybe it's not how fast the ram is, but how you use it.
I really cannot see why you insist on burying your head in the sand.

I'm the one with his head buried in the sand???? I think it's the other way around brudda. I have an open mind about this, but apparently you can't fathom anything done differently than it is today. WTF?

Just sidestep for a moment. Look at the NetBurst architecture. It needed insane clock speeds to compete, eventually getting up to 3.8 GHz (which is actually what I have now).
Enter Conroe. It runs at roughly half the clock speed (for lack of the exact ratio) and still blows away NetBurst. Different architecture.
So if you paired a Conroe with 66 MHz SDRAM, do you think it would work well? Because that's exactly what you're implying when you claim a GPU will be fast with shared system RAM.

You're very thick. Talking future here and being open to possibilities.

Maybe it won't need 3000MHz GDDR5/6.
Then it'll just be a glorified GMA.

Everything you say has to do with TODAY's tech. No insight or imagination available to you? There is a thing called keeping it real, and then there is something called being stubborn as a mule. Which one are you?

The 9800 Pro has a good deal more memory bandwidth thanks to the 256 bit memory bus versus the 6600's 128.
Yet they both have more dedicated bandwidth than a system today has shared bandwidth.

Yes!!!! Today!!!! Sheesh. What about tomorrow? (And if you take me literally meaning tomorrow, February 12th 2007, I will personally send an electronic slap your way.) ;)

And let me ask you, what's to stop anybody from eventually utilizing 2000+ MHz memory on a system board? Or faster? Hell, Samsung just announced 1400 MHz GDDR4. It won't be any more expensive than if they just used that memory on a discrete vid card, would it?

I would like to see a future CPU allow for 4 banks of RAM in quad channel (256-bit), as that might help with bandwidth, though it would add cost.

Alternatively, they could use a separate 256-bit+ memory controller for the GPU part of the chip, feeding separate slots of maybe XDR/GDDR4+. I just don't see it being possible to use the system RAM for both in the coming 5 or so years. It all adds some serious cost that most will not be willing to pay.

I can see the low/mid range maybe being like Larrabee (assuming they can work with the DX/OGL APIs), but the high end would be too expensive for this approach.