i7 Memory Controller + Larrabee

Spectator

Junior Member
Jul 11, 2008
4
0
0
So I was thinking about how fast/efficient the CPU-to-CPU links are on the i7.

QPI can do about 12 GB/s each way between CPUs, and each CPU can have its own memory. Isn't it logical that Intel could treat Larrabee as a second CPU and hook it up to QPI?

That would give them a built-in advantage once they'd sold enough i7s to enter the graphics market properly, yes?

You more technical people could probably elaborate more than I can. I just thought I'd throw the idea out there.



Spectator
 

ihyagp

Member
Aug 11, 2008
91
0
0
Maybe. QPI to the CPU cores will provide a lot better interconnect bandwidth and latency than PCI express. However, even triple-channel DDR3 provides a lot less memory bandwidth than the GDDR3/GDDR5 on today's video cards, and it has to be shared with the CPU.
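As a rough sanity check on those claims, here is a back-of-the-envelope peak-bandwidth comparison in Python. The clock and bus-width figures are the era's published spec-sheet numbers, assumed for illustration; they are theoretical maxima, not measured throughput.

```python
# Back-of-the-envelope peak bandwidths for the parts discussed above.
# All figures are theoretical maxima from spec sheets, not measurements.

qpi = 6.4e9 * 2                       # QPI: 6.4 GT/s x 2 bytes/transfer, per direction
ddr3_triple = 3 * 8 * 1.066e9         # 3 channels x 8 bytes x DDR3-1066
gddr3_gtx280 = (512 // 8) * 2.214e9   # 512-bit bus x 2214 MT/s effective (GTX 280)

for name, bw in (("QPI (per direction)", qpi),
                 ("Triple-channel DDR3-1066", ddr3_triple),
                 ("GTX 280 GDDR3", gddr3_gtx280)):
    print(f"{name}: ~{bw / 1e9:.1f} GB/s")
```

The order-of-magnitude gap between the CPU's memory pool and a dedicated card's GDDR is the point being made above.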
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,054
3,544
126
No...

because both chips would need 2x QPI enabled for it to work.

A typical i7 only has 1 QPI link.

Gainestown has 2x QPI.

Larrabee will not be anywhere near Gainestown-class, so your hope of connecting the two and getting an SLI variant with quad cores is a very far-fetched dream.
 

waffleironhead

Diamond Member
Aug 10, 2005
7,047
551
136
Originally posted by: aigomorla
No...

because both chips would need 2x QPI enabled for it to work.

A typical i7 only has 1 QPI link.

Gainestown has 2x QPI.

Larrabee will not be anywhere near Gainestown-class, so your hope of connecting the two and getting an SLI variant with quad cores is a very far-fetched dream.

I'm not sure he is talking about SLI'd Larrabees, but rather the quicker connection between the GPU and CPU given that they'd be on the same package. The issue I see is that the GPU still needs to reach out to system memory, which will slow it down (unless they incorporate GPU memory onto the die itself... unlikely).

OP, can you clarify your question?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
I see no reason why not...but technically accomplishing this is not dependent on i7 nor QPI. It could be done with any existing hardware configuration.

The rate-limiting process for deploying the technology is software, not hardware.

Generally speaking there are two "things" that limit scaling (performance) of multi-threaded applications.

One is the percentage of overall computations that can be performed in parallel. (referred to as "percentage of serial code" as in "this is a highly serial code" or "this code is trivially highly parallelized")

The second has to do with the inter-relatedness of the threads, i.e. how frequently does one thread need to check on the results of another parallel thread to determine the impact on its next processing step. (referred to as "grained" as in "this code is fine-grained" or "this code is coarse-grained")

So the way it works is: the relative percentage of serial code determines the limit to which you can ever expect your performance to increase as you add more and more threads (cores, processing power, etc.) to your system. See Amdahl's law.

Even with infinitely fast inter-processor communications the speedup won't be 1:1, because the code itself can never be 100% parallelized; there will always be some non-zero percentage of serial code present, if for no other reason than that spawning threads and re-assembling their results is intrinsically a serial computation.
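Amdahl's limit is easy to sketch numerically. A minimal Python illustration (the function name and the sample parallel fractions are mine, chosen for illustration):

```python
# Amdahl's law: with serial fraction s, speedup on n cores is
# 1 / (s + (1 - s) / n); the best case as n -> infinity is 1 / s.

def amdahl_speedup(parallel_fraction, n_cores):
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cores)

for p in (0.50, 0.95, 0.99):
    print(f"{p:.0%} parallel: 8 cores -> {amdahl_speedup(p, 8):.2f}x, "
          f"infinite cores -> {1.0 / (1.0 - p):.0f}x")
```

Even at 95% parallel code, eight cores deliver well under a 6x speedup, and no amount of hardware gets past 20x.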

In addition to the limits imposed by the percentage of serial code present in the application, you still have the question of how fast information can be communicated between the threads (between the cores). This is where QPI and HT come in; they are superior to their predecessors, but this extra speed is really only needed for the so-called fine-grained applications.

A coarse-grained application (POV-Ray, Cinebench, etc.) won't be rate-limited by inter-processor communications, so an improvement in this area won't be reflected in a performance improvement at the application level.
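A toy cost model makes the coarse- vs fine-grained distinction concrete. This Python sketch (all numbers purely illustrative, not measurements) shows why a faster interconnect only pays off when synchronization is frequent:

```python
# Toy cost model: runtime = useful work + (number of syncs x sync latency).
# Coarse-grained codes sync rarely, so interconnect latency is invisible;
# fine-grained codes sync constantly, so a faster link shows up directly.

def runtime(work_s, syncs, sync_latency_s):
    return work_s + syncs * sync_latency_s

WORK = 10.0  # seconds of useful computation
CASES = (("coarse-grained (tile renderer)", 100),
         ("fine-grained (tightly coupled solver)", 10_000_000))

for name, syncs in CASES:
    slow = runtime(WORK, syncs, 1e-6)  # ~1 us per sync (slower interconnect)
    fast = runtime(WORK, syncs, 2e-7)  # ~200 ns per sync (QPI-class link)
    print(f"{name}: {slow:.2f}s -> {fast:.2f}s with the faster link")
```

In this model the coarse-grained case barely changes, while the fine-grained case goes from 20 s to 12 s purely from the lower sync latency.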

With that in mind - would Larrabee help improve the performance of applications? Yes...but you really are going to be dependent on the specific application having been coded to be ridiculously parallel (>95% parallel code).

Will QPI make a difference? That depends on how fine-grained the specific computations turn out to be, and that is very application-specific. The extra bandwidth won't hurt, that's for sure.
 

Spectator

Junior Member
Jul 11, 2008
4
0
0
I understand the NVIDIA GTX 280 has about 140 GB/s of memory bandwidth.

But I was thinking that Larrabee would probably have a much wider path to its own local graphics memory, to measure up to NVIDIA/ATI.

It just seemed logical: they say Larrabee is going to be just a bunch of x86 CPUs, so it would make sense to engineer it to connect as a second CPU on QPI. That's faster than using the PCI Express bus architecture, isn't it?

Spectator
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Originally posted by: Spectator
So I was thinking about how fast/efficient the CPU-to-CPU links are on the i7.

QPI can do about 12 GB/s each way between CPUs, and each CPU can have its own memory. Isn't it logical that Intel could treat Larrabee as a second CPU and hook it up to QPI?

That would give them a built-in advantage once they'd sold enough i7s to enter the graphics market properly, yes?

You more technical people could probably elaborate more than I can. I just thought I'd throw the idea out there.

They aren't using Larrabee cores for the Core i7 IGP, so the point is moot in the first place. Sure, the IGP version of the i7 does use QPI to connect the IGP+MCH to the CPU core.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: Idontcare
I see no reason why not...but technically accomplishing this is not dependent on i7 nor QPI. It could be done with any existing hardware configuration.

The rate-limiting process for deploying the technology is software, not hardware.

Generally speaking there are two "things" that limit scaling (performance) of multi-threaded applications.

One is the percentage of overall computations that can be performed in parallel. (referred to as "percentage of serial code" as in "this is a highly serial code" or "this code is trivially highly parallelized")

The second has to do with the inter-relatedness of the threads, i.e. how frequently does one thread need to check on the results of another parallel thread to determine the impact on its next processing step. (referred to as "grained" as in "this code is fine-grained" or "this code is coarse-grained")

So the way it works is: the relative percentage of serial code determines the limit to which you can ever expect your performance to increase as you add more and more threads (cores, processing power, etc.) to your system. See Amdahl's law.

Even with infinitely fast inter-processor communications the speedup won't be 1:1, because the code itself can never be 100% parallelized; there will always be some non-zero percentage of serial code present, if for no other reason than that spawning threads and re-assembling their results is intrinsically a serial computation.

In addition to the limits imposed by the percentage of serial code present in the application, you still have the question of how fast information can be communicated between the threads (between the cores). This is where QPI and HT come in; they are superior to their predecessors, but this extra speed is really only needed for the so-called fine-grained applications.

A coarse-grained application (POV-Ray, Cinebench, etc.) won't be rate-limited by inter-processor communications, so an improvement in this area won't be reflected in a performance improvement at the application level.

With that in mind - would Larrabee help improve the performance of applications? Yes...but you really are going to be dependent on the specific application having been coded to be ridiculously parallel (>95% parallel code).

Will QPI make a difference? That depends on how fine-grained the specific computations turn out to be, and that is very application-specific. The extra bandwidth won't hurt, that's for sure.

Really nice job. Read this post and info and try to do as good a job with what's written here. You're a better writer than I am by far.

http://forums.anandtech.com/me...=2245171&enterthread=y

 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Nemesis 1
Really nice job. Read this post and info and try to do as good a job with what's written here. You're a better writer than I am by far.

Well, I spent nearly a decade working in this specific area (multi-threaded scaling) on both hardware and software, so contributing some knowledge to threads such as these is the trivial part IMO; figuring out how to express the answer in terms and verbiage clear enough for forum-goers to gain some understanding is the challenge. Thanks for the kind words, I'll take a look at your referenced link.
 

Spectator

Junior Member
Jul 11, 2008
4
0
0
Well, as a noob:

After reading the Nehalem pt 3 article, that answers some of my questions and yet poses more.

To summarise, I am interested in Intel's overall goal before they release Larrabee, and the possible hardware changes/upgrades involved.

I lack tech knowledge, but obviously not forward thinking and logical extrapolation. Yes, I'm messing about somewhat, as I deliberately wasn't specific, except in the title of the post, in an effort to have more tech-savvy people fill in the tech details. But it seems Intel have done that for us, less than a week later.

So have Intel learnt how short-sighted AMD was in focusing just on CPU/memory interconnect speed/bandwidth for instant gratification? And have Intel taken advantage of their market share/income to take it to the next level?

Yes, this speculates that Intel could have done more sooner but decided to play the longer game, and are considering their entire product range. I don't have answers, just obvious questions and speculation.
OK, so you have to handle one-or-more-core Atom, 4/8-core desktop, and 8+-core server CPUs. You have been slacking for a few years, doing just what is needed so you are safe and not breaking the status quo, while quietly plotting big changes.
But you know the market, and would prefer to subtly smooth in the changes, maximising your profit along the way.

Sadly you need to give MS time to remove the xx-CPU limitation as your predominant OS supplier. You also need to focus developers on SMP logic (ideally away from complex/expensive/slow things you can't implement in small hardware (transistor count)). But that is another story.

What, as a business, would you do now?


And the obligatory random questions follow.
1) For us old folk messing with overclocked AMD CPUs, we remember the golden bridges. Now we see the i7 with a load of connections on top of the CPU. Surely this is a method for programming individual silicon with the required settings? That would mean less old/redundant stock, as you can re-program chips to suit. Obviously Intel has a quality production method (I can't think of another company that can mass-produce quality to match Intel).
Logical conclusion: you overclockers, you made them see we will abuse their hardware, so they impose new rules. (Still, at least it won't live long enough to help third-world countries short of hardware. It, with the rest of the PC, will end up as landfill in some Chinese village. :p)

2) Intel are changing the way they link CPU to CPU, which is all good/fast compared to the competition. Now come the questions.
2a) If Larrabee is basically an x86 CPU, why not link it to QPI on one side and give it its own controller/VRAM on the other?
2b) The new on-chip PCIe controller is curious. Is that just backward compatibility, but also giving an advantage to Larrabee hardware (if Intel are confident, that could even mean a new socket)?


3) If Intel are being efficient and considering the users, the February CPU could use the same socket as the current i7 (LGA 1366) but with fewer physical pins (i.e. use a selector/power pin to denote 1156/1366).

I am serious, but am just not a serious person. Personal understanding is more useful than... a fish. (It's only because I have doubts that... teach a "person" to fish.)

So many questions, but it's time to go chill in real life. I hope my curiosity meets with some logical responses.
 

JAG87

Diamond Member
Jan 3, 2006
3,921
3
76
There's no need, really. PCIe 2.0 gives 8 GB/s of transfer, and PCIe 3.0 will give almost 13 GB/s of bandwidth. Plus a GPU will always need its own buffer that's far faster, so an add-in card is inevitable.
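For reference, the 8 GB/s figure can be derived from the PCIe 2.0 link parameters; a quick Python sketch of the arithmetic:

```python
# Where "8 GB/s" for PCIe 2.0 x16 comes from: 5 GT/s per lane,
# 8b/10b encoding (8 payload bits per 10 line bits), 16 lanes, one direction.

lanes = 16
rate = 5e9            # transfers per second per lane (gen 2)
encoding = 8 / 10     # 8b/10b line-coding efficiency
bandwidth = lanes * rate * encoding / 8  # bytes per second

print(f"PCIe 2.0 x16: ~{bandwidth / 1e9:.0f} GB/s per direction")
```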
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
For today the QPI link isn't needed, but when Nehalem/Larrabee arrives it will be. Here's part of Intel's gaming API. There is more on the subject; this is just to head you in the right direction.

Neoptica's Vision

Neoptica published a technical whitepaper back in March 2007. Matt Pharr also gave a presentation at Graphics Hardware 2006 that highlights some similar points.
It explained their perspective on the limitations of current programmable shading and their vision of the future, which they name "programmable graphics". Much of their point rests on the value of 'irregular algorithms' and the GPU's inability to construct complex data structures on its own.
They argue that a faster link to the CPU is thus a key requirement, with efficient parallelism and collaboration between the two. Only the PS3 allows this today.
They further claim the capability to deliver many round-trips between the CPU and the GPU every frame could make new algorithms possible and improve efficiency. They plead for the demise of the unidirectional rendering pipeline.