Intel's new branding mess...


Idontcare (Elite Member, joined Oct 10, 1999)
Thanks Denithor, a near-50% reduction in fps stemming from a 50% reduction in bandwidth certainly suggests that the unused bandwidth at x16 is likely very small (and possibly already saturated).

Meaning not much headroom left at x16 for future GPUs.
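
As a rough sanity check on that reading, here's a minimal back-of-the-envelope sketch (the ~500MB/s-per-lane PCIe 2.0 figure is the usual rounded number; the 80MB-per-frame payload is purely an illustrative assumption, not something measured in the review):

```python
# Back-of-the-envelope: if the PCIe link were the only limit, how would fps
# scale with lane count? The per-frame payload is an illustrative assumption.

PCIE2_MB_PER_LANE = 500  # approx. usable one-way bandwidth per PCIe 2.0 lane, MB/s

def link_mb_per_s(lanes):
    """One-way link bandwidth in MB/s for a given lane count."""
    return lanes * PCIE2_MB_PER_LANE

def link_limited_fps(mb_per_frame, lanes):
    """Upper bound on fps if per-frame transfers alone were the bottleneck."""
    return link_mb_per_s(lanes) / mb_per_frame

for lanes in (16, 8):
    print(f"x{lanes}: {link_mb_per_s(lanes)} MB/s, "
          f"link-limited fps ~ {link_limited_fps(80, lanes):.0f}")
# x16: 8000 MB/s -> ~100 fps; x8: 4000 MB/s -> ~50 fps. Halving the link halves
# the ceiling, which is what a fully saturated x16 slot would look like.
```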

Regarding GPGPU, I'm not sure how much actual data transfer (versus crunching) is involved in F@H-style distributed computing apps, but I imagine the most data-transfer-intensive app here would be transcoding, where lots of data needs to be streamed across the PCIe link.

Could just mean that the disk drive subsystem ends up being the bottleneck I suppose.
 

Denithor (Diamond Member, joined Apr 11, 2004)
If you look at the rest of the benchmarks in the TweakTown review you'll see that the impact is generally considerably lower than with Crysis (I sorta cherry-picked the worst case to illustrate my point). But the effect definitely gets worse the higher the resolution - meaning that at lower res/no AA you haven't yet saturated the 8 PCIe lanes. Once you hit that point, though, performance takes a significant hit as data has to be "scheduled" through the pipes and the GPU cores cannot run at full efficiency (some downtime waiting for packets - or something to that effect).

Now - compare those results to what we see of C2D/C2Q versus i7 in the second benchmark. The [lower latency or higher bandwidth - which is it?] connecting the CPU cores to the PCIe lanes and then to the GPU cores has a simply enormous impact on throughput and therefore fps. The GPUs stay fed, crank through their work, and fps is much, much higher.

Then consider that with i5 the CPU cores are basically in direct contact with the PCIe lanes. That should make for even more efficient transfer of data from CPU to GPU cores - but the question remains: will smoother data flow make up for the fact that you've only got 8 lanes per GPU, or not?

On F@H - after thinking more about it - most WUs are <10MB in size, so I seriously doubt there is any kind of bandwidth issue. The entire WU is loaded into the GPU for processing and then, when complete, transferred back out to be submitted.
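
To put rough numbers behind that hunch, a minimal sketch (the WU size and link speeds are assumptions for illustration):

```python
# How long does moving a ~10 MB F@H work unit over PCIe actually take,
# compared to the minutes-to-hours a GPU spends crunching it?
# WU size and link speeds below are illustrative assumptions.

WU_SIZE_MB = 10

links_mb_per_s = {
    "PCIe 2.0 x8": 4000,   # approx. one-way bandwidth
    "PCIe 2.0 x16": 8000,
}

for name, bw in links_mb_per_s.items():
    transfer_ms = WU_SIZE_MB / bw * 1000
    print(f"{name}: ~{transfer_ms:.1f} ms to move a {WU_SIZE_MB} MB work unit")
# ~2.5 ms on x8 and ~1.3 ms on x16 -- negligible next to shader time, so the
# PCIe link is nowhere near the bottleneck for F@H.
```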

Really going OT here but anyway - I don't recall ever hearing that SSD or ramdisk speeds up F@H output so I doubt that one is really limited by anything other than the shader speed of the GPU. Does SSD/ramdisk have any impact on video encoding speed?
 

ilkhan (Golden Member, joined Jul 21, 2006)
Originally posted by: taltamir
Well, the question is... is it easier to add bandwidth to a northbridge/CPU hybrid (that is based on a design that had more bandwidth)?
Or is it easier to redesign a northbridge to have some southbridge functionality while still communicating with a second discrete southbridge?
You still haven't shown anything to make me think the 36 (32?) lanes on the X58 northbridge are precluded from being used for something besides graphics. They're general-purpose PCI-E lanes; they can be used for anything.

I am talking about situations where the socket cannot be changed at all. Anything that would require a new revision of chips (and confuse consumers) is out. They could make X68 a 2-chip solution (one that requires discrete graphics, either AIB or on the motherboard) that uses QPI instead of DMI. It'd be fairly simple, actually. And with QPI they'd have a massive bandwidth advantage over DMI.
I doubt they would choose to do this, as having the separate northbridge allows it to be reused in Xeon platforms (i.e. 2-4 X58 PCI-E bridges on the 8-way designs). Duplicating the SB functionality would be kind of dumb for server applications.

I don't think they could replace the DMI bus with QPI on s1156 without a new line of CPUs. But they could remove the third chip on s1366 without a new line.
 

lopri (Elite Member, joined Jul 27, 2002)
I'm going to write an email to TweakTown with regard to that X48 vs P45 CrossFire article. They should re-run the test with better BIOS and drivers. Everyone quotes that single article as 'proof' of PCIe saturation, despite dozens of other articles and user reports.

I have spoken with AT editors, an NVIDIA employee (whose main testing focus is SLI), and other knowledgeable people @B3D.
I have read a dozen other articles, as well as user reports, that show nothing meaningful when it comes to x8 vs x16 (PCIe Gen 2.0).
I have tested with my own hardware.

They all disagree with TweakTown, yet it seems that article is the most quoted one in this debate. Is there any other article I can consult?

Edit: I spoke without reading carefully. I thought it was yet another x8/x8 SLI vs x16/x16 SLI debate.
 

Denithor (Diamond Member, joined Apr 11, 2004)
Wow, didn't mean to kick an anthill.

:laugh:

Either way should be the same, correct? Dual is dual, whether CF or SLI - if the bandwidth of x8 restricts the flow to the GPU, performance will suffer. From that article it definitely only happens at high resolution, and generally also with AA/AF enabled - which would make sense, as that's the point of peak demand for bandwidth.

I want to see what happens when you repeat this test and include X58 and P55 boards in the mix. The X58 shows large improvements in performance which obviously have to come from somewhere - lower latency between the CPU and GPU cores, more efficient transfer of work into the GPU cores (optimized packet ordering or something?), or something else altogether.

Like I said above - we just have to see how much the direct-to-PCIe interface of Lynnfield will improve things. Enough to overcome any potential bandwidth inadequacy, perhaps?
 

ilkhan (Golden Member, joined Jul 21, 2006)
Very interesting, IDC, since it does show QPI to the second die.

I'm still curious about the actual bandwidth via DMI between the CPU and the PCH. The x4/x2 notation is confusing.
 

taltamir (Lifer, joined Mar 21, 2004)
From the roadmaps it seems Intel's engineers agree with me. Thanks for the link, Idontcare.
ilkhan, I think the debate has run its course; there is nothing for me to add.
 

Denithor (Diamond Member, joined Apr 11, 2004)
x4/x2 probably refers to downstream/upstream bandwidth or something similar.
 

Idontcare (Elite Member, joined Oct 10, 1999)
Originally posted by: ilkhan
Very interesting, IDC, since it does show QPI to the second die.

I'm still curious about the actual bandwidth via DMI between the CPU and the PCH. The x4/x2 notation is confusing.

Yep, since they are keeping the PCIe controller (as well as the memory controller) on 45nm process tech, it stands to reason that Lynnfield will internally communicate with the integrated PCIe controller via QPI. It would make no sense for Intel to re-engineer this interface twice over for two successive products.

As for x4/x2... I may be misinterpreting the root of your question, but this just means x4 in PCIe lane-width notation: PCIe x16 bandwidth for graphics, PCIe x4 or x2 bandwidth for DMI purposes.

Or perhaps you know that, but you are asking why there are two options - x4 and x2 - and what the intended product difference between the two is?

I won't pretend to know much of anything about the design decisions going on here; only Intel's product managers can lay claim to such information. But I am willing to join you in your efforts to logic out the dilemma, if one exists.
 

ilkhan (Golden Member, joined Jul 21, 2006)
Actually, looking at http://en.wikipedia.org/wiki/Direct_Media_Interface it seems there's a better explanation.
DMI is a PCI-E v1.1 x4 link, but it can be cut down to 2 PCI-E lanes if they want to go cheaper. So 1GB/s or 512MB/s of bandwidth, versus 25.6GB/s for QPI.
42 data pins for x4 vs 80 for QPI could be one reason they went with DMI. Very mature tech vs fairly new tech. :shrug:
edit: oh hey, there was a page 4 before I posted. :)
edit2: the Lynnfield preview article shows 2GB/s numbers for its DMI. Maybe they moved to PCI-E v2 for P55.
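
For what it's worth, here's a quick sketch that reconstructs those figures from lane count and generation (the per-lane rates are the rounded values usually quoted; the 25.6GB/s QPI number is the one from this thread):

```python
# Reconstructing the DMI/QPI bandwidth figures in this thread from lane count
# and PCIe generation. Per-lane rates are the commonly quoted rounded values.

MB_PER_LANE = {"PCIe 1.1": 256, "PCIe 2.0": 512}  # MB/s per lane, per direction

def link_gb_per_s(gen, lanes):
    return lanes * MB_PER_LANE[gen] / 1024

print(f"DMI as PCIe 1.1 x4: {link_gb_per_s('PCIe 1.1', 4):.1f} GB/s")  # ~1 GB/s
print(f"DMI as PCIe 1.1 x2: {link_gb_per_s('PCIe 1.1', 2):.1f} GB/s")  # ~0.5 GB/s
print(f"DMI as PCIe 2.0 x4: {link_gb_per_s('PCIe 2.0', 4):.1f} GB/s")  # ~2 GB/s, matching
                                                                       # the Lynnfield preview
print("QPI, as quoted in this thread: 25.6 GB/s")
```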