Why is SNB-E's memory bandwidth worse than SNB's?

Arachnotronic · Feb 19, 2012

http://www.xbitlabs.com/articles/memory/display/lga2011-ddr3_2.html

What's going on?!

Also, why does SNB-E's L3 cache have 20% higher latency? Aren't the SRAM cells on SNB-E the same as those on SNB?

Accord99 · Feb 19, 2012

Intel17 said:
http://www.xbitlabs.com/articles/memory/display/lga2011-ddr3_2.html

What's going on?!

Perhaps it's just AIDA. Sandra and Stream show the expected increase in bandwidth.

Also, why does SNB-E's L3 cache have 20% higher latency? Aren't the SRAM cells on SNB-E the same as those on SNB?

SNB's L3 cache runs at CPU clock, while SNB-E's L3 and ring bus run at a different, slower speed.

frostedflakes · Feb 19, 2012

They explain it right there in the article.

The results on the diagram seem like a paradox. Not only the dual-channel Sandy Bridge memory controller shows higher practical bandwidth readings and lower actual latency, but the Sandy Bridge-E platform doesn’t get any memory performance improvement upon increase in the number of memory channels. However, there is no mistake here and the selected benchmark works perfectly fine. The thing is that in this case they use a single-threaded algorithm, which doesn’t let the advantages of the Sandy Bridge-E memory controller to come through. In case of simple single-threaded memory requests a quad-channel memory controller will not be any better than a dual-channel one.
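
One way to reason about the quoted explanation (this is a back-of-the-envelope sketch, not something the article itself states) is Little's law: a single thread can only keep a limited number of cache-line requests in flight, so its bandwidth is roughly bytes in flight divided by latency, no matter how many channels sit behind the controller. The figure of 10 outstanding lines and both latency values below are illustrative assumptions, and the model ignores hardware prefetchers, which raise the effective concurrency.

```python
def single_thread_bandwidth_gbs(outstanding_lines, latency_ns, line_bytes=64):
    """Little's law bound: bytes in flight / latency. Bytes per ns equals GB/s."""
    return outstanding_lines * line_bytes / latency_ns


# Hypothetical latencies, chosen only to show the trend; not measured values.
for name, latency_ns in [("lower-latency platform", 60.0),
                         ("higher-latency platform", 70.0)]:
    bw = single_thread_bandwidth_gbs(outstanding_lines=10, latency_ns=latency_ns)
    print(f"{name}: {bw:.1f} GB/s per thread")
```

Under this model, extra channels only help once enough threads are issuing requests to exceed what a single thread can keep in flight, and any added latency directly lowers the single-threaded number.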

Arachnotronic · Feb 19, 2012

frostedflakes said:
They explain it right there in the article.

But why would adding more channels force the designers to slow down single threaded memory access latencies?

Also, wait...L3 on SNB-E doesn't run at core clock? What does it run at?

Arachnotronic · Feb 19, 2012

http://www.realworldtech.com/page.cfm?ArticleID=RWT072811020122

Apparently SNB-E's LLC clock is ~1-1.5GHz? Doesn't that mean it's WAY slower than a consumer SNB chip?

Diogenes2 · Feb 19, 2012

It's best at what it was designed for.

Its original optimization for servers and high-performance workstations obviously determines in which cases its strengths show their best.

In which case it's almost double the performance of SB ... ( LGA1155 )

Idontcare · Feb 19, 2012

That explains why the ramdisk numbers were so much lower than expected then.

One of those tradeoffs in taking a server xeon and labeling it an "Extreme" processor.

Can't have your cake and eat it too.

Arachnotronic · Feb 19, 2012

Idontcare said:
That explains why the ramdisk numbers were so much lower than expected then.

One of those tradeoffs in taking a server xeon and labeling it an "Extreme" processor.

Can't have your cake and eat it too.

Shouldn't a server Xeon be...well, better than the client stuff? Like gulftown vs nehalem?

Idontcare · Feb 19, 2012

Intel17 said:
Shouldn't a server Xeon be...well, better than the client stuff? Like gulftown vs nehalem?

Better at what?

😉

Better at server apps? Absolutely.

Better at client apps? Depends on how much the client app behaves like a server app.

See bulldozer.

Arachnotronic · Feb 19, 2012

Idontcare said:
Better at what?

😉

Better at server apps? Absolutely.

Better at client apps? Depends on how much the client app behaves like a server app.

See bulldozer.

Bulldozer's not even that good at server apps, is it? But I guess it does perform *better* in server apps...

I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...

But hey! I guess we have competition in the desktop space -- Intel v.s. Intel 😛

rgallant · Feb 19, 2012

frostedflakes said:
They explain it right there in the article.

+1 for your sig.

GammaLaser · Feb 19, 2012

Server-grade chips need to support features that might end up being unnecessary overhead when used for client-type applications. For example, multi-socket support, extra error checking/RAS, scalability for 4+ cores on a chip, all don't come for free in terms of performance. In this case the uncore parts became complex enough to require a new clocking design and that introduced latency penalties to get to the L3.

IntelUser2000 · Feb 19, 2012

Intel17 said:
http://www.realworldtech.com/page.cfm?ArticleID=RWT072811020122

Apparently SNB-E's LLC clock is ~1-1.5GHz? Doesn't that mean it's WAY slower than a consumer SNB chip?

No, they should run at a similar frequency. I guess it's possible the ring bus is 2x wider and only needs half the frequency to achieve the same bandwidth, but purely looking at bandwidth they are roughly the same: http://www.xbitlabs.com/articles/cpu/display/core-i7-3960x-3930k_7.html#sect0
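
The "2x wider at half the frequency" guess is just peak bandwidth = width x clock. A minimal sketch of that arithmetic, where the widths and clocks are hypothetical placeholders rather than confirmed figures for either chip:

```python
def ring_bandwidth_gbs(width_bytes, clock_ghz):
    """Peak bytes moved per second: bytes per cycle times cycles per ns (GB/s)."""
    return width_bytes * clock_ghz


# Hypothetical numbers: a narrow ring at a high clock vs. a ring twice as
# wide at half the clock. Neither pair is a measured or documented value.
narrow_fast = ring_bandwidth_gbs(width_bytes=32, clock_ghz=3.0)
wide_slow = ring_bandwidth_gbs(width_bytes=64, clock_ghz=1.5)
print(narrow_fast, wide_slow)
```

Equal peak bandwidth would not mean equal latency, though: a slower clock still makes each hop take longer.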

The increased latency is due to the bigger size, other than that it seems identical.

Using Aida64, which is a single threaded benchmark, 2700K achieves 32GB/s and 3960X achieves 29GB/s.

Using Sandra, a multi-threaded benchmark, it beats the 2600K every step of the way: http://techreport.com/articles.x/21987/5

Shouldn't a server Xeon be...well, better than the client stuff? Like gulftown vs nehalem?

Also, as GammaLaser and Idontcare kindly pointed out, the changes in server-side chips may hamper client-side performance. For example, increased memory capacity support usually means lower frequency and looser timings.

IntelUser2000 · Feb 19, 2012

Can someone with a 3930K/3960X setup run the SiSoftware Sandra cache benchmark? I want to compare with my 2600K to show the L3 cache isn't clocked lower.

I'll start for the 2600K.

L3 On-Board Cache: 132.68GB/s (or 33GB/s per core)

-It can be any recent version of Sandra
-You don't need to read graphs. It tells you at the bottom in text
-For easier comparison, it would be good to have it at stock

Diogenes2 · Feb 20, 2012

247.15GB/s ( 41GB/s per core )

IntelUser2000 · Feb 20, 2012

Diogenes2 said:
247.15GB/s ( 41GB/s per core )

(41/33) x 3.5GHz(Turbo for 4 core on 2600K) = 4.3GHz

Is your CPU running at 4.3GHz?
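
Spelled out, the estimate above assumes per-core L3 bandwidth scales linearly with clock, so the ratio of the two per-core figures implies a clock ratio. A small sketch of that arithmetic (the function name is made up for illustration, and the linear-scaling assumption is the poster's, not a measured fact):

```python
def implied_clock_ghz(per_core_bw, ref_per_core_bw, ref_clock_ghz):
    """Clock the reference chip would need to match per_core_bw,
    assuming bandwidth per core is proportional to clock."""
    return per_core_bw / ref_per_core_bw * ref_clock_ghz


# 41 GB/s per core (SNB-E result) vs 33 GB/s per core at 3.5GHz (2600K).
estimate = implied_clock_ghz(per_core_bw=41, ref_per_core_bw=33, ref_clock_ghz=3.5)
print(round(estimate, 1))
```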

Diogenes2 · Feb 20, 2012

That was stock 3.2 = 3.5GHz turbo, as you requested..

At 4.3 I get 302GB/s (50.3GB/s per core )

Let me know if I'm not understanding what you are looking for..

Ben90 · Feb 20, 2012

Intel17 said:
I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...

If you think about it, we haven't had an unequivocally better processor in a very long time.

SNB-E Vs SNB -> Single Threaded Memory Bandwidth
SNB Vs Gulftown -> Cores/Chipset
Gulftown Vs Nehalem -> L3$ Latency
Nehalem Vs Penryn -> L2 Cache Performance
Penryn Vs Kentsfield -> Unequivocally Better Possibly?

Arachnotronic · Feb 20, 2012

Diogenes2 said:
That was stock 3.2 = 3.5GHz turbo, as you requested..

At 4.3 I get 302GB/s (50.3GB/s per core )

Let me know if I'm not understanding what you are looking for..

Hmm, so your results point to SNB-E's cache being *faster* than SNB's.

grkM3 · Feb 20, 2012

Those first Sandy E's are 8-core Xeons with 2 cores turned off and were meant for server memory. I'm willing to bet that the new 4-core Sandy E's are going to perform better than their 6-core first-stepping versions.

exar333 · Feb 20, 2012

Ben90 said:
If you think about it, we haven't had an unequivocally better processor in a very long time.

SNB-E Vs SNB -> Single Threaded Memory Bandwidth
SNB Vs Gulftown -> Cores/Chipset
Gulftown Vs Nehalem -> L3$ Latency
Nehalem Vs Penryn -> L2 Cache Performance
Penryn Vs Kentsfield -> Unequivocally Better Possibly?

I am pretty sure Nehalem was 100% a better processor than Penryn. That was a pretty BIG jump in performance, even at lower MHz.

IntelUser2000 · Feb 20, 2012

Diogenes2 said:
That was stock 3.2 = 3.5GHz turbo, as you requested..

At 4.3 I get 302GB/s (50.3GB/s per core )

Let me know if I'm not understanding what you are looking for..

Wait a minute. If I divide 247 by 8, I get 31GB/s, which is closer to the number the 2600K gets. That's also similar to the number it gets in Aida64.

I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.

Hmm, so your results point to SNB-E's cache being *faster* than SNB's.

It looks like the bandwidth may be slightly less on the SNB-E core, due to higher latency, because it's larger. But cumulatively, it's faster.
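
For reference, here is the division behind those per-core figures, using only the Sandra totals posted in this thread. Whether 6 or 8 is the right divisor for SNB-E is exactly the open question, so treat the 8-way split as a hypothesis, not a confirmed detail of the chip.

```python
def per_unit_bandwidth_gbs(total_gbs, units):
    """Split an aggregate L3 bandwidth figure evenly across cores or slices."""
    return total_gbs / units


snb_per_core = per_unit_bandwidth_gbs(132.68, 4)   # 2600K, 4 cores
snbe_per_6 = per_unit_bandwidth_gbs(247.15, 6)     # SNB-E, 6 enabled cores
snbe_per_8 = per_unit_bandwidth_gbs(247.15, 8)     # SNB-E, if all 8 slices count

for label, value in [("2600K / 4", snb_per_core),
                     ("SNB-E / 6", snbe_per_6),
                     ("SNB-E / 8", snbe_per_8)]:
    print(f"{label}: {value:.1f} GB/s")
```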

Diogenes2 · Feb 20, 2012

I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.

I don't believe that is true .. It seems there was a discussion about that a while back, or maybe covered in the Anand review.

The two extra cores are fused, why would any cache be allocated to them ?

Arachnotronic · Feb 20, 2012

Diogenes2 said:
I don't believe that is true .. It seems there was a discussion about that a while back, or maybe covered in the Anand review.

The two extra cores are fused, why would any cache be allocated to them ?

Maybe the ring stops are still there?

Edrick · Feb 20, 2012

Once I get my 3820 from Newegg (coming Wednesday) I will post some numbers with the retail stepping.
