Why is SNB-E's memory bandwidth worse than SNB's?

Accord99

Platinum Member
Jul 2, 2001

frostedflakes

Diamond Member
Mar 1, 2005
They explain it right there in the article.

The results on the diagram seem like a paradox. Not only does the dual-channel Sandy Bridge memory controller show higher practical bandwidth and lower actual latency, but the Sandy Bridge-E platform doesn’t get any memory performance improvement from the increased number of memory channels. However, there is no mistake here, and the selected benchmark works perfectly fine. The thing is that in this case it uses a single-threaded algorithm, which doesn’t let the advantages of the Sandy Bridge-E memory controller come through. For simple single-threaded memory requests, a quad-channel memory controller will not be any better than a dual-channel one.
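A rough way to see the article's point is Little's law: a single core can only keep so many cache-line misses in flight, so its bandwidth is capped at (lines in flight × 64 bytes) / latency no matter how many channels sit behind the controller. The sketch below models that. The 10 lines in flight and 80 ns latency are illustrative assumptions, not measured Sandy Bridge or SNB-E figures, and the model ignores hardware prefetching, so the absolute numbers won't match real benchmarks.

```python
LINE_BYTES = 64  # cache-line size in bytes


def peak_gbps(channels: int, mt_per_s: int = 1600, bus_bytes: int = 8) -> float:
    """Theoretical DRAM peak: channels x transfers/s x 8-byte bus width, in GB/s."""
    return channels * mt_per_s * 1e6 * bus_bytes / 1e9


def achieved_gbps(threads: int, channels: int,
                  lines_in_flight: int = 10, latency_ns: float = 80.0) -> float:
    """Little's law: throughput = data in flight / latency, capped by the DRAM peak.

    lines_in_flight and latency_ns are illustrative assumptions, not measured values.
    """
    per_thread = lines_in_flight * LINE_BYTES / latency_ns  # bytes per ns == GB/s
    return min(threads * per_thread, peak_gbps(channels))


if __name__ == "__main__":
    for channels in (2, 4):
        for threads in (1, 6):
            print(f"{channels} channels, {threads} thread(s): "
                  f"{achieved_gbps(threads, channels):.1f} GB/s")
```

With these assumed numbers, one thread gets the same 8 GB/s from two or four channels, while six threads hit the dual-channel ceiling (25.6 GB/s) and get close to twice that (48 GB/s) on quad-channel DDR3-1600.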

[Image: stream.png]
 

Diogenes2

Platinum Member
Jul 26, 2001
It's best at what it was designed for.
Its original optimization for servers and high-performance workstations determines where its strengths show best.
In that case, it's almost double the performance of SB (LGA1155).
 

Idontcare

Elite Member
Oct 10, 1999
That explains why the ramdisk numbers were so much lower than expected then.

One of the tradeoffs of taking a server Xeon and labeling it an "Extreme" processor.

Can't have your cake and eat it too.
 
Mar 10, 2006
That explains why the ramdisk numbers were so much lower than expected then.

One of the tradeoffs of taking a server Xeon and labeling it an "Extreme" processor.

Can't have your cake and eat it too.

Shouldn't a server Xeon be...well, better than the client stuff? Like Gulftown vs. Nehalem?
 

Idontcare

Elite Member
Oct 10, 1999
Shouldn't a server Xeon be...well, better than the client stuff? Like Gulftown vs. Nehalem?

Better at what?

;)

Better at server apps? Absolutely.

Better at client apps? Depends on how much the client app behaves like a server app.

See Bulldozer.
 
Mar 10, 2006
Better at what?

;)

Better at server apps? Absolutely.

Better at client apps? Depends on how much the client app behaves like a server app.

See Bulldozer.

Bulldozer's not even that good at server apps, is it? But I guess it does perform *better* in server apps...

I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...

But hey! I guess we have competition in the desktop space -- Intel vs. Intel :p
 

GammaLaser

Member
May 31, 2011
Server-grade chips need to support features that can end up as unnecessary overhead in client-type applications. For example, multi-socket support, extra error checking/RAS, and scalability to 4+ cores on a chip all cost something in performance. In this case, the uncore became complex enough to require a new clocking design, which introduced latency penalties for reaching the L3.
 

IntelUser2000

Elite Member
Oct 14, 2003
http://www.realworldtech.com/page.cfm?ArticleID=RWT072811020122

Apparently SNB-E's LLC clock is ~1-1.5GHz? Doesn't that mean it's WAY slower than a consumer SNB chip?

No, they should run at similar frequencies. I guess it's possible the ring bus is 2x wider and only needs half the frequency to achieve the same bandwidth, but purely in terms of bandwidth they are roughly the same: http://www.xbitlabs.com/articles/cpu/display/core-i7-3960x-3930k_7.html#sect0
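The "twice as wide at half the clock" possibility is just bandwidth = width × frequency. A minimal sketch; the 32-byte ring width and 3.4 GHz clock are assumptions picked for illustration, not confirmed SNB or SNB-E specs.

```python
def ring_bandwidth_gbps(width_bytes: int, clock_ghz: float) -> float:
    """Per-link ring bandwidth: bytes per cycle x billions of cycles per second = GB/s."""
    return width_bytes * clock_ghz


# Hypothetical configurations: a 32-byte ring at 3.4 GHz vs. a ring twice
# as wide running at half the clock.
narrow_fast = ring_bandwidth_gbps(32, 3.4)
wide_slow = ring_bandwidth_gbps(64, 1.7)
print(narrow_fast, wide_slow)
```

Both configurations give the same figure, which is why a bandwidth benchmark alone can't tell you what clock the ring runs at.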

The increased latency is due to the bigger size; other than that, it seems identical.

Using AIDA64, which is a single-threaded benchmark, the 2700K achieves 32GB/s and the 3960X achieves 29GB/s.

Using Sandra, a multi-threaded benchmark, it beats the 2600K every step of the way: http://techreport.com/articles.x/21987/5

Shouldn't a server Xeon be...well, better than the client stuff? Like Gulftown vs. Nehalem?
Also, as GammaLaser and Idontcare kindly pointed out, the changes in server-side chips may hamper client-side performance. For example, support for more memory capacity usually means lower frequency and looser timings.
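To put numbers on "lower frequency and looser timings": DDR memory transfers twice per I/O clock, so absolute CAS latency in nanoseconds is CL × 2000 / data rate (MT/s), and peak bandwidth of one 64-bit channel is data rate × 8 bytes. The two DIMM configurations below are hypothetical examples, not the modules used in any of the reviews linked here.

```python
def cas_latency_ns(cl_cycles: int, data_rate_mts: int) -> float:
    """Absolute CAS latency: the DDR I/O clock period is 2000 / data rate (ns)."""
    return cl_cycles * 2000 / data_rate_mts


def channel_peak_gbps(data_rate_mts: int) -> float:
    """Peak bandwidth of one 64-bit (8-byte) channel in GB/s."""
    return data_rate_mts * 8 / 1000


# Hypothetical examples: a fast desktop kit vs. a slower, looser high-capacity setup.
for name, cl, rate in [("DDR3-1600 CL9", 9, 1600), ("DDR3-1333 CL11", 11, 1333)]:
    print(f"{name}: {cas_latency_ns(cl, rate):.2f} ns CAS, "
          f"{channel_peak_gbps(rate):.1f} GB/s per channel")
```

The slower, looser configuration loses on both counts: roughly 16.5 ns vs. 11.25 ns CAS latency, and less peak bandwidth per channel.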
 

IntelUser2000

Elite Member
Oct 14, 2003
Can someone with a 3930K/3960X setup run the SiSoftware Sandra cache benchmark? I want to compare with my 2600K to show the L3 cache isn't clocked lower.

I'll start for the 2600K.

L3 On-Board Cache: 132.68GB/s (or 33GB/s per core)

-It can be any recent version of Sandra
-You don't need to read graphs. It tells you at the bottom in text
-For easier comparison, it would be good to have it at stock
 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
That was at stock, 3.2GHz with 3.5GHz turbo, as you requested.

At 4.3GHz, I get 302GB/s (50.3GB/s per core).

Let me know if I'm not understanding what you are looking for.
 

Ben90

Platinum Member
Jun 14, 2009
I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...
If you think about it, we haven't had an unequivocally better processor in a very long time.

SNB-E Vs SNB -> Single Threaded Memory Bandwidth
SNB Vs Gulftown -> Cores/Chipset
Gulftown Vs Nehalem -> L3$ Latency
Nehalem Vs Penryn -> L2 Cache Performance
Penryn Vs Kentsfield -> Unequivocally Better Possibly?
 

grkM3

Golden Member
Jul 29, 2011
Those first Sandy-Es are 8-core Xeons with 2 cores turned off and were meant for server memory. I'm willing to bet that the new 4-core Sandy-Es are going to perform better than their 6-core first-stepping versions.
 

exar333

Diamond Member
Feb 7, 2004
If you think about it, we haven't had an unequivocally better processor in a very long time.

SNB-E Vs SNB -> Single Threaded Memory Bandwidth
SNB Vs Gulftown -> Cores/Chipset
Gulftown Vs Nehalem -> L3$ Latency
Nehalem Vs Penryn -> L2 Cache Performance
Penryn Vs Kentsfield -> Unequivocally Better Possibly?

I am pretty sure Nehalem was 100% a better processor than Penryn. That was a pretty BIG jump in performance, even at lower MHz.
 

IntelUser2000

Elite Member
Oct 14, 2003
That was at stock, 3.2GHz with 3.5GHz turbo, as you requested.

At 4.3GHz, I get 302GB/s (50.3GB/s per core).

Let me know if I'm not understanding what you are looking for.

Wait a minute. If I divide 247 by 8, I get 31GB/s, which is closer to the number the 2600K gets. That's also similar to the number it gets in AIDA64.

I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.

Hmm, so your results suggest SNB-E's cache is *faster* than SNB's.

It looks like per-core bandwidth may be slightly lower on SNB-E due to higher latency, because the cache is larger. But cumulatively, it's faster.
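The per-core arithmetic in this exchange is easy to redo. The sketch divides the aggregate Sandra L3 figures quoted in the thread by the number of cores, and divides the 247GB/s SNB-E result by both 6 and 8 to compare the two readings. The helper name `per_core_gbps` is just for illustration, and an even split across cores is a simplifying assumption.

```python
def per_core_gbps(aggregate_gbps: float, cores: int) -> float:
    """Split an aggregate L3 bandwidth figure evenly across cores/slices."""
    return aggregate_gbps / cores


# Aggregate Sandra L3 bandwidth figures quoted in this thread (GB/s).
readings = [
    ("2600K, stock, / 4 cores", 132.68, 4),
    ("SNB-E 247GB/s run, / 6 cores", 247.0, 6),
    ("SNB-E 247GB/s run, / 8 cores", 247.0, 8),
    ("SNB-E at 4.3GHz, / 6 cores", 302.0, 6),
]
for label, total, n in readings:
    print(f"{label}: {per_core_gbps(total, n):.1f} GB/s each")
```

Dividing by 8 lands near the 2600K's ~33GB/s per core, which is where the "works as if 8 cores are enabled" idea comes from; dividing by 6 gives about 41GB/s per core.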
 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.
I don't believe that is true. It seems there was a discussion about that a while back, or maybe it was covered in the Anand review.

The two extra cores are fused off, so why would any cache be allocated to them?
 
Mar 10, 2006
I don't believe that is true. It seems there was a discussion about that a while back, or maybe it was covered in the Anand review.

The two extra cores are fused off, so why would any cache be allocated to them?

Maybe the ring stops are still there?
 

Edrick

Golden Member
Feb 18, 2010
Once I get my 3820 from Newegg (coming Wednesday) I will post some numbers with the retail stepping.