Why is SNB-E's memory bandwidth worse than SNB's?

They explain it right there in the article.

The results on the diagram seem like a paradox. Not only the dual-channel Sandy Bridge memory controller shows higher practical bandwidth readings and lower actual latency, but the Sandy Bridge-E platform doesn’t get any memory performance improvement upon increase in the number of memory channels. However, there is no mistake here and the selected benchmark works perfectly fine. The thing is that in this case they use a single-threaded algorithm, which doesn’t let the advantages of the Sandy Bridge-E memory controller to come through. In case of simple single-threaded memory requests a quad-channel memory controller will not be any better than a dual-channel one.

stream.png
 
It's best at what it was designed for.
Its original optimization for servers and high-performance workstations obviously determines in which cases its strengths show their best.
In those cases it's almost double the performance of SB (LGA1155).
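
To put rough numbers on the single-threaded vs multi-threaded point, here is a toy model in Python. It's only a sketch: the DDR3-1600 speed and the 15 GB/s per-thread cap are assumed for illustration and are not taken from the article, and `peak_bandwidth_gbs` and `achievable_gbs` are made-up helper names.

```python
def peak_bandwidth_gbs(channels, mt_per_s, bus_bytes=8):
    """Theoretical peak in GB/s: channels x bytes per transfer x megatransfers/s."""
    return channels * bus_bytes * mt_per_s / 1000


def achievable_gbs(threads, per_thread_cap_gbs, peak_gbs):
    """Toy model: threads add up until the memory controller's peak is hit."""
    return min(threads * per_thread_cap_gbs, peak_gbs)


DDR3_MTS = 1600        # assumed memory speed (DDR3-1600), not from the article
PER_THREAD_CAP = 15.0  # hypothetical limit for a single thread, in GB/s

dual = peak_bandwidth_gbs(2, DDR3_MTS)  # dual-channel controller (SNB)
quad = peak_bandwidth_gbs(4, DDR3_MTS)  # quad-channel controller (SNB-E)

for threads in (1, 6):
    print(threads,
          achievable_gbs(threads, PER_THREAD_CAP, dual),
          achievable_gbs(threads, PER_THREAD_CAP, quad))
```

With one thread, both controllers land on the same figure in this model because the thread itself is the bottleneck. With six threads, the quad-channel controller's peak is double the dual-channel one.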
 
That explains why the ramdisk numbers were so much lower than expected then.

One of those tradeoffs in taking a server Xeon and labeling it an "Extreme" processor.

Can't have your cake and eat it too.
 
Shouldn't a server Xeon be...well, better than the client stuff? Like gulftown vs nehalem?
 
Better at what?

😉

Better at server apps? Absolutely.

Better at client apps? Depends on how much the client app behaves like a server app.

See bulldozer.
 
Bulldozer's not even that good at server apps, is it? But I guess it does perform *better* in server apps...

I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...

But hey! I guess we have competition in the desktop space -- Intel vs. Intel 😛
 
Server-grade chips need to support features that might end up being unnecessary overhead when used for client-type applications. For example, multi-socket support, extra error checking/RAS, scalability for 4+ cores on a chip, all don't come for free in terms of performance. In this case the uncore parts became complex enough to require a new clocking design and that introduced latency penalties to get to the L3.
 
http://www.realworldtech.com/page.cfm?ArticleID=RWT072811020122

Apparently SNB-E's LLC clock is ~1-1.5GHz? Doesn't that mean it's WAY slower than a consumer SNB chip?

No, they should run at a similar frequency. I guess it's possible the ring bus is 2x wider and only needs half the frequency to achieve the same bandwidth, but looking purely at bandwidth they are roughly the same: http://www.xbitlabs.com/articles/cpu/display/core-i7-3960x-3930k_7.html#sect0

The increased latency is due to the bigger size, other than that it seems identical.
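
The width-versus-clock trade-off is just bandwidth = bytes per cycle x clock. A quick Python sketch, with made-up widths and clocks (none of these values are measured or taken from Intel documentation, and `ring_bandwidth_gbs` is a hypothetical helper):

```python
def ring_bandwidth_gbs(bytes_per_cycle, clock_ghz):
    """Bandwidth in GB/s: bytes moved per cycle x clock in GHz."""
    return bytes_per_cycle * clock_ghz


# Hypothetical figures, for illustration only.
narrow_fast = ring_bandwidth_gbs(32, 3.0)  # 32 B/cycle at 3.0 GHz
wide_slow = ring_bandwidth_gbs(64, 1.5)    # 64 B/cycle at 1.5 GHz

print(narrow_fast, wide_slow)
```

Both configurations come out the same in this sketch, which is why a bandwidth number alone can't tell you the clock.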

Using Aida64, which is a single-threaded benchmark, the 2700K achieves 32GB/s and the 3960X achieves 29GB/s.

Using Sandra, a multi-threaded benchmark, it beats the 2600K every step of the way: http://techreport.com/articles.x/21987/5

Shouldn't a server Xeon be...well, better than the client stuff? Like gulftown vs nehalem?
Also, as GammaLaser and Idontcare kindly pointed out, the changes in server-side chips may hamper client-side performance. For example, increased memory capacity support usually means lower frequency and looser timings.
 
Can someone with a 3930K/3960X setup run the SiSoftware Sandra cache benchmark? I want to compare it with my 2600K to show the L3 cache isn't clocked lower.

I'll start for the 2600K.

L3 On-Board Cache: 132.68GB/s (or 33GB/s per core)

-It can be any recent version of Sandra
-You don't need to read graphs. It tells you at the bottom in text
-For easier comparison, it would be good to have it at stock
 
That was stock, 3.2GHz = 3.5GHz turbo, as you requested.

At 4.3GHz I get 302GB/s (50.3GB/s per core).

Let me know if I'm not understanding what you are looking for..
 
I kind of just expected 6-8 cores of SNB + more advanced system architecture would be unequivocally better than a chip designed for laptops at...everything...
If you think about it, we haven't had an unequivocally better processor in a very long time.

SNB-E Vs SNB -> Single Threaded Memory Bandwidth
SNB Vs Gulftown -> Cores/Chipset
Gulftown Vs Nehalem -> L3$ Latency
Nehalem Vs Penryn -> L2 Cache Performance
Penryn Vs Kentsfield -> Unequivocally Better Possibly?
 
Those first Sandy E's are 8-core Xeons with 2 cores turned off and were meant for server memory. I'm willing to bet that the new 4-core Sandy E's are going to perform better than their 6-core first-stepping versions.
 
I am pretty sure Nehalem was 100% a better processor than Penryn. That was a pretty BIG jump in performance, even at lower MHz.
 
Wait a minute. If I divide 247 by 8, I get 31GB/s, which is closer to the number the 2600K gets. That's also similar to the number it gets in Aida64.

I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.

Hmm, so your results point to SNB-E's cache being *faster* than SNB's.

It looks like the per-core bandwidth may be slightly lower on SNB-E, due to the higher latency of its larger cache. But cumulatively, it's faster.
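
For anyone who wants to redo the division, here's the arithmetic in Python using only the Sandra totals quoted in this thread (132.68GB/s for the 2600K, 247GB/s and 302GB/s for SNB-E at stock and overclocked). `per_slice_gbs` is just a helper name made up for this sketch, and splitting the total evenly is an assumption.

```python
def per_slice_gbs(total_gbs, slices):
    """Split an aggregate L3 bandwidth figure evenly across cores/slices."""
    return total_gbs / slices


snb_2600k = per_slice_gbs(132.68, 4)  # 2600K, 4 cores
snbe_by_6 = per_slice_gbs(247, 6)     # SNB-E stock, 6 enabled cores
snbe_by_8 = per_slice_gbs(247, 8)     # SNB-E stock, if all 8 slices count
snbe_oc_by_6 = per_slice_gbs(302, 6)  # SNB-E overclocked, 6 cores

for name, value in [
    ("2600K / 4", snb_2600k),
    ("SNB-E stock / 6", snbe_by_6),
    ("SNB-E stock / 8", snbe_by_8),
    ("SNB-E OC / 6", snbe_oc_by_6),
]:
    print(f"{name}: {value:.1f} GB/s")
```

Dividing the stock SNB-E total by 8 lands close to the 2600K's per-core figure, while dividing by 6 gives a noticeably higher number. The arithmetic alone doesn't settle which divisor is right.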
 
I think that means while Sandy Bridge E chips have only 6 cores enabled, the L3 cache works just as if 8 cores are enabled.
I don't believe that is true. It seems there was a discussion about that a while back, or maybe it was covered in the Anand review.

The two extra cores are fused off, so why would any cache be allocated to them?
 
Once I get my 3820 from Newegg (coming Wednesday) I will post some numbers with the retail stepping.
 