Intel vs. AMD and memory controller performance

bxm

Junior Member
May 4, 2010
2
0
0
Hi all.
I have a question about the performance of the on-die memory controllers in the Nehalem-based architectures and in the Barcelona, Shanghai, Istanbul, and Magny-Cours architectures. Is the performance gap in STREAM, Everest, Sandra, etc. due only to the memory controller, even after balancing the memory resources (same number of memory channels (Lynnfield), same frequency and memory timings), or is something else important at play? I ask because my work concerns HPC with memory-demanding applications (electronic structure calculations for molecules and solids), where we see a strong correlation between performance and memory throughput. Are Nehalem-based systems really that good for memory-demanding applications? And is the reason an excellent on-die memory controller on the Intel side, or is it that AMD cannot utilize all of its memory resources?
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
AMD is selling its CPUs at a loss at the low end. Their top performer is under $300. To really improve memory performance they have to change sockets, and keeping the socket is an AMD selling point; in fact it's the only good selling point they have. They are also aware of this, because certain organizations are looking into the cost of AMD motherboards compared to Intel's. People are finally seeing that something is really wrong here, big time. Intel outsells AMD 5 to 1, yet the low-volume parts from AMD sell for less than Intel's high-volume parts. This goes against everything we know about volume pricing. Check out AMD's chipset prices, then check out Intel's chipset prices. It's just a matter of time before we nail the motherboard makers for this injustice. AMD only needs to come out with a new socket and we will have every motherboard maker standing before the law.
 

Gikaseixas

Platinum Member
Jul 1, 2004
2,836
218
106
Those guys from Abu Dhabi must have deep pockets to be supporting a lost cause
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Nemesis, really now, what does anything in your post have to do with the OP's request?

OP - It is rare to encounter a fellow computational chemist on the tech forums, welcome to the AT forums :thumbsup:

Basically what you (and I) are interested in is SPECfp_rate2006 for a given platform (and more likely SPECfp_rate2006/$ since computational chemists are rarely wealthy).

I recommend you visit spec.org, specifically the CFP2006 section, and note the specific benchmarks that relate to computational chemistry (416.gamess, 435.gromacs, 444.namd, and 465.tonto).

Next, you will want to peruse the benchmark results submitted to the site and do a little data-mining based on the per-platform results for these specific benchmarks.

Here is an example of some preliminary data-mining.

The short answer to your question is that YES, Nehalem's strong FPU combined with superior 3-channel bandwidth results in ridiculously great performance with computational chemistry applications.

BUT your specific software application, combined with the latest CPU+platform costs, dictates which platform will provide you the best performance/$ so you need to do a little more work to arrive at an answer.
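As a rough illustration of that data-mining step, here is a minimal Python sketch. It assumes you have already copied the per-benchmark ratios for the four chemistry codes from a submitted result, and it combines them with a geometric mean (the same kind of mean SPEC uses for its overall metrics) before dividing by a system price. The function names, the example numbers, and the price are hypothetical placeholders, not real SPEC results.

```python
from math import prod

# Chemistry-related CFP2006 benchmarks named in the post above.
CHEM_BENCHMARKS = ("416.gamess", "435.gromacs", "444.namd", "465.tonto")


def chem_score(ratios):
    """Geometric mean of the per-benchmark ratios for the chemistry subset."""
    values = [ratios[name] for name in CHEM_BENCHMARKS]
    return prod(values) ** (1.0 / len(values))


def score_per_dollar(ratios, system_price):
    """Chemistry-subset score divided by the system price."""
    return chem_score(ratios) / system_price


# Placeholder numbers for illustration only; replace with values from spec.org.
example_ratios = {
    "416.gamess": 100.0,
    "435.gromacs": 100.0,
    "444.namd": 400.0,
    "465.tonto": 400.0,
}
example_price = 5000.0  # hypothetical system price in dollars

print(round(chem_score(example_ratios), 3))
print(round(score_per_dollar(example_ratios, example_price), 5))
```

Running the same two functions over each candidate platform gives a single number per platform that is weighted toward the codes you actually care about.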

Good luck!
 

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,204
126
As far as memory bandwidth goes, Magny-Cours is better than Nehalem, but Intel's Xeon 7500 series is faster still.
 

Nemesis 1

As far as memory bandwidth goes, Magny-Cours is better than Nehalem, but Intel's Xeon 7500 series is faster still.

I don't care; he was talking about the controllers. He has already seen that Intel's had higher throughput. To catch up to Intel, AMD needs a new socket.

Magny-Cours is not a cheap alternative to Intel at all and really doesn't compete on the desktop. But for servers it appears to be a fine set of 6x dual cores that are glued together.
 

Nemesis 1

Those guys from Abu Dhabi must have deep pockets to be supporting a lost cause

Well, now that AMD has no interest in GF, it will be interesting to see if all customers are treated equally. If AMD gets better deals, look out. The GF fabs are operating at almost full capacity, yet they are losing money. Yeah, it smells a lot. When the New York fab comes online, that's great for GF but not so great for AMD, as GF has to treat all customers the same or go out of business. In other words, GF has to show a profit, which means AMD's core prices will be higher.
 

piesquared

Golden Member
Oct 16, 2006
1,651
473
136
@nemesis:

Yes, but doesn't Intel need more letters in its name? And given the fact that if you rearrange the letters you could spell nile, even if you do have to drop the t, it is possible that the oil slick blowout could be relieved with a horizontal relief well. In fact it would be relatively straightforward. And don't forget that the density of fresh water is 971 kg/m^3! Given that, if it takes a chicken and a half a day and a half to lay an egg and a half, don't you think it's possible for an ant with a wooden leg to kick the seeds out of a dill pickle? Looks like it rained last night.
 

JFAMD

Senior member
May 16, 2009
565
0
0
As far as memory bandwidth goes, Magny-Cours is better than Nehalem, but Intel's Xeon 7500 series is faster still.

Please show me the SPEC FP or STREAM scores that prove this out.

MC = 4 channels @ 1333MHz
7500 = 4 channels @ 1066MHz

The math just can't ever work in their favor.
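To spell out that math, here is a small Python sketch of theoretical peak bandwidth per socket. It assumes 64-bit (8-byte) channels and treats the "MHz" figures above as millions of transfers per second; the function name is just for illustration. Peak numbers say nothing about what STREAM will actually sustain.

```python
BYTES_PER_TRANSFER = 8  # assumption: one 64-bit channel moves 8 bytes per transfer


def peak_gb_per_s(channels, mega_transfers_per_s):
    """Theoretical peak bandwidth per socket in GB/s (10^9 bytes per second)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


mc = peak_gb_per_s(4, 1333)     # Magny-Cours: 4 channels @ 1333
x7500 = peak_gb_per_s(4, 1066)  # Xeon 7500, counted as 4 channels @ 1066

print(f"MC {mc:.1f} GB/s, 7500 {x7500:.1f} GB/s, ratio {x7500 / mc:.2f}")
# prints: MC 42.7 GB/s, 7500 34.1 GB/s, ratio 0.80
```

With the channel count equal, the ratio is simply 1066/1333, so on paper the 7500 comes out about 20% behind.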
 

JFAMD

I don't care; he was talking about the controllers. He has already seen that Intel's had higher throughput. To catch up to Intel, AMD needs a new socket.

Magny-Cours is not a cheap alternative to Intel at all and really doesn't compete on the desktop. But for servers it appears to be a fine set of 6x dual cores that are glued together.

A couple of points:

1. The original poster asked about comparisons of server architecture, not desktop
2. In servers, AMD has a decided lead in memory bandwidth. Look at STREAM, look at SPEC_FP.
3. Magny Cours is not 6 dual cores glued together

You're not adding to the conversation.
 

JFAMD

Here is a memory bandwidth comparison:

STREAM_2P.jpg


You'll note that we have not seen a public Westmere STREAM benchmark. But because the memory support is identical to Nehalem (3 channels @ 1333MHz) the throughput should be close to the same, maybe a percentage point or two higher.

You'll note that we do not have a Beckton score either. The 4 channels would theoretically give higher bandwidth than Westmere, but because their max memory speed is only 1066, they are probably going to score ~20% or so slower than Magny-Cours.

Plus Beckton (Xeon 7500) has buffers that add latency.

If you want memory bandwidth, the Opteron 6100 series is the way to go.
 

bxm

Here is a memory bandwidth comparison:

STREAM_2P.jpg


You'll note that we have not seen a public Westmere STREAM benchmark. But because the memory support is identical to Nehalem (3 channels @ 1333MHz) the throughput should be close to the same, maybe a percentage point or two higher.

You'll note that we do not have a Beckton score either. The 4 channels would theoretically give higher bandwidth than Westmere, but because their max memory speed is only 1066, they are probably going to score ~20% or so slower than Magny-Cours.

Plus Beckton (Xeon 7500) has buffers that add latency.

If you want memory bandwidth, the Opteron 6100 series is the way to go.

This comparison looks impressive. I'm surprised, because when I read the "The Nehalem EX Xeon 7500" review and the "AMD's 12-core Magny-Cours" review here on AnandTech, I must have completely skipped over the "Understanding the Performance Numbers" chapter, because the triad chart shows exactly what you say about the memory performance. So I think AMD did IT again. To me the Beckton performance looks like a bad joke in comparison with the X5670 or the 6174. One more note, perhaps also for AnandTech: there is also an MPI version of the STREAM test suite (or you can simply compile STREAM with an MPI compiler and run it with mpirun -np set to the number of cores in the node). It might be useful to run these tests as well, to measure the performance of a single MPI task on a fully loaded node (on Intel, with HT enabled or disabled) in comparison with the single-thread (one process per node) and the multithreaded (full socket or full node) results.
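For readers who have not looked at the triad kernel itself, here is a minimal Python sketch of what it computes and how the MB/s figure is derived (STREAM counts 24 bytes per iteration for triad with 8-byte doubles: two reads and one write). This is only an illustration of the accounting, not the STREAM code; the function names are made up for this post, and pure Python is interpreter-bound, so the number it prints is not a real memory bandwidth measurement.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def measure_triad(n=200_000, scalar=3.0, repeats=5):
    """Return the best-of-`repeats` triad rate in MB/s (10^6 bytes per second)."""
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    bytes_moved = 3 * a.itemsize * n  # read b, read c, write a
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, scalar)
        best = min(best, time.perf_counter() - start)
    return bytes_moved / best / 1e6


if __name__ == "__main__":
    print(f"triad: {measure_triad():.1f} MB/s (interpreter-bound, illustration only)")
```

In the comparison suggested above, the compiled benchmark would be run once as a single process, once with one task per core, and once multithreaded, and the per-task rates compared.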
 

extra

Golden Member
Dec 18, 1999
1,947
7
81
@nemesis:

Yes, but doesn't Intel need more letters in its name? And given the fact that if you rearrange the letters you could spell nile, even if you do have to drop the t, it is possible that the oil slick blowout could be relieved with a horizontal relief well. In fact it would be relatively straightforward. And don't forget that the density of fresh water is 971 kg/m^3! Given that, if it takes a chicken and a half a day and a half to lay an egg and a half, don't you think it's possible for an ant with a wooden leg to kick the seeds out of a dill pickle? Looks like it rained last night.


Hahahahahahahaha. Rofl. +1 good sir.

And meh. On the desktop I think the Intel triple channel has the most bandwidth, but I'm pretty sure that Magny-Cours wtfpwns the Intel Nehalem-based Xeons when it comes to memory bandwidth. I think there was an AnandTech article that talked about this not too long ago. May want to find it. It showed that the best chip highly depended on the apps you ran ... I'm no server guy, but that makes perfect sense. I'd try to go find the article if I were you.
 

JFAMD

The best processor always depends on the apps you are running. I always laugh when I see people make a definitive statement that one processor is the best for everything. It is never the case.
 

VirtualLarry

Please show me the SPEC FP or STREAM scores that prove this out.

MC = 4 channels @ 1333MHz
7500 = 4 channels @ 1066MHz

The math just can't ever work in their favor.

Stream benchmark for MC: http://www.xtremesystems.org/forums/showpost.php?p=3995220&postcount=54

Stream benchmark for Nehalem: http://www.advancedclustering.com/company-blog/stream-benchmarking.html

Architectural diagram of Xeon 7500: http://www.anandtech.com/show/3648/xeon-7500-dell-r810/3

According to that diagram, the CPU indeed has four memory channels, just like MC. (Well, on 7500, they are serial, not parallel).

But each of those channels goes to a memory buffer chip (serdes), and each of those chips is attached to TWO memory channels.

So while MC has four 64-bit memory channels @ 1333, Xeon 7500 has effectively eight 64-bit memory channels @ 1066. So at least in the theoretical, Xeon 7500 has more memory bandwidth. Whether or not that is borne out in the real world with Stream hasn't been proven yet.
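The theoretical side of that comparison can be sketched in a few lines of Python. This assumes each CPU-side link on the Xeon 7500 fans out to two 64-bit DDR3 channels behind its buffer, as in the diagram, and it ignores any limit imposed by the serial link itself as well as buffer latency. The function and parameter names are only illustrative.

```python
def peak_mb_per_s(links, channels_per_link, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak in MB/s, summed over every DDR3 channel behind every link."""
    return links * channels_per_link * mega_transfers_per_s * bus_bytes


# Magny-Cours: four channels attached directly to the CPU.
magny_cours = peak_mb_per_s(links=4, channels_per_link=1, mega_transfers_per_s=1333)

# Xeon 7500: four serial links, each buffer driving two DDR3 channels (assumption
# taken from the diagram; the serial link's own ceiling is not modelled here).
xeon_7500 = peak_mb_per_s(links=4, channels_per_link=2, mega_transfers_per_s=1066)

print(magny_cours, xeon_7500, round(xeon_7500 / magny_cours, 2))
```

On paper that puts the 7500 at roughly 1.6 times the Magny-Cours figure, which is exactly the part that only a real STREAM run can confirm or refute.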
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
So while MC has four 64-bit memory channels @ 1333, Xeon 7500 has effectively eight 64-bit memory channels @ 1066. So at least in the theoretical, Xeon 7500 has more memory bandwidth. Whether or not that is borne out in the real world with Stream hasn't been proven yet.

The Xeon 7500 is optimized top to bottom for high-end databases. For what we are talking about, that makes it a poor fit for HPC applications. Magny-Cours has an advantage there and faces no meaningful competition whatsoever.

The possible reason for the higher bandwidth on Nehalem (and also Westmere and Magny-Cours) compared to Shanghai and certain Istanbul parts is not only a better memory controller, but also that they all feature snoop filters. Nehalem and Westmere do that in the L3 cache; Magny-Cours and 4P Istanbul do that too, which AMD calls HT Assist.
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
@nemesis:

Yes, but doesn't Intel need more letters in its name? And given the fact that if you rearrange the letters you could spell nile, even if you do have to drop the t, it is possible that the oil slick blowout could be relieved with a horizontal relief well. In fact it would be relatively straightforward. And don't forget that the density of fresh water is 971 kg/m^3! Given that, if it takes a chicken and a half a day and a half to lay an egg and a half, don't you think it's possible for an ant with a wooden leg to kick the seeds out of a dill pickle? Looks like it rained last night.

:thumbsup:
 

AdamK47

Lifer
Oct 9, 1999
15,665
3,524
136
I'd have to believe anything would be possible with an ant wearing a wooden leg.
 

Nemesis 1

A couple of points:

1. The original poster asked about comparisons of server architecture, not desktop
2. In servers, AMD has a decided lead in memory bandwidth. Look at STREAM, look at SPEC_FP.
3. Magny Cours is not 6 dual cores glued together

You're not adding to the conversation.

Nor are you.

1. I never said Magny-Cours was made from 6 dual cores glued together. I said it was 2 x6 glued together. Or are YOU saying it's fabbed as one piece, silly? I never said only real companies (MEN) own fabs. How many fabs does AMD have now? I did say AMD is selling CPUs below cost. Denial is transparent.

Why does AMD want to compare 6x to 4x? Why do you want to compare Magny-Cours to the 7500?
He is correct, though. If you take Magny-Cours out to its capacity, 48 cores on one board, and do the same with Intel on an 8S system, tell us all about the bandwidth difference. You can't.

Intel does have another CPU; it's 8 cores. And AMD does in fact win some, but loses where it counts the most. I will put any 7500 8S system against the Magny-Cours 4S system any day, and I can say that without knowing who wins. 48 Intel cores against 48 Magny-Cours cores is like a slaughterhouse. AMD would be bleeding all over the place. You can sell only into cheap markets, and you're losing those. FACT!
I figured you'd show up here. Where is informal, your AMD clone? He hasn't been right about anything in going on 5 years. You're with AMD, so we already know your thinking. It's behind the times; you're still in AMD's glory days.

If you want to do a proper comparison of the 7500 to Magny-Cours, use a 1S system against a 2S Intel system; that's 12 cores for each system. Then let's look at those SPEC numbers. Oh wait, we can't do that, because then Intel wins hands down. You had no problem doing this against Intel's P4s, but things have changed a lot since those days. You know how scared Intel is? They are bringing out 2- and 4-core Sandy Bridge parts for desktop and mobile. Intel has you covered and you damn well know it. Your 12-core system is nice, but AMD hasn't the stomach to compare it to 12 Intel cores. I have been waiting for G34 for a long time. WE are going to nail the motherboard makers to the cross this time around. Low-volume parts selling for less than high-volume parts. It's going to get really ugly.
 

Nemesis 1

This comparison looks impressive. I'm surprised, because when I read the "The Nehalem EX Xeon 7500" review and the "AMD's 12-core Magny-Cours" review here on AnandTech, I must have completely skipped over the "Understanding the Performance Numbers" chapter, because the triad chart shows exactly what you say about the memory performance. So I think AMD did IT again. To me the Beckton performance looks like a bad joke in comparison with the X5670 or the 6174. One more note, perhaps also for AnandTech: there is also an MPI version of the STREAM test suite (or you can simply compile STREAM with an MPI compiler and run it with mpirun -np set to the number of cores in the node). It might be useful to run these tests as well, to measure the performance of a single MPI task on a fully loaded node (on Intel, with HT enabled or disabled) in comparison with the single-thread (one process per node) and the multithreaded (full socket or full node) results.

Lol, look again; it's not a fair comparison. It's loaded to AMD's advantage. Don't forget what AMD is about. Remember the PH slides before release? AMD is years away from reaching the numbers on those slides. This isn't a fair comparison at all.
 

grimpr

Golden Member
Aug 21, 2007
1,095
7
81
lol, always entertaining reading the rambling thoughts of a maniac on medication typing from his underground survivalist bunker somewhere in redneck mainland usa.
 

Nemesis 1

Hahahahahahahaha. Rofl. +1 good sir.

And meh. On the desktop I think the Intel triple channel has the most bandwidth, but I'm pretty sure that Magny-Cours wtfpwns the Intel Nehalem-based Xeons when it comes to memory bandwidth. I think there was an AnandTech article that talked about this not too long ago. May want to find it. It showed that the best chip highly depended on the apps you ran ... I'm no server guy, but that makes perfect sense. I'd try to go find the article if I were you.

You're correct. Andy also did a comparison with Intel's 8-core unit. The graphs here are a joke and anyone buying into them is a fool. Andy talks about buffer latency; I'm sure most here grasp the reasons. AMD wants to compare socket against socket, which is fine; it's also cheaper. But compare 12 Intel cores against 12 AMD cores. For Intel that's a 2S system; for AMD it's a 1S system with TWO x6 CPUs glued together. That equals 4 memory channels, which again is fine. But let's do apples to apples. When Conroe came out it was only a 2S system, and Intel had to go against AMD 4S systems, and they won market share. Nothing has really changed. AMD is still behind the eight ball.
 

Nemesis 1

lol, always entertaining reading the rambling thoughts of a maniac on medication typing from his underground survivalist bunker somewhere in redneck mainland usa.

Good one, lol. Yeah, the storm shelter is exactly that: a storm shelter. When shit starts, as long as it's a natural disaster, I won't go inside. What my family does is their business.
 

JFAMD

Can someone post a 7500 STREAM benchmark then? I don't have access to any.

As for doing a 1P AMD to 2P Intel comparison in order to have an "apples to apples" comparison, that idea is fatally flawed.

Why not compare threads? 2 x 12C AMD to 2 x 12T Intel?
Why not compare price? The throughput of a $5K AMD system to the throughput of a $5K intel system?

There are plenty of ways to do comparisons. Most customers buy at the socket level, so socket to socket seems to be the fairest in my mind. The other way folks buy is by budget, so picking a price point seems to be a fair way to do it.

If you want to do my 1P to an Intel 2P, you can do that, but you might want to add price into that comparison just to make it fair.