Thinking about getting dual or quad Opteron...

Knavish

Senior member
May 17, 2002
910
3
81
I've got a "stupidly" parallel scientific computing problem -- a bunch of simulations running custom code with different input parameters. I realize that a Xeon-based system is much faster per core, but they are rather expensive...

Looking at Dell rackmount servers:
R715 2.4GHz 2x16C Opteron 6278, 128GB RAM, RAID 10 (4x500GB) $6500
R720 2.0GHz 2x8C Xeon E5-2650, 128GB RAM, RAID 10 (4x500GB) $7400
R815 2.4GHz 4x16C Opteron 6278, 256GB RAM, RAID 10 (4x500GB) $12000
R820 2.2GHz 4x8C Xeon E5-4620, 256GB RAM, around $19K+

When I get around to talking with a Dell rep, I will definitely ask about Opteron 6300's. I just don't see them on the site tonight.

Is AMD a crazy choice for this application? I really should try to benchmark the code on some kind of representative Intel and AMD CPUs...
 

Sleepingforest

Platinum Member
Nov 18, 2012
2,375
0
76
It somewhat depends on your code, but Anandtech has run a review on the Opteron 6300 series and compared it to both the old Opterons and Xeons. I've linked to the conclusion, which says:
All these small steps forward make the AMD Opteron attractive again for the price conscious buyers looking for a virtualization host or an HPC crunching machine. The Opteron machines need more energy to do their job, but once again you get better performance per dollar than Intel's midrange offerings.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
How "stupidly parallel" are we talking here? Distributed computing level of stupidly parallel?

I run a "stupidly parallel" app, basically multiple instances of the same app which are all queued to run simulations using different input parameters, and I went the el'cheapo route of buying myself five identically configured quad-core boxes and programmed the code to basically turn the cluster into a farm.

That gave me a cluster of twenty 3.3GHz cores at around $125/core.

For "stupidly parallel" problems it is tough to beat the COTS (commodity off the shelf) approach, which is why DC came about in the first place. (unless footprint is a serious constraint)
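For reference, the queue-and-farm pattern described above can be sketched with nothing but the Python standard library. The simulation function and parameter sets below are made-up stand-ins for the real custom code:

```python
from concurrent.futures import ProcessPoolExecutor

def run_simulation(params):
    # Stand-in for one instance of the custom code; it just sums
    # squares so the example is runnable as-is.
    a, b = params
    return sum((a * i + b) ** 2 for i in range(1000))

# Each tuple is one set of input parameters (hypothetical values).
PARAM_SETS = [(1, 0), (2, 1), (3, 5), (4, 2)]

if __name__ == "__main__":
    # One worker process per core; the instances never talk to each
    # other, so plain process-level parallelism is all that's needed.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(run_simulation, PARAM_SETS))
    print(len(results))
```

Spreading the same queue across several cheap boxes instead of one big server only changes where the workers live, not the structure of the code.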
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136
IDC makes a good point. Dell's Vostro 270 MT can come with a 3.2GHz quad-core Core i5. In UK pricing (minus tax) that configuration comes out at £319, whereas the R715 with 2xOpteron 6278 comes out at about £6500. You could have twenty Vostro boxes for that price- that gives you 80 3.2GHz Ivy Bridge cores, vs 32 Piledriver cores @ 2.4GHz. There's no competition, the Vostro will give you better performance/£ up front. And the Vostro is a business machine, so you still get the good Dell business tech support.

EDIT: Also, a cluster is easily scalable if you need more performance in the future. With a single server solution you have to buy another £6000 server to completely replace the existing one in order to get a 25% speed boost. With a cluster, you add 25% more nodes.
 
Last edited:

Rakehellion

Lifer
Jan 15, 2013
12,182
35
91
I've got a "stupidly" parallel scientific computing problem -- a bunch of simulations running custom code

Geforce.

If you're massively parallel and writing your own code, you're probably much better off doing it on the GPU in OpenCL. This might not be feasible if you're well into the project and most of the code is already written, but even a partial rewrite to take advantage of the GPU will give you huge performance gains.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136
Geforce.

If you're massively parallel and writing your own code, you're probably much better off doing it on the GPU in OpenCL. This might not be feasible if you're well into the project and most of the code is already written, but even a partial rewrite to take advantage of the GPU will give you huge performance gains.

If this is an option for you, take a look at the Titan. It's expensive for a graphics card, but cheap for a Tesla coprocessor- which is basically what it is. The main missing feature is ECC memory, if I remember correctly, but double precision performance is uncrippled (unlike Geforces of the past).
 

jaedaliu

Platinum Member
Feb 25, 2005
2,670
1
81
Some good ideas here.

grad school?

Depending on your advisor's lab space, I would say the cluster's a good approach. Does your department not have a cluster already?
 

LurchFrinky

Senior member
Nov 12, 2003
298
56
91
As others have already said, it depends on your code.
And as a quick check, I would suggest running the code on an Intel system with HT on and off. If your code takes advantage of HyperThreading (some HPC code runs slower with it), then you can probably just ignore the AMD systems.
And as another alternative - just build your own. Pick up a server barebones chassis, throw in cpus, memory, and HDDs, and you're done. The chassis includes mounted motherboard, power supply(s), heatsinks, and cabling, so it will be the easiest build you can do. Depending on the components, you can get equivalent systems for about 2/3 the cost.
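If rebooting into the BIOS to toggle HT isn't convenient, a rough software-only proxy is to compare throughput with one worker per physical core against one worker per logical core. This is only a proxy (and the 2-way-SMT assumption below is exactly that, an assumption), but it's quick:

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor

def busy_work(n):
    # CPU-bound stand-in for one simulation instance.
    s = 0
    for i in range(n):
        s += i * i
    return s

def throughput(workers, jobs=8, n=200_000):
    # Jobs completed per second for a fixed amount of work.
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(busy_work, [n] * jobs))
    return jobs / (time.perf_counter() - start)

if __name__ == "__main__":
    logical = os.cpu_count() or 1
    physical = max(1, logical // 2)  # assumes 2-way SMT; verify in /proc/cpuinfo
    print(f"{physical} workers: {throughput(physical):.1f} jobs/s")
    print(f"{logical} workers: {throughput(logical):.1f} jobs/s")
```

If the logical-core run isn't meaningfully faster than the physical-core run, HT is buying little for that workload.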
 

Rainer

Junior Member
Mar 14, 2013
14
0
0
Pick up a server barebones chassis, throw in cpus, memory, and HDDs, and you're done. The chassis includes mounted motherboard, power supply(s), heatsinks, and cabling, so it will be the easiest build you can do. Depending on the components, you can get equivalent systems for about 2/3 the cost.

Case in point: the Supermicro 4042G-72RF4 quad-Opteron barebone will set you back some $2,200. All you then need is RAM, CPUs, and hard disks. A configuration similar to that of the quad-Opteron Dell machine will comfortably stay below $9,000. Considering that it has an LSI 2208 SAS controller on board, that's one nice machine :thumbsup:

Yet another possibility to look into would be Intel's Xeon Phi coprocessor.
 

Knavish

Senior member
May 17, 2002
910
3
81
Some good ideas here.

grad school?

Depending on your advisor's lab space, I would say the cluster's a good approach. Does your department not have a cluster already?

We are actually having some trouble with access to (and even uptime of!) the cluster at our disposal, hence the need for our own machine. I am going to check whether we can just add nodes to the existing cluster. I used to have access to a cluster that gave you priority on the nodes you purchased, but I don't think the existing system is that convenient.
 

Knavish

Senior member
May 17, 2002
910
3
81
Thanks for the replies. Here are some responses:

Yes, it is stupidly parallel as in distributed computing -- running the same code with different initial conditions per instance. The instances do not communicate with each other. I suppose this might favor an Intel system over an AMD system with slower CPUs but a great backbone for interprocess communication.
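A sweep like that is trivial to enumerate up front, which is part of why the cheap-nodes approach works so well. A sketch with invented parameter names:

```python
from itertools import product

# Hypothetical parameter grids -- real runs would use the
# simulation's actual inputs.
temperatures = [250, 300, 350]
pressures = [1.0, 2.5]
seeds = [0, 1, 2, 3]

# Every combination is one independent job; no instance ever needs
# to talk to another, so the jobs can land on any core anywhere.
jobs = [
    {"temp": t, "pressure": p, "seed": s}
    for t, p, s in product(temperatures, pressures, seeds)
]
print(len(jobs))  # 3 * 2 * 4 = 24 independent runs
```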

Dell refurb is a great idea -- I haven't looked there yet.

Buying a set of desktops would save money, but I think we may get some pushback from system admins about increased support requirements for multiple machines.

GPU computing is an interesting idea, but I suspect it would take months to rewrite the code for a GPU. It might be a fun side project, though.
 

Ajay

Lifer
Jan 8, 2001
15,431
7,849
136
Thanks for the replies. Here are some responses:
GPU computing is an interesting idea, but I suspect it would take months to rewrite the code for a GPU. It might be a fun side project, though.

Using Intel Xeon Phis might be worth looking into, just to see if the manpower costs are worth the effort. From what I've read, writing code for the Phi is easier than for, say, an Nvidia K20 (though I've only done CUDA programming, so I'm not speaking from experience on the Phi). The Phi has 512-bit vector registers, with peak DP performance of 1 TFLOPS (likely less for actual apps).
 

Rainer

Junior Member
Mar 14, 2013
14
0
0
Using Intel Xeon Phis might be worth looking into, just to see if the manpower costs are worth the effort. From what I've read, writing code for the Phi is easier than for, say, an Nvidia K20

Yes, that's what they say. Allegedly, a port to Xeon Phi might entail just adding a couple of lines, and be done within minutes.

It may or may not be relevant that Xeon Phi runs only under two operating systems: Linux, and Linux. If that's acceptable, it might be worth investing a couple hours to look into it.
 

Ajay

Lifer
Jan 8, 2001
15,431
7,849
136
Yes, that's what they say. Allegedly, a port to Xeon Phi might entail just adding a couple of lines, and be done within minutes.

It may or may not be relevant that Xeon Phi runs only under two operating systems: Linux, and Linux. If that's acceptable, it might be worth investing a couple hours to look into it.

Looking over the Xeon Phi pages, it seems like the Phi needs additional development libraries (something like Thrust, BLAS, performance primitives, etc. - unless I was just looking in the wrong places). It would be nice to do host coding w/Visual Studio - but since Intel was shooting for the same code base to work on the host & Phi, I can see why they chose the path they did. Kinda meh for me, but only because I haven't done Linux development in a while.
 

NTMBK

Lifer
Nov 14, 2011
10,232
5,013
136
Looking over the Xeon Phi pages, it seems like the Phi needs additional development libraries (something like Thrust, BLAS, performance primitives, etc. - unless I was just looking in the wrong places). It would be nice to do host coding w/Visual Studio - but since Intel was shooting for the same code base to work on the host & Phi, I can see why they chose the path they did. Kinda meh for me, but only because I haven't done Linux development in a while.

Word on the grapevine is that Windows development for Phi is coming later this year, through Intel Composer integrated with Visual Studio.
 

Ajay

Lifer
Jan 8, 2001
15,431
7,849
136
Word on the grapevine is that Windows development for Phi is coming later this year, through Intel Composer integrated with Visual Studio.

Cool, wish I could play with one of them :cool:
 

SocketF

Senior member
Jun 2, 2006
236
0
71
Do you have access to the source code?

If yes, I would assume the AMD would be (much) better, because they can use FMA, which doubles the theoretical computing power of the AMD CPUs.

Another question is whether your code can benefit from FMA, but if you use lots of matrix multiplications (most scientific apps do, imo) it will.
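Whether FMA actually helps is workload-dependent, but the back-of-envelope arithmetic is easy to sketch. The FLOPs-per-cycle figure below is purely illustrative -- check the real per-core FMA width of whichever CPUs you're comparing:

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    # Theoretical peak: cores * clock * FLOPs retired per cycle per core.
    return cores * clock_ghz * flops_per_cycle

def matmul_flops(n):
    # An n x n matrix multiply does n**3 multiplies and n**3 adds.
    # With FMA each multiply+add pair retires as one instruction,
    # halving the instruction count, but it still counts as 2 FLOPs.
    return 2 * n ** 3

# Illustrative only: 16 cores at 2.4GHz retiring 8 DP FLOPs/cycle.
print(peak_gflops(16, 2.4, 8))     # theoretical GFLOPS
print(matmul_flops(1000) / 1e9)    # GFLOPs in one 1000x1000 matmul
```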