AMD Introduces Industry’s Most Powerful Server Graphics Card

tviceman

Hahah wtf, there is code built into SemiAccurate's articles now that prevents highlighting text to copy and paste. You have to stop the page from loading before it finishes.

From the SemiAccurate article:
> The most pertinent question surrounding Intel’s new Xeon Phi is not what it can do, but what it does to the competition. In that sense, the card is a death blow to Nvidia’s HPC aspirations.
>
> In the end, GPGPUs are a work in progress. That is the smiley happy term for the English term, “Bloody mess”.
>
> The net result is a lot of smoke and mirrors for the money in and the meagre results out.
>
> The usable performance of Phi is laughably higher than any GPU out there, it isn’t even a close race. No, it isn’t even a race, it is a clean kill.
>
> The difference between Phi and GPGPU is astounding. The hardware is a bit light on raw performance, barely over a TeraFLOP DP while the competition is notably higher. SP FP is a far more lopsided win for the GPU set, they all will crunch multiple times what a Phi can. That said, for a given amount of programmer hours, it would be surprising if you didn’t get a better result from a cluster of servers with the Intel cards plugged in than any competing GPU based solution.
>
> With Phi on the market, expect new projects that choose CUDA and Nvidia hardware to wither very quickly. Projects that are already heavily invested in the Nvidia solutions will be unlikely to drop them cold, but if they have maintained an x86 code base, Phi’s overwhelmingly lower cost of code maintenance, updates, and optimization for the next generation may very well win the day. The difference is really that extreme when you run the numbers with any realistic programmer costs factored in.
>
> The end times for GPGPU are here; pity its purveyors, Intel doesn’t take prisoners.
 

sontin

So now we don't need all these server CPUs anymore because we can easily run x86 code on Phi? Wow.
 

Jaydip

Sounds weird; from his site I got the impression that he doesn't like Intel either. But the enemy of my enemy, I guess.
 

Keysplayr

Charlie would sooner lick the boot heel of Intel's CEO than praise anything Nvidia. Or any other company in existence, for that matter. hehe.
 

tviceman

Ah, that. I have read it. It kind of makes sense, though.

Unless, of course, he is overstating how "easy" it will be to drop in Xeon Phi cards without a change in coding habits, or exaggerating the difficulty curve in learning GPGPU, or skewing hardware utilization results toward best-case scenarios for Intel and worst-case scenarios for Nvidia, and thinks that companies spending $30,000,000 on the best possible hardware available won't want to spend another $300,000 training their employees.
 

Granseth

http://www.green500.org/lists/green201211

After reading earlier in this thread how inefficient some people here think AMD's cards are, it's interesting to look at reality. Look at #2 and #4.

#1 and #3 are also interesting, of course; reality is not as straightforward as one might believe.

And it's strange to see FirePro paired with Intel CPUs and Nvidia with Opterons.
 

boxleitnerb

Is that even comparable regarding only the GPUs? Different CPUs and likely different infrastructures (network, storage, RAM capacity) can heavily influence the results.
 

Granseth

I'm pretty sure it's not very comparable, but it would point to Intel's and AMD's solutions being pretty efficient too.
 

boxleitnerb

How can you say that if it is not comparable? We don't even know how the computing power is split between CPUs and GPUs in those systems. And what about code optimization? There are too many unanswered questions to draw any conclusions there. But the specs are pretty clear. The K20(X) solutions are king when it comes to DP energy efficiency.
 

Granseth

How can you write that when Linpack runs in DP and the K20 doesn't dominate the list?
As I wrote, I'm pretty sure the systems are not very comparable, but it's also easy to see that the K20 doesn't do better in this test. But as you say, maybe AMD has better programmers optimizing code for Linpack, or maybe AMD's S10000 is a good product too.
 

sontin

You can't really compare the systems in the Green500 list to each other. Titan is in third place and uses over 8 megawatts; the S10000 system uses only 179 kilowatts.

And there is another factor: the price of the components. The two K20X systems have one Opteron CPU per GPU. The AMD system has two or four Xeon processors per card, and the AMD cards deliver only 56% of the rPeak, versus 90% in the K20X systems.
 

Granseth

You really don't read what I write. The K20 is at spot #4, using 129 kW, so it's not that far off 179 kW.
And I have no problem seeing that they're not directly comparable systems, but as I was saying, the S10000 delivers good results too. Or is it the Xeons that are so good at DP that they should skip the GPGPU part?
 

ShintaiDK

Granseth said:
> You really don't read what I write. The K20 is at spot #4, using 129 kW, so it's not that far off 179 kW.
> And I have no problem seeing that they're not directly comparable systems, but as I was saying, the S10000 delivers good results too. Or is it the Xeons that are so good at DP that they should skip the GPGPU part?

The list you show takes the power usage of the entire datacenter and then calculates FLOPS per watt from it.

And that ruins it all.

Plus, as said, the CPU-per-GPU configurations etc. are different. Some may use more system memory, HDD storage, etc. Some might have older cooling systems, others the latest high-efficiency ones. One datacenter is highly insulated, another isn't, etc.
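To make ShintaiDK's point concrete: a Green500-style MFLOPS/W figure is sustained Linpack performance (Rmax) divided by the measured power of the system, so everything besides the accelerators that draws power ends up in the denominator. The sketch below uses invented numbers (the `System` class and every figure in it are hypothetical, not taken from the list) to show two machines with identical cards and identical Rmax landing far apart purely because of host-side power.

```python
# Hypothetical illustration: MFLOPS/W = Rmax / total measured power, so host-side
# overhead (CPUs, memory, storage, network, cooling) shifts the result even when
# the accelerators are identical. All numbers below are invented.
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rmax_tflops: float  # sustained Linpack performance, TFLOPS
    accel_kw: float     # power drawn by the accelerator cards, kW
    host_kw: float      # everything else that is measured, kW

    def mflops_per_watt(self) -> float:
        total_watts = (self.accel_kw + self.host_kw) * 1_000
        return self.rmax_tflops * 1_000_000 / total_watts  # 1 TFLOPS = 1e6 MFLOPS


# Same accelerators and same Rmax; only the host-side power differs.
lean = System("lean host", rmax_tflops=300.0, accel_kw=100.0, host_kw=25.0)
heavy = System("heavy host", rmax_tflops=300.0, accel_kw=100.0, host_kw=75.0)

for s in (lean, heavy):
    print(f"{s.name}: {s.mflops_per_watt():.0f} MFLOPS/W")
```

With these made-up inputs the lean configuration scores 2400 MFLOPS/W and the heavy one about 1714, which is why a whole-system ranking says little about the cards on their own.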
 

sontin

The only thing I take away from Todi and Titan is that the combination of 1x Opteron and 1x K20X can scale up to 27 petaflops and keep nearly the same perf/watt.

It is much easier to build a very efficient system with less performance than with high performance.
 

Granseth

ShintaiDK said:
> The list you show takes the power usage of the entire datacenter and then calculates FLOPS per watt from it.
>
> And that ruins it all.
>
> Plus, as said, the CPU-per-GPU configurations etc. are different. Some may use more system memory, HDD storage, etc. Some might have older cooling systems, others the latest high-efficiency ones. One datacenter is highly insulated, another isn't, etc.

All of the top 4 datacenters are completely new. And I'm sure there are differences, but I am only saying that the S10000 delivers good results too when it comes to efficiency. And I only brought it up because some people started bashing the S10000's efficiency at the start of this thread. I really can't understand why people are trying to find so many excuses for why this can't be true.
 

sontin

Because you are comparing a system where the K20X cards are responsible for 90% of the rPeak against a system where the S10000 cards bring only 56% and need 4 CPUs per card instead of one.

Titan has 14x more cores but delivers 40x more performance in Linpack. With the cards providing only 56% of the rPeak, a huge amount of the perf/watt comes directly from the Xeon processors.
 

Vesku

Regarding efficiency, sure, there are a lot of caveats, but the most efficient Xeon-only systems hover around the 1000 MFLOPS/W range, and all of the top heterogeneous systems are in the 2000 MFLOPS/W range. Even given the unknowns, it's pretty clear that the S10000 is floating around the same efficiency as the others.
 

wlee15

sontin said:
> Because you are comparing a system where the K20X cards are responsible for 90% of the rPeak against a system where the S10000 cards bring only 56% and need 4 CPUs per card instead of one.
>
> Titan has 14x more cores but delivers 40x more performance in Linpack. With the cards providing only 56% of the rPeak, a huge amount of the perf/watt comes directly from the Xeon processors.

There seems to be an error on the TOP500 chart. According to this press release from the Frankfurt Institute for Advanced Studies, the SANAM supercomputer consists of 210 ASUS ESC4000/FDR G2 servers, each with 2 E5-2650 CPUs and 2 S10000 cards. So the S10000s provide about 91.5% of the rPeak.

Technically, the German-Arab supercomputer is a cluster of standard computer servers with a high-speed network. The cluster consists of 210 servers with 3,360 cores, 840 GPUs and 26,880 gigabytes of main memory. Each ASUS ESC4000/FDR G2 server is equipped with two Intel Xeon E5-2650 processors and 128 GB of memory in eight 16 GB modules of energy-efficient "Samsung Green Memory". Each server also contains two AMD FirePro S10000 graphics cards, with four graphics processors in total, for acceleration. The network is FDR InfiniBand with a transmission capacity of 56 Gbit/s. The servers were supplied by Adtech Global.

http://translate.google.com/transla...ias.uni-frankfurt.de/press121114.html&act=url
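wlee15's rPeak split can be re-derived from the server configuration in the press release. The per-part peaks below are assumptions taken from vendor spec sheets, not from the thread: roughly 1.48 TFLOPS double precision per FirePro S10000 card, and 8 cores × 2.0 GHz × 8 DP FLOP/cycle (AVX) = 128 GFLOPS per Xeon E5-2650. The constant and function names are hypothetical.

```python
# Sketch: how much of SANAM's theoretical peak (rPeak) comes from the S10000 cards,
# using the 210-server configuration from the press release.
# ASSUMED per-part peaks (vendor spec-sheet figures, not from the thread):
#   FirePro S10000: ~1.48 TFLOPS double precision per card (two GPUs per card)
#   Xeon E5-2650:   8 cores x 2.0 GHz x 8 DP FLOP/cycle (AVX) = 128 GFLOPS

SERVERS = 210
CPUS_PER_SERVER = 2
CARDS_PER_SERVER = 2

S10000_DP_GFLOPS = 1480.0
E5_2650_DP_GFLOPS = 8 * 2.0 * 8  # cores * GHz * FLOP/cycle


def rpeak_split(servers: int) -> tuple[float, float, float]:
    """Return (gpu_tflops, cpu_tflops, gpu_share) for the whole cluster."""
    gpu = servers * CARDS_PER_SERVER * S10000_DP_GFLOPS / 1000
    cpu = servers * CPUS_PER_SERVER * E5_2650_DP_GFLOPS / 1000
    return gpu, cpu, gpu / (gpu + cpu)


gpu_tf, cpu_tf, share = rpeak_split(SERVERS)
print(f"GPU {gpu_tf:.1f} TF, CPU {cpu_tf:.1f} TF, GPU share {share:.1%}")
```

Under these assumptions the cards account for roughly 92% of the rPeak, in line with wlee15's ~91.5%; the small gap depends on which per-part peaks you plug in. The same assumptions also reproduce the press release's 3,360 CPU cores (210 × 2 × 8).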
 

sontin

Vesku said:
> Regarding efficiency, sure, there are a lot of caveats, but the most efficient Xeon-only systems hover around the 1000 MFLOPS/W range, and all of the top heterogeneous systems are in the 2000 MFLOPS/W range. Even given the unknowns, it's pretty clear that the S10000 is floating around the same efficiency as the others.

The first 16-core Opteron system without an accelerator lands at 582.00 MFLOPS/W. That is only 60% of the Intel system...

The first system with Xeons and Fermi cards is HA-PACS, with 1,035.13 MFLOPS/W. This system has 1,072 GPUs and 536 8-core CPUs. The K20X has the same TDP as the M2090; if they swapped the M2090s for K20Xs, it would deliver an rPeak of 1,493.32 TFLOPS instead of 802 TFLOPS, and it would bump the perf/watt up to between 2,380 MFLOPS/W (at a Linpack efficiency of 64.8%) and 2,555 MFLOPS/W (at a Linpack efficiency of 69.4%).

Also interesting is that you need 60% more servers with the S10000 to get the same performance. You can only put two S10000 cards into these server systems instead of four K20(X)s. And one S10000 card delivers only about 1 TFLOPS in DGEMM, the same as a K20(X).
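sontin's HA-PACS estimate can be checked with two small formulas: the new rPeak is the old rPeak plus the per-card difference times the GPU count (CPUs unchanged), and perf/watt is rPeak × Linpack efficiency ÷ power. The per-card DP peaks are assumptions from vendor spec sheets, not from the thread (about 0.665 TFLOPS for a Tesla M2090 and 1.31 TFLOPS for a K20X), and the helper names are hypothetical. As a consistency check, the sketch backs out the power draw implied by each of the post's two perf/watt figures; since the argument assumes equal TDP for both cards, the two should agree.

```python
# Sketch: re-derive the "swap the M2090s for K20Xs" estimate for HA-PACS.
# ASSUMED per-card DP peaks (vendor spec-sheet figures, not from the thread):
#   Tesla M2090: 0.665 TFLOPS, Tesla K20X: 1.31 TFLOPS

HA_PACS_RPEAK_TF = 802.0  # rPeak quoted in the post
GPUS = 1072               # GPU count quoted in the post
M2090_TF = 0.665
K20X_TF = 1.31


def swapped_rpeak(rpeak_tf: float, gpus: int, old_tf: float, new_tf: float) -> float:
    """rPeak after replacing every old card with a new one (CPUs unchanged)."""
    return rpeak_tf + gpus * (new_tf - old_tf)


def implied_power_kw(rpeak_tf: float, linpack_eff: float, mflops_per_w: float) -> float:
    """Power draw implied by an rPeak, a Linpack efficiency and an MFLOPS/W figure."""
    rmax_mflops = rpeak_tf * linpack_eff * 1_000_000  # 1 TFLOPS = 1e6 MFLOPS
    return rmax_mflops / mflops_per_w / 1_000          # watts -> kW


new_rpeak = swapped_rpeak(HA_PACS_RPEAK_TF, GPUS, M2090_TF, K20X_TF)
print(f"rPeak with K20X: {new_rpeak:.1f} TFLOPS")

# Both (efficiency, MFLOPS/W) pairs from the post should imply about the same power.
for eff, ppw in ((0.648, 2380.0), (0.694, 2555.0)):
    kw = implied_power_kw(new_rpeak, eff, ppw)
    print(f"{eff:.1%} efficiency at {ppw:.0f} MFLOPS/W -> {kw:.0f} kW")
```

With these assumed card peaks the swapped rPeak comes out within a fraction of a TFLOPS of the post's 1,493.32, and both perf/watt figures imply a power draw of roughly 406 kW, so the two numbers are internally consistent.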
 

3DVagabond

sontin said:
> You can only put two S10000 cards into these server systems instead of four K20(X)s.

Why?

Edit: Here's 8 in a single server.
[Image: 16 GPUs, straight-on view]


Are you thinking of CrossFire/SLI limitations, maybe?
 
Feb 19, 2009
10,457
10
76
What's interesting here is that people would even go with FirePro for large server clusters... I guess AMD is doing better than I gave them credit for.