AMD Introduces Industrys Most Powerful Server Graphics Card

tviceman · Nov 14, 2012

Hahah wtf there is code built into semiaccurate's articles now that won't allow for highlighting to copy and paste. You have to stop the page from loading before it finishes.

The most pertinent question surrounding Intels new Xeon Phi is not what it can do, but what it does to the competition. In that sense, the card is a death blow to Nvidias HPC aspirations.

In the end, GPGPUs are a work in progress. That is the smiley happy term for the English term, Bloody mess.

The net result is a lot of smoke and mirrors for the money in and the meagre results out.

The usable performance of Phi is laughably higher than any GPU out there, it isnt even a close race. No, it isnt even a race, it is a clean kill.

The difference between Phi and GPGPU is astounding. The hardware is a bit light on raw performance, barely over a TeraFLOP DP while the competition is notably higher. SP FP is a far more lopsided win for the GPU set, they all will crunch multiple times what a Phi can. That said, for a given amount of programmer hours, it would be surprising if you didnt get a better result from a cluster of servers with the Intel cards plugged in than any competing GPU based solution.

With Phi on the market, expect new projects that choose Cuda and Nvidia hardware to wither very quickly. Projects that are already heavily invested in the Nvidia solutions will be unlikely to drop them cold, but if they have maintained an x86 code base, Phis overwhelmingly lower cost of code maintenance, updates, and optimization for the next generation may very well win the day. The difference is really that extreme when you run the numbers with any realistic programmer costs factored in.

The end times for GPGPU is here, pity its purveyors, Intel doesnt take prisoners.

sontin · Nov 14, 2012

So now we don't need all these server cpus anymore because we can easily run x86 code on Phi? Wow.

Jaydip · Nov 14, 2012

Sounds weird, from his site I got the impression that he doesn't like Intel either.But enemy of my enemy I guess.

Keysplayr · Nov 14, 2012

Charlie would sooner lick the CEO of Intel's boot heel before praising anything Nvidia. Or any other company in existence for that matter. hehe.

boxleitnerb · Nov 14, 2012

Ah that, I have read that. It makes kinda sense, though.

tviceman · Nov 14, 2012

boxleitnerb said:
Ah that, I have read that. It makes kinda sense, though.

Unless, of course, he is overstating how "easy" it will be to drop and plop xeon phi cards without a change in coding habits, or over exaggerating the difficulty curve in learning GPGPU, or skewing hardware utilization results into best case scenarios for Intel and worst case for Nvidia, and thinks that companies spending $30,000,000 on the best possible hardware available won't want to spend another $300,000 training their employees.

Granseth · Nov 14, 2012

http://www.green500.org/lists/green201211

After reading from some people here how bad inefficient AMDs cards are earlier in this thread it's interesting to look at reality. Look at #2 and #4.

#1 and # 3 are also interesting of course, it's interesting that reality is not as straight forward as one would believe.

And it's strange seeing that FirePro is combined with Intel CPUs and nVidia with Opterons.

boxleitnerb · Nov 14, 2012

Is that even comparable regarding only the GPUs? Different CPUs and likely different infrastructures (network, storage, RAM capacity) can heavily influence the results.

Granseth · Nov 14, 2012

I'm pretty sure it's not very comparable, but it would point to Intels and AMDs solutions being pretty efficient too.

boxleitnerb · Nov 14, 2012

How can you say that if it is not comparable? We don't even know how the computing power is split between CPUs and GPUs in those systems. And what about code optimization? There are too many unanswered questions to draw any conclusions there. But the specs are pretty clear. The K20(X) solutions are king when it comes to DP energy efficiency.

Granseth · Nov 14, 2012

How can you write that when Linpack is run with DP and K20 don't dominate the list.
As I was writing I'm pretty shure that the systems are not very comparable, but it's also easy to see that K20 don't do it better in this test. But as you say, maybe AMD has better programmers to optimize code for Linpack, or maybe AMD's S10000 are a good product too.

sontin · Nov 14, 2012

You can't really compare the systems in the green500 list to each other. Titan is on the third place and uses over 8 megawatts. The S10000 only 179 kilowatts.

And there is another factor: Price of the components. The two K20X systems have one Opteron CPU per GPU. The AMD System has two or four Xeon processors per card and the AMD cards only deliver 56% of the rPeak performance instead of 90% of the K20X systems.

Granseth · Nov 14, 2012

You really don't read what I write, K20 is at spot #4 and using 129 kW so it's not that far off 179 kW.
And I have no problem seeing that its not directly comparable systems, but as I was saying, S10000 delivers good results too. Or is it the Xenons that are so good at DP that they should skip the GPGPU part?

ShintaiDK · Nov 14, 2012

Granseth said:
You really don't read what I write, K20 is at spot #4 and using 129 kW so it's not that far off 179 kW.
And I have no problem seeing that its not directly comparable systems, but as I was saying, S10000 delivers good results too. Or is it the Xenons that are so good at DP that they should skip the GPGPU part?

The list you show combines the entire datacenter usage and then calculate it after flops.

And that ruins it all.

Plus at said, CPU per GPU etc confs are different. Some may use more system memory, HD storage etc. Some might have older cooling systems, others the latest high efficiect ones. One datacenter is highly insulated, another aint etc.

sontin · Nov 14, 2012

The only think i take away from Todi and Titan is that the combination of 1x Opteron and 1xK20X can scale up to 27 Petaflops and has nearly the same Perf/Watt.

It is much easier to build a very efficient system with less performance than with high performance.

Granseth · Nov 14, 2012

ShintaiDK said:
The list you show combines the entire datacenter usage and then calculate it after flops.

And that ruins it all.

Plus at said, CPU per GPU etc confs are different. Some may use more system memory, HD storage etc. Some might have older cooling systems, others the latest high efficiect ones. One datacenter is highly insulated, another aint etc.

All the top 4 datacenters are completly new. And I am shure there are differences, but I am only saying that S10000 delivers good results too when it comes to efficiency. And I only brought it up because some people started to bash the S10000 efficiency in the start of this thread. I really can't understand why people are trying to find so many excuses to how this cant be true.

sontin · Nov 14, 2012

Because you compare a system where the K20X cards are responsibly for 90% of the rPeak performance against a system where the S10000 cards brings only 56% and need 4 cpus per card instead of one.

Titan has 14x more cores but delivering 40x more performances in linpack. With only 56% of the rPeak performance a huge amount of the perf/watt comes directly from the Xeon processor.

bunnyfubbles · Nov 14, 2012

AnandThenMan said:
Huh?

I think he means level the playing field much like how Little Boy leveled Hiroshima.

Vesku · Nov 14, 2012

Regarding efficiency, sure there are a lot of caveats but the most efficient Xeon only systems hover around the 1000 MFLOPS/W range and all of the top heterogeneous systems are in the 2000 MFLOPS/W range. Even given the unknowns it's pretty clear that the S10000 is floating around the same efficiency as the others.

wlee15 · Nov 14, 2012

sontin said:
Because you compare a system where the K20X cards are responsibly for 90% of the rPeak performance against a system where the S10000 cards brings only 56% and need 4 cpus per card instead of one.

Titan has 14x more cores but delivering 40x more performances in linpack. With only 56% of the rPeak performance a huge amount of the perf/watt comes directly from the Xeon processor.

There's seems to be an error on the TOP500 chart. According to this press release from the Frankfurt Institute for Advanced Studies the SANAM supercomputer consists of 210 ESC4000/FDR ASUS G2 servers each with 2 E5-2650 cpu and 2 S10000 cards. So the S10000 provide about 91.5% of the rPeak.

Technically, the German-Arab supercomputer is a cluster of standard computer servers with a high-speed network. The cluster consists of 210 servers with 3,360 cores, 840 GPUs and 26,880 gigabytes of main memory. The server type ESC4000/FDR ASUS G2 is equipped with two Intel Xeon E5-2650 processors and 16 gigabytes of eight modules (128 GB) fitted the energy-efficient "Samsung Green Memory" components. Each server contains two graphics cards AMD FirePro Model S10000 with four graphics processors to accelerate. Wherein said network is an FDR InfiniBand network with a transmission capacity of 56 Gbit / s The servers were supplied by the company Adtech Global.

http://translate.google.com/transla...ias.uni-frankfurt.de/press121114.html&act=url

f1sherman · Nov 14, 2012

Funny thing is that Nvidia also mentions Saudi's KAUST as Tesla customer.

http://nvidianews.nvidia.com/Releas...rs-Powers-World-s-No-1-Supercomputer-8b6.aspx

What gives?

sontin · Nov 15, 2012

Vesku said:
Regarding efficiency, sure there are a lot of caveats but the most efficient Xeon only systems hover around the 1000 MFLOPS/W range and all of the top heterogeneous systems are in the 2000 MFLOPS/W range. Even given the unknowns it's pretty clear that the S10000 is floating around the same efficiency as the others.

The first 16C Opteron System without a accelerator lands at 582.00 MFLOPs/Watt. That is only 60% of the Intel system...

The first system with Xeons and Fermi cards is the HA-PACS with 1,035.13 MFLOPs/Watt. This system has [FONT=tahoma, arial, helvetica, sans-serif]1072 GPUs and 536 8Core CPUs. The K20X has the same TDP as the 2090 if they would change the 2090 to K20X it would deliver a rPeak performance of 1.493,320[/FONT] TFLOPs/s instead of 802 TFLOPs/s and it would bump up the perf/watt between 2380 MFLOPs/watt (Linpack efficiency of 64,8% ) and 2555 MFLOPs/watt (Linpack efficiency of 69,4% ).

Interesting is also that you need 60% more server with the S10000 to get the same performance. You can only put two S10000 cards into these server system instead of 4 K20(X). And one S10000 card is only delivering 1 TFLOPs/s in DGEMM like the K20(X) now.

3DVagabond · Nov 15, 2012

sontin said:
You can only put two S10000 cards into these server system instead of 4 K20(X).

Why?

Edit: Here's 8 in a single server.

Are you thinking of crossfire/sli limitations, maybe?

sontin · Nov 15, 2012

That's a custom one. I'm talking about the server systems you get from OEMs like Asus:
http://www.asus.com/Server_Workstation/Servers/ESC4000FDR_G2/#overview

The S10000 supercomputer is only using two cards per system instead of 4.

Silverforce11 · Nov 15, 2012

Whats interesting here is people would even go with Firepro for large server clusters... i guess AMD is doing better than i gave them credit for.

AMD Introduces Industrys Most Powerful Server Graphics Card

Diamond Member

Diamond Member

Diamond Member

Elite Member

Platinum Member

Diamond Member

Senior member

Platinum Member

Senior member

Platinum Member

Senior member

Diamond Member

Senior member

Lifer

Diamond Member

Senior member

Diamond Member

Lifer

Diamond Member

Senior member

Platinum Member

Diamond Member

Lifer

Diamond Member

Lifer

AMD Introduces Industrys Most Powerful Server Graphics Card