Xeon Phi

Khato

Golden Member
Jul 15, 2001
1,206
250
136
Looks like Larrabee finally got some branding today, but the more interesting tidbit from the articles is the fact that there's already a system with it in the recently released June 2012 TOP500 list. 'Discovery' places at 150, supposedly with 236 Xeon E5-2670 and 118 Xeon Phi producing an rmax of 118.6 TFlops at 100.8 kW (no good way to link it directly as the system page doesn't have the TFlops figures.) Of course there are some slight peculiarities with the entry, namely if the 9800 core count was accurate then each Xeon Phi would have roughly 67.05 cores - I'm guessing the 9800 core figure is a ballpark figure as Intel apparently doesn't want to reveal actual core counts yet.

Other interesting observations are that 118.6 TFlops for 100.8 kW puts it in the same ballpark as systems with the same E5-2670 and Tesla M2090 cards for performance per watt (about 1.18 TFlops/kW vs around 1.05 TFlops/kW for the systems with Tesla). As well, we can derive the theoretical GFlops per Xeon Phi from the system's rpeak of 180990 GFlops, as the E5-2670 is a known ~166 GFlops per CPU: (180990 GFlops - 166 GFlops * 236) / 118 = 1201.8 GFlops per Xeon Phi. That's the figure that you can compare to the theoretical 665 GFlops of a Tesla M2090 or 1024 GFlops of a FirePro W9000. We can also estimate the rmax contribution for each Xeon Phi thanks to a number of entries that are just straight E5-2670: each E5-2670 gets around 155 GFlops, so (118600 GFlops - 155 GFlops * 236) / 118 = 695 GFlops per Xeon Phi. While this doesn't seem that impressive, keep in mind that the rmax of a Tesla M2090 is roughly 315 GFlops.
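For anyone who wants to redo the per-card arithmetic, here is a minimal Python sketch. It assumes the component counts and per-CPU figures quoted in this post are right (236 E5-2670, 118 Xeon Phi, 166 and 155 GFlops per CPU); the helper name `per_accelerator` is made up for illustration.

```python
# Figures as quoted in the post for the 'Discovery' entry.
RPEAK_GFLOPS = 180_990     # theoretical peak of the whole system
RMAX_GFLOPS = 118_600      # measured Linpack result of the whole system
N_CPU = 236                # assumed number of Xeon E5-2670
N_PHI = 118                # assumed number of Xeon Phi
CPU_PEAK_GFLOPS = 166      # theoretical peak per E5-2670
CPU_RMAX_GFLOPS = 155      # rmax per E5-2670, estimated from CPU-only entries


def per_accelerator(total, per_cpu, n_cpu, n_acc):
    """Subtract the CPU share from a system total and split the rest evenly."""
    return (total - per_cpu * n_cpu) / n_acc


phi_peak = per_accelerator(RPEAK_GFLOPS, CPU_PEAK_GFLOPS, N_CPU, N_PHI)
phi_rmax = per_accelerator(RMAX_GFLOPS, CPU_RMAX_GFLOPS, N_CPU, N_PHI)

print(f"theoretical peak per Xeon Phi: {phi_peak:.1f} GFlops")
print(f"estimated rmax per Xeon Phi:   {phi_rmax:.1f} GFlops")
```

The split assumes the CPUs in a hybrid system sustain the same Linpack rate as in a CPU-only system, which is only an approximation.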
 

alyarb

Platinum Member
Jan 25, 2009
2,444
0
76
Larrabee always excelled at DP. I just couldn't understand why they would try to cultivate interest in the product by masquerading as a GPU.
 

Khato

Golden Member
Jul 15, 2001
1,206
250
136
And its easily codeable x86 cores.

This is indeed the most important point. The NVIDIA Tesla solutions have been out for a while now, but I only count 50 systems in the TOP500 list stating they have an NVIDIA component, whereas there are already 33 systems that use only the recently released Xeon E5 series. I have no idea how they compare in cost, but the pure Xeon systems do have a slightly lower rmax TFlops/kW compared to those with Tesla M2090s (0.75 to 0.85 TFlops/kW instead of around 1.05 TFlops/kW), so why else would they be so widely deployed?

Another interesting point regarding the supposedly superior energy efficiency of GPU-based computing - the BlueGene systems at the top of the list are at 2.07 TFlops/kW.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The Register gives their opinion regarding system config: http://www.theregister.co.uk/2012/06/18/intel_mic_xeon_phi_cray/print.html

If you play around with some numbers (and El Reg can't resist) and assume you have two Knight Ferry coprocessors per server node with 54 cores activated and two Xeon E5-2670s, you can get 9,796 cores across 79 server nodes. That would be 158 teraflops of raw peak Linpack performance from each MIC card, and another 26.3 teraflops peak from the 1,264 Xeon cores.
(Switch Knights Ferry with Knights Corner)

54 seems good since the maximum is 64 cores and it won't be able to activate all cores for yield/redundancy purposes. It's also a prototype device, so they might have needed to deactivate more cores.
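The Register's configuration is easy to check with a few lines of Python. This is only a sketch of their guess: the node count, cards per node, and 54 active cores come from the quote, while the 166.4 GFlops per E5-2670 is my own assumption (8 cores x 2.6 GHz x 8 double-precision flops per cycle).

```python
NODES = 79
PHI_PER_NODE = 2
CORES_PER_PHI = 54
CPUS_PER_NODE = 2
CORES_PER_CPU = 8          # Xeon E5-2670
CPU_PEAK_GFLOPS = 166.4    # assumption: 8 cores x 2.6 GHz x 8 DP flops/cycle

xeon_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
phi_cores = NODES * PHI_PER_NODE * CORES_PER_PHI
total_cores = xeon_cores + phi_cores
phi_cards = NODES * PHI_PER_NODE
xeon_peak_tflops = NODES * CPUS_PER_NODE * CPU_PEAK_GFLOPS / 1000

print(f"Xeon cores:  {xeon_cores}")
print(f"Phi cores:   {phi_cores}")
print(f"total cores: {total_cores}")
print(f"Phi cards:   {phi_cards}")
print(f"Xeon peak:   {xeon_peak_tflops:.1f} TFlops")
```

This gives 9,796 cores, four short of the 9,800 in the listing. With 158 cards, the 158 teraflops in the quote only works as a total across all cards at roughly 1 TFlops each, not as a per-card figure.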
 

denev2004

Member
Dec 3, 2011
105
1
0
Well, this time you should probably compare it with GK110.

BTW, it seems that Xeon Phi has quite good efficiency (65% rmax/rpeak, greater than ALL NVIDIA-accelerated HPC systems, which are at nearly 55%), with a performance/power figure greater than 1000 MFlops/W (only some M2090 and Power BQC systems reach this point).
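Both numbers can be reproduced from the figures already quoted in this thread. A small Python sketch, assuming rmax of 118.6 TFlops, rpeak of 180.99 TFlops, and 100.8 kW for the whole system:

```python
RMAX_GFLOPS = 118_600     # measured Linpack result
RPEAK_GFLOPS = 180_990    # theoretical peak
POWER_KW = 100.8

# Linpack efficiency: fraction of the theoretical peak actually sustained.
linpack_efficiency = RMAX_GFLOPS / RPEAK_GFLOPS

# Performance per watt, converting GFlops to MFlops and kW to W.
mflops_per_watt = (RMAX_GFLOPS * 1000) / (POWER_KW * 1000)

print(f"efficiency: {linpack_efficiency:.1%}")
print(f"perf/power: {mflops_per_watt:.0f} MFlops/W")
```

Note that this is the efficiency of the whole system, CPUs included, so it is not a figure for the Xeon Phi cards alone.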
 

Khato

Golden Member
Jul 15, 2001
1,206
250
136
The Register gives their opinion regarding system config: http://www.theregister.co.uk/2012/06/18/intel_mic_xeon_phi_cray/print.html

(Switch Knights Ferry with Knights Corner)

54 seems good since the maximum is 64 cores and it won't be able to activate all cores for yield/redundancy purposes. It's also a prototype device, so they might have needed to deactivate more cores.

Good find - the other article I read claimed 118 Xeon Phi + 236 Xeon E5-2670 and I just assumed it to be correct. I guess they were basing it off the 1 TFlop per Xeon Phi. Regardless, all that really matters here is that the system as a whole is roughly where it should be in terms of performance/watt.
 

denev2004

Member
Dec 3, 2011
105
1
0
Larrabee always excelled at DP. I just couldn't understand why they would try to cultivate interest in the product by masquerading as a GPU.
A real GPU requires something more.
Especially for LRB's SW rendering method: only texturing uses FFUs (fixed-function units), while there are a lot of FFUs in Fermi/Kepler/SI.
Maybe they don't want to do it now, but they should in the future.
 

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
Larrabee always excelled at DP. I just couldn't understand why they would try to cultivate interest in the product by masquerading as a GPU.
Unless this die is tiny as hell, 1 TFLOP FP64 is awful, considering Intel's a process generation ahead. I'm sure MIC offers some other features that make it stand out, but from a pure numbers standpoint, there's nothing special about this.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Unless this die is tiny as hell, 1 TFLOP FP64 is awful, considering Intel's a process generation ahead. I'm sure MIC offers some other features that make it stand out, but from a pure numbers standpoint, there's nothing special about this.

That's actually quite close to what Nvidia is claiming they'll reach with GK110 late this year/early next year. And it may well be on par on a final product.

Flops is just one metric of performance. AMD does pretty well in theoretical Flops, but it's Nvidia that dominates in HPC. Same was true in gaming. The 2.72 TFlops 6970 loses out to the 1.5 TFlops GTX 580. There's also more than performance, like compiler and ecosystem support.

There's a good chance, though, that the initial Xeon Phi is not looking to take the performance lead, but to be competitive in performance/watt and performance/$. It's similar to what they are doing with smartphones.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106

Homeles

Platinum Member
Dec 9, 2011
2,580
0
0
That's actually quite close to what Nvidia is claiming they'll reach with GK110 late this year/early next year. And it may well be on par on a final product.

Flops is just one metric of performance. AMD does pretty well in theoretical Flops, but it's Nvidia that dominates in HPC. Same was true in gaming. The 2.72 TFlops 6970 loses out to the 1.5 TFlops GTX 580. There's also more than performance, like compiler and ecosystem support.

There's a good chance, though, that the initial Xeon Phi is not looking to take the performance lead, but to be competitive in performance/watt and performance/$. It's similar to what they are doing with smartphones.
See, but they're still an entire node ahead. That's a huge lead.

Yes, software support is everything -- as far as AMD and Nvidia go, that gap is quickly closing.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
The only news I've seen in all the links I have read that was previously unknown was the OS. These systems will have their own OS. That's interesting, and I hope it's standalone. Anand found it interesting that, in the Haswell article, Intel mentions a different OS. Anand found that to be odd. It's now beginning to look interesting. Can't wait for fall IDF.
 

alyarb

Platinum Member
Jan 25, 2009
2,444
0
76
See, but they're still an entire node ahead. That's a huge lead.

Not really. intel's 22nm process was tuned to minimize voltage for 17-watt and smaller chips, not to maximize yields for 550 sq mm, 300-watt coprocessors. As such they are disabling what, 5 to 10 cores per chip right now? intel is never going to discuss yields but we have already gotten plenty of clues to suggest that they are not great, so why should we expect more from this rather ambitious chip?

Likewise TSMC has cancelled 22nm in favor of a 20nm process which seems to favor mobile SOC vendors more than gargantuan ASICs such as GK112. Both nVIDIA and intel are making do with limited fab space because these super-chips, while interesting, are not killer products you'll find in every store, and are justifiably limited in what manufacturing resources and personnel can be allocated to them.

If intel and TSMC put all their fab space into *just* these two products, then you might see intel's more advanced node make a difference, but these are niche products that are never going to see the kind of volume to prove what you are inferring.

Prototypes are, with few exceptions, represented in the most optimistic light possible. The very limited quantity of Tesla and MIC prototypes and demos should be able to illustrate exactly how tough these chips are to build in certain numbers, particularly when the prototypes themselves are gimped. I wouldn't compare these devices from a manufacturing standpoint until we can get GK112 and KC side by side with no execution hardware disabled. Who knows when that will be.
 

Khato

Golden Member
Jul 15, 2001
1,206
250
136

Since the fact they couldn't arrive at an exact 9800 cores as the TOP500 listing specified was bugging me... It should be exact. Assuming 2 Xeon E5-2670 per node there are two combinations which add up to an exact 9800 cores while keeping the number of cores per Xeon Phi above 50 and below 64. Namely 140 nodes with a single 54 core Xeon Phi per node: 140*(2*8+54) = 140*70 = 9800. The other is 70 nodes with two 62 core Xeon Phi per node: 70*(2*8+2*62) = 70*140 = 9800.
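The search for exact combinations can be sketched as a brute-force loop in Python. The assumptions are the ones stated above (two 8-core E5-2670 per node, between 51 and 63 active cores per Xeon Phi) plus my own limit of at most two cards per node; `candidate_configs` is a hypothetical helper name.

```python
TOTAL_CORES = 9800
XEON_CORES_PER_NODE = 2 * 8   # two 8-core E5-2670 per node (assumed)


def candidate_configs(total=TOTAL_CORES, min_cores=51, max_cores=63, max_cards=2):
    """Return (nodes, cards_per_node, cores_per_card) tuples that hit `total` exactly."""
    found = []
    for cards in range(1, max_cards + 1):
        for cores in range(min_cores, max_cores + 1):
            cores_per_node = XEON_CORES_PER_NODE + cards * cores
            # Only keep layouts where the node count comes out as a whole number.
            if total % cores_per_node == 0:
                found.append((total // cores_per_node, cards, cores))
    return found


for nodes, cards, cores in candidate_configs():
    print(f"{nodes} nodes x ({XEON_CORES_PER_NODE} Xeon cores + {cards} x {cores} Phi cores)")
```

Under those constraints the loop finds only the two layouts listed above: 140 nodes with one 54-core card each, and 70 nodes with two 62-core cards each.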

Using those two possibilities to replicate my earlier calculations yields either 692 GFlops ((118600-140*155)/140) with two Xeon Phi per node or 537 GFlops ((118600-280*155)/140) with one Xeon Phi per node, since both configurations contain 140 cards in total. Unfortunately I'm not aware of any further information to tip the balance towards one configuration or the other. Both work with their previous demonstrations. If we assume the source in that Register article is correct and it is significantly less than 100 nodes, then we'd have to go with 62 cores per Xeon Phi and 2 cards per node... which seems like it might be cutting it a bit close if there are 64 cores in the design.
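As a cross-check on the per-card rmax, here is a short Python sketch for both layouts. It assumes 155 GFlops of rmax per E5-2670 and two CPUs per node, as before; `phi_rmax` is a hypothetical helper. Both layouts contain 140 cards, so the non-CPU remainder is divided by 140 in each case.

```python
RMAX_GFLOPS = 118_600      # measured Linpack result of the whole system
CPU_RMAX_GFLOPS = 155      # estimated rmax per E5-2670


def phi_rmax(nodes, cards_per_node, cpus_per_node=2):
    """Estimated rmax per Xeon Phi after removing the CPU contribution."""
    n_cpu = nodes * cpus_per_node
    n_phi = nodes * cards_per_node
    return (RMAX_GFLOPS - CPU_RMAX_GFLOPS * n_cpu) / n_phi


two_per_node = phi_rmax(nodes=70, cards_per_node=2)    # 140 CPUs, 140 cards
one_per_node = phi_rmax(nodes=140, cards_per_node=1)   # 280 CPUs, 140 cards

print(f"70 nodes, 2 cards each: {two_per_node:.0f} GFlops per Xeon Phi")
print(f"140 nodes, 1 card each: {one_per_node:.0f} GFlops per Xeon Phi")
```

The one-card layout comes out lower because twice as many CPUs are credited with a share of the same total.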
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Not really. intel's 22nm process was tuned to minimize voltage for 17-watt and smaller chips, not to maximize yields for 550 sq mm, 300-watt coprocessors. As such they are disabling what, 5 to 10 cores per chip right now? intel is never going to discuss yields but we have already gotten plenty of clues to suggest that they are not great, so why should we expect more from this rather ambitious chip?

Likewise TSMC has cancelled 22nm in favor of a 20nm process which seems to favor mobile SOC vendors more than gargantuan ASICs such as GK112. Both nVIDIA and intel are making do with limited fab space because these super-chips, while interesting, are not killer products you'll find in every store, and are justifiably limited in what manufacturing resources and personnel can be allocated to them.

If intel and TSMC put all their fab space into *just* these two products, then you might see intel's more advanced node make a difference, but these are niche products that are never going to see the kind of volume to prove what you are inferring.

Prototypes are, with few exceptions, represented in the most optimistic light possible. The very limited quantity of Tesla and MIC prototypes and demos should be able to illustrate exactly how tough these chips are to build in certain numbers, particularly when the prototypes themselves are gimped. I wouldn't compare these devices from a manufacturing standpoint until we can get GK112 and KC side by side with no execution hardware disabled. Who knows when that will be.

Where are you getting all this wonderful information on Intel's Phi? 300 watts seems really high considering Haswell will have Xeons that are 160 watts, and I have never seen the 300-watt figure from Intel on Phi. Now you're saying that 5 to 10 cores are disabled and yields are low? Please, you have written a lot here. Where's the link with this made-up info on it? Intel doesn't say how many cores are on the chip, yet you pull them from thin air. You're magical.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
See, but they're still an entire node ahead. That's a huge lead.

See, we don't know that. The devil is always in the details. It could go any way: Xeon Phi could be significantly slower, on par, or significantly faster.

But if you ask me, whether one generation has significant advantage over another is dependent on how well they execute and how realistic their own goals are. I believe both Intel and Nvidia are pretty well aware of what each other can offer in a general sense.