[AT] Radeon Instinct (Vega) HPC card Announced

nathanddrews

Graphics Cards, CPU Moderator
Aug 9, 2016
965
534
136
www.youtube.com
http://www.anandtech.com/show/10905/amd-announces-radeon-instinct-deep-learning-2017/2

Memory Type - "High Bandwidth Cache and Controller"
Memory Bandwidth - ?
Single Precision (FP32) - 12.5 TFLOPS (Tesla P100 = 10.6 TFLOPS)
Half Precision (FP16) - 25 TFLOPS (Tesla P100 = 21.2 TFLOPS)
TDP - <300W
Cooling - Passive

Based on TFLOPS only, that's 18% faster than Tesla P100. Let's hope the consumer GPU winds up similarly competitive against the Titan XP...
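
Showing the math (just the ratio of the paper specs in the table above):

Code:
# MI25 vs. Tesla P100, paper specs only
fp32 = 12.5 / 10.6   # single precision ratio
fp16 = 25.0 / 21.2   # half precision ratio
print(f"FP32: +{fp32 - 1:.0%}, FP16: +{fp16 - 1:.0%}")  # ~+18% for both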
 
  • Like
Reactions: Dave2150

ultima_trev

Member
Nov 4, 2015
148
66
66
One can't compare nVidia and AMD in terms of floating point operations alone. Let's assume the IPC scaling from Fiji to Vega is similar to the scaling from Maxwell to Pascal. Then let's compare two SKUs with similar REAL-WORLD performance:

First, a GTX 980 at a 1330 MHz core clock, as pretty much all 980s boost to this by default, which comes to roughly 5.4 TFLOPS.
Next, the similarly performing R9 Nano, which is 8 TFLOPS.

At best, the R9 Nano was about 5-10% faster than GTX 980 despite having a 48% advantage in terms of compute prowess.

Titan X Pascal is good for a minimum of 11 TFLOPS at a minimum boost clock of 1536 MHz, and typically boosts much higher. It probably has a real-world boost clock of at least 1800 MHz, as most higher-tier Pascal GPUs boost to the 1800-2000 MHz range, which puts the Titan X Pascal closer to 12.9 TFLOPS. Now, assuming the same IPC scaling from one generation to the next, Vega would need a rating of roughly 19.1 TFLOPS to edge out the Titan X Pascal the way the Nano edged out the GTX 980.

Unless the new version of GCN is a substantial improvement over previous iterations (which I doubt), we are most likely looking at Vega performing smack dab in between the GTX 1070 and 1080. Not necessarily a bad thing, especially if it's priced at about $300; it would make for a very desirable 1440p@High card, although the 4K gaming market would be forfeit to the GTX 1080, 1080 Ti and Titan Xp.
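
For reference, here's the arithmetic above in one place (theoretical TFLOPS = shaders x 2 ops per clock x clock, with the boost clocks assumed above):

Code:
# Theoretical single-precision throughput: shaders * 2 (FMA = 2 ops) * MHz
def tflops(shaders, mhz):
    return shaders * 2 * mhz / 1e6

gtx_980  = tflops(2048, 1330)  # ~5.4 TFLOPS
r9_nano  = tflops(4096, 1000)  # ~8.2 TFLOPS
titan_xp = tflops(3584, 1800)  # ~12.9 TFLOPS (assumed real-world boost)

# The Nano needed ~48-50% more raw FLOPS to edge out the 980,
# so by the same scaling Vega would need:
vega_needed = titan_xp * (r9_nano / gtx_980)
print(round(vega_needed, 1))   # ~19.4 (the rounded figures above give ~19.1)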
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
Actually, Vega is likely to be a very large departure from GCN. It is believed that "NCU" stands for New CU, Next CU, or Neural CU; if so, it's likely things have changed quite a bit.
We've seen some interesting patents that could very well be used in Vega; the guys over on Beyond3D have been talking about it for a while now.

Quite honestly, all bets are off when it comes to Vega; there are too many unknowns to make an educated guess.
 

nathanddrews

Graphics Cards, CPU Moderator
Aug 9, 2016
965
534
136
www.youtube.com
Based on TFLOPS only, that's 18% faster than Tesla P100.
One can't compare nVidia and AMD in terms of floating point operations alone.

Well, of course I can, because there is no other officially released information, except perhaps this slide deck from AMD showing a 60% MIOpen speedup; it's not independently verified, so take it with a grain of salt. As for right now, 18% on paper is what we know. I hope that materializes into real-world performance, but we don't know. Your conjecture based upon pre-existing technology is even less relevant given the architectural changes of Vega that we don't yet know anything about. An unknown unknown. If all GCN products were created equal, that would lend more credence to your hypothesis, but so far they have all been very different from one another.

(AMD slide deck for reference)
 

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Titan X Pascal is good for a minimum of 11 TFLOPS at a minimum boost clock of 1536 MHz, and typically boosts much higher. It probably has a real-world boost clock of at least 1800 MHz, as most higher-tier Pascal GPUs boost to the 1800-2000 MHz range, which puts the Titan X Pascal closer to 12.9 TFLOPS.

TPU measured an average boost clock of 1692 MHz for Titan X Pascal.

That would put it at 12.1 TFLOPS (3584 SPs x 2 x 1692 MHz).

Of course, this is largely an irrelevant comparison, since the Titan X card is a gaming card and the MI25 is an HPC card, and generally speaking those two categories are binned quite differently. The proper comparison here would be the P100 or the Quadro P6000, both of which are clocked significantly lower than the Titan X.
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
www.facebook.com
I hope both Vegas cleanly beat GP104 and GP100. Not only would that put pressure on the existing product stack, it would put pressure on future product stacks.

If either or both Vegas beat Nvidia's respective chips in perf/W and outright performance, and are competitive in price, I'll get one without hesitation.
 

Riek

Senior member
Dec 16, 2008
409
14
76
One can't compare nVidia and AMD in terms of floating point operations alone. Let's assume the IPC scaling from Fiji to Vega is similar to the scaling from Maxwell to Pascal. Then let's compare two SKUs with similar REAL-WORLD performance:

First, a GTX 980 at a 1330 MHz core clock, as pretty much all 980s boost to this by default, which comes to roughly 5.4 TFLOPS.
Next, the similarly performing R9 Nano, which is 8 TFLOPS.

At best, the R9 Nano was about 5-10% faster than GTX 980 despite having a 48% advantage in terms of compute prowess.

Titan X Pascal is good for a minimum of 11 TFLOPS at a minimum boost clock of 1536 MHz, and typically boosts much higher. It probably has a real-world boost clock of at least 1800 MHz, as most higher-tier Pascal GPUs boost to the 1800-2000 MHz range, which puts the Titan X Pascal closer to 12.9 TFLOPS. Now, assuming the same IPC scaling from one generation to the next, Vega would need a rating of roughly 19.1 TFLOPS to edge out the Titan X Pascal the way the Nano edged out the GTX 980.

Unless the new version of GCN is a substantial improvement over previous iterations (which I doubt), we are most likely looking at Vega performing smack dab in between the GTX 1070 and 1080. Not necessarily a bad thing, especially if it's priced at about $300; it would make for a very desirable 1440p@High card, although the 4K gaming market would be forfeit to the GTX 1080, 1080 Ti and Titan Xp.

Why would the scaling from Fiji to Vega be similar to that from Maxwell to Pascal?
That premise alone is rubbish.
Maxwell -> Pascal had barely any performance improvement in terms of IPC, whereas we know that Tonga to Polaris brought a substantial performance improvement per clock. So if you want to gamble with GFLOPS extrapolations, you should start from the Polaris architecture, as that is the closest to the Vega architecture.
(A comparison from Fiji would need Kepler to Pascal for the same two-generation skip... and that is just insane, right?)

Polaris has about 5.8 TFLOPS and performs in the vicinity of a 980 (~6-9% behind, per TPU). That would mean that Big Polaris with no architectural improvements, using your assumptions, would take 14-15 TFLOPS to match a Titan X at 12.9 TFLOPS. So yes, with no architectural improvements, the assumption is that the Titan X should be faster. But best guesses (using your logic, that is...) would put that hypothetical 12.5 TFLOPS Vega card a good 10% above the 1080@1800MHz, not below the 1080.
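
Rough sketch of that math (the RX 480's deficit vs. the 980 is assumed at ~8%, the middle of the TPU range above):

Code:
# Perf-per-TFLOPS, normalized to the GTX 980 (= 1.0 performance units)
p980_tflops  = 5.4
rx480_tflops = 5.8
rx480_perf   = 0.92                      # assumed ~8% behind the 980 (TPU ballpark)

pascal_eff  = 1.0 / p980_tflops          # Maxwell/Pascal perf per TFLOPS (assumed equal)
polaris_eff = rx480_perf / rx480_tflops  # Polaris perf per TFLOPS

titan_xp = 12.9 * pascal_eff             # Titan X Pascal @ ~1800 MHz
print(titan_xp / polaris_eff)            # ~15.1 -> "Big Polaris" needs 14-15 TFLOPS

gtx_1080 = (2560 * 2 * 1800 / 1e6) * pascal_eff  # 1080 @ 1800 MHz, ~9.2 TFLOPS
vega     = 12.5 * polaris_eff
print(vega / gtx_1080)                   # ~1.16 -> a good 10%+ above the 1080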
 

raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
The key factor that will decide whether Vega is competitive is the perf/SP increase. AMD needs a significant increase in perf/SP from architectural improvements. If AMD can get a 20% increase in perf/SP at the same clock over Polaris, then they can compete; otherwise they don't stand a chance.
 
  • Like
Reactions: Headfoot

Det0x

Golden Member
Sep 11, 2014
1,031
2,963
136
What we know so far:

Vega 10 has 8 GB of HBM2.
Vega 10 matches, or slightly beats, the GTX 1080 in Doom and AotS.

What we don't know:

Vega10 hasn't been confirmed to be the smaller die of the family, although the rumors seem to suggest so:

techpowerup.com said:
Vega10 will be a multi-chip module, and feature HBM2 memory. The 14 nm architecture will feature higher performance/Watt than even the upcoming "Polaris" architecture. "Vega10" isn't a successor to "Fiji," though. That honor is reserved for "Vega11." It is speculated that Vega10 will feature 4096 stream processors, and will power graphics cards that compete with the GTX 1080 and GTX 1070. Vega11, on the other hand, is expected to feature 6144 stream processors, and could take on the bigger GP100-based SKUs. Both Vega10 and Vega11 will feature 4096-bit HBM2 memory interfaces, but could differ in standard memory sizes (think 8 GB vs. 16 GB).

fudzilla said:
The Vega 10 GPU is rumored to be a smaller chip with up to 4096 Stream Processors, and this is the chip that AMD needs in order to compete with Nvidia's new GP104 GPU and GeForce GTX 1080/1070 graphics cards. The Vega 11 is a bigger chip, rumored to come with up to 6144 Stream Processors and compete with Nvidia's future GP100 flagship graphics card.

http://www.fudzilla.com/news/graphics/40662-amd-allegedly-pulls-vega-launch-forward

Guru3d said:
Vega11 - let's call it "Big Vega" - which is to replace the Fiji / Fury (X) parts. So as suggested, where Vega10 would get 4096 stream processors and could compete with the GeForce GTX 1080 and GTX 1070, Vega11 on its end would feature 6144 stream processors (=rumor) and would be lined up against the GP100/GP102 aka Big Pascal GPU (Titan X). This all is obviously known and has been discussed many times already. Then there now is some new and interesting info.

http://www.guru3d.com/news-story/amd-vega-10-vega-20-and-vega-11-gpus-mentioned-by-cto.html

isportstimes.com said:
AMD Vega 11 Also Arriving Next Year
There's very little information available about AMD Vega 11. However, one thing that stands out is that AMD's Vega 11 is reportedly going head-to-head against NVIDIA's reported GP100, which is said to be even more powerful than the GTX 1080 and Titan X.

Another important thing to note is that if Vega 10 is twice as powerful as Polaris 10's Radeon RX 480, Vega 11 is rumored to be thrice as powerful compared to the RX 480. Based on the available data, it will reportedly have 32 GB of HBM2 memory with 1 TB per second of memory bandwidth.

Another rumor about Vega 10/11 is that it will have up to 18 billion transistors, which can dramatically increase the graphics card's performance and efficiency. It is worth noting that the GTX 1080 only has 7.2 billion, while Titan X has 12 billion transistors.

http://www.isportstimes.com/article...a-10-release-date-h1-2017-vega-11-q4-2017.htm

So far I would say it's looking good for AMD, if the rumors are to be believed.

Vega10 to match/beat GTX 1080.
Vega11 to match/beat Pascal Titan / GP102 uncut
 

daxzy

Senior member
Dec 22, 2013
393
77
101
One can't compare nVidia and AMD in terms of floating point operations alone.

First, a GTX 980 at a 1330 MHz core clock, as pretty much all 980s boost to this by default, which comes to roughly 5.4 TFLOPS.
Next, the similarly performing R9 Nano, which is 8 TFLOPS.

At best, the R9 Nano was about 5-10% faster than GTX 980 despite having a 48% advantage in terms of compute prowess.

Why are you comparing gaming performance with HPC performance? From my understanding, the current Fiji HPC cards (S9300) have been rather well received.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
Vega10 hasn't been confirmed to be the smaller die of the family, although the rumors seem to suggest so
Considering it's a 300W card, I find that hard to believe. Even if this consumes more power than its consumer counterpart would, there's simply no way AMD would launch a >300W consumer GPU in 2017, no matter its performance. Cooling that thing would be an utter nightmare.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
Considering it's a 300W card, I find that hard to believe. Even if this consumes more power than its consumer counterpart would, there's simply no way AMD would launch a >300W consumer GPU in 2017, no matter its performance. Cooling that thing would be an utter nightmare.

They already have one reference water-cooled GPU. Why not another?

It would give gamers a reason to have those massive power supplies. I would take cheap but wasteful in a heartbeat, but I am an American, so...
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
I wouldn't doubt that packed 16-bit FP instructions actually increase TDP, simply because you've increased utilization of units that would previously have gone unused. Basically, if there were no-packed-FP16 and packed-FP16 variants of the exact same die, the packed-FP16 variant would logically have a higher TDP in FP16 workloads. How much more I couldn't say, but the same FP16 workload on an FP32-only chip would have half of those units not doing work. With how much power gating goes on today, I wouldn't be surprised if this is a big part of that TDP figure.

Efficiency would still likely go up, though, since you get 2x the theoretical output for less than a 2x increase in TDP.
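
(For anyone unfamiliar with the packing: two FP16 values share one 32-bit register, and one packed instruction works on both halves at once. A numpy sketch of the layout; this is just the bit-level idea, not Vega's actual ISA:)

Code:
import numpy as np

# Two FP16 values occupy the same 32 bits as a single FP32 value
a = np.array([1.5, 2.25], dtype=np.float16)
print(hex(a.view(np.uint32)[0]))  # 0x40803e00 on a little-endian machine:
                                  # 1.5 in the low half, 2.25 in the high half

# A packed add produces two results per 32-bit lane in one go
b = np.array([0.5, 0.75], dtype=np.float16)
print(a + b)                      # [2. 3.] -- one operation, both halves busy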
 
  • Like
Reactions: HurleyBird

HurleyBird

Platinum Member
Apr 22, 2003
2,685
1,274
136
I wouldn't doubt that packed 16-bit FP instructions actually increase TDP, simply because you've increased utilization of units that would previously have gone unused.

That's a good point that I hadn't considered. It makes sense that 25 half-precision TFLOPS could consume more power than 12.5 single-precision TFLOPS.
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
I wouldn't doubt that packed 16-bit FP instructions actually increase TDP, simply because you've increased utilization of units that would previously have gone unused. Basically, if there were no-packed-FP16 and packed-FP16 variants of the exact same die, the packed-FP16 variant would logically have a higher TDP in FP16 workloads. How much more I couldn't say, but the same FP16 workload on an FP32-only chip would have half of those units not doing work. With how much power gating goes on today, I wouldn't be surprised if this is a big part of that TDP figure.

Efficiency would still likely go up, though, since you get 2x the theoretical output for less than a 2x increase in TDP.
I expect packed FP16 to increase the TDP somewhat, but your example only applies to FP16 workloads, not FP32 (which is what's relevant for gaming). If two cards are identical except that one executes 2 FP16 instructions per core and the other 1, of course the first would consume more power. The question is whether executing 2 FP16 instructions consumes more power than 1 FP32 instruction in the same hardware. For some reason I think so, but I don't have any actual reasoning or data to base this on.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
I expect packed FP16 to increase the TDP somewhat, but your example only applies to FP16 workloads, not FP32 (which is what's relevant for gaming). If two cards are identical except that one executes 2 FP16 instructions per core and the other 1, of course the first would consume more power. The question is whether executing 2 FP16 instructions consumes more power than 1 FP32 instruction in the same hardware. For some reason I think so, but I don't have any actual reasoning or data to base this on.

I assume that TDP incorporates worst-case heat output. I further assume that the worst-case heat output would be full utilization at the fastest TFLOPS rate, which is FP16. So I assume they revised the TDP upwards to some degree to account for that. I don't expect it's a ton, but I would agree that executing 2 packed FP16 ops probably does consume more than 1 FP32 instruction, due to the extra fabric that would have to get used and the logic to pack the 2 FP16 operations together. The real question is the magnitude: are we talking 250 milliwatts, 2.5 watts, or 25 watts kind of power envelopes?
 

Valantar

Golden Member
Aug 26, 2014
1,792
508
136
I assume that TDP incorporates worst-case heat output. I further assume that the worst-case heat output would be full utilization at the fastest TFLOPS rate, which is FP16. So I assume they revised the TDP upwards to some degree to account for that. I don't expect it's a ton, but I would agree that executing 2 packed FP16 ops probably does consume more than 1 FP32 instruction, due to the extra fabric that would have to get used and the logic to pack the 2 FP16 operations together. The real question is the magnitude: are we talking 250 milliwatts, 2.5 watts, or 25 watts kind of power envelopes?
My thinking exactly, especially with regard to TDP adjustments. I also (half-)expect consumer Vega to ship with FP16 packing disabled (/not enabled in drivers), and as such a slightly lower TDP. I very much doubt AMD would dare launch a 300W TDP GPU in 2017; that 3 is a scary number. 275W would be doable, for sure. The Fury (X) was/is 275W, after all.

Also (veering into speculation territory here), this makes me curious about the efficiency gains of cut-down parts. If this matches a Titan X (I'm not saying it does, but I'm guessing that's their goal) at 300W (TXP is 250W), that's only 20% more. If that's a level they can stick to, they could have a 1080 competitor at 215W (the 1080 is a 180W card; 180 x 1.2 = 216) or beat it cleanly at 225W. I'm intrigued.