Radeon Vega Architecture Preview Thread

DisEnchantment · Mar 31, 2017

http://wccftech.com/vega-teaser-slides-leak-nda/

Sell Your Titan X & 1080 TI

Dunno if for real...

Puffnstuff · Mar 31, 2017

I'm going to have to wait for it to appear in the wild in the hands of normal people first before I pass judgement on this technology. They have a bad habit of stretching the truth over there at AMD so until we can see how it stacks up in the real world it's just hype.

EXCellR8 · Mar 31, 2017

likewise... seeing is believing

crisium · Mar 31, 2017

It's April Fools.

sandorski · Mar 31, 2017

crisium said:
It's April Fools.

Very possibly. Kinda cruel if it is, since it seems plausible.

Glo. · Apr 3, 2017

I will repost this from Sebbbi, so that everyone can read it:
https://forum.beyond3d.com/posts/1973875/

Just wanted to clarify that I meant AMD GCN2 (consoles) vs Nvidia's latest (Maxwell/Pascal). AMD PC GPUs have also improved since GCN2.

Improvements for general performance:
- GCN3 introduced delta color compression. Including ability to sample/load compressed textures without decompress step.
- GCN3 improved geometry tessellation performance
- GCN4 improved geometry performance in general (including fast strips, primitive discard, etc).
- GCN4 improved delta color compression.
- GCN4 added instruction prefetch (reduces pipeline latency, again helps with geom bottleneck).
- GCN4 improved async compute scheduling (GPU side)

GCN5 (Vega) adds these general performance improvements:
- L2 cache includes L2 ROP cache (L1 ROP caches under L2). Don't need to flush caches between pixel shader passes.
- Tiled rasterizer. Reduces overdraw, bandwidth and makes ROPs more efficient in general.
- Improved geometry pipeline (including proper load balancing, up to 2x higher peak throughput)
- General purpose memory paging system

(I didn't list features that don't bring performance improvements without programmer intervention)

All of these improvements mean that GCN5 should run general purpose pixel/vertex shader code much better than GCN2. GCN5 has most of the same tricks that are seen in modern Nvidia GPUs. There are nice compute improvements as well, but they need special programmer support (DPP, SDWA, FP16). We will see the real impact of these improvements when DX12 SM 6.0 becomes available. Doom is already using these features with Vulkan, resulting in nice gains.

french toast · Apr 3, 2017

What does SM 6 entail? I heard it was delayed but I know nothing of its content, is it a big upgrade? Any links someone could provide would be great.

Glo. · Apr 3, 2017

https://msdn.microsoft.com/en-us/library/windows/desktop/mt733232(v=vs.85).aspx

TerionX6 · Apr 7, 2017

Disregarding unknowns and assuming performance scales linearly with clock speed, we can extrapolate a very, very basic performance level for Big Vega.

Fury X @ 1050MHz = 75% of the relative performance of a GTX1080 at 4K res, as per TPU's GTX1080 review: https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080/26.html
Vega is known to be capable of at least (or at most) 1525MHz
1525/1050=1.452

.75 * 1.452 = 1.089, or about 9% faster than a stock 1080 assuming all other things are equal.
So, throwing out everything but the redesigned-for-clockspeed NCUs, at a bare minimum Vega will be on average faster than a 1080.
If we assume that Vega will offer a barebones, lowballed ~10% IPC gain over Fiji due to the L2$-tied ROPs, better compression, and tiled rasterizer:
1.089 * 1.1 = 1.198 or ~20% better performance than a 1080.

I would think that in general this is the lowest level of performance we will see from Vega unless/until optimized FP16 support and the new shader intrinsics are utilized.

If we extrapolate Fury X 4K performance with respect to the TPU 1080Ti Founder's Edition review, assuming a general 10% IPC improvement over Fiji, then Vega would come out to ~92.5% of a stock 1080Ti's performance.
Either that or my math is woefully incorrect. Feel free to fix it up.
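For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the same linear clock-scaling estimate. The function and variable names are made up for illustration. The 0.75 figure is the Fury X vs. GTX 1080 ratio quoted above, and 0.58 is the Fury X vs. 1080 Ti ratio this post uses in its "For Fun" section. It assumes, as the post does, that performance scales linearly with clock and that any IPC gain multiplies on top.

```python
# Rough sketch of the estimate in this post; all names are illustrative.
# Assumption: performance scales linearly with clock speed, and an IPC
# gain (e.g. 0.10 for 10%) multiplies on top of that.

def scaled_perf(base_rel_perf, base_clock_mhz, new_clock_mhz, ipc_gain=0.0):
    """Relative performance after scaling the clock and applying an IPC gain."""
    return base_rel_perf * (new_clock_mhz / base_clock_mhz) * (1.0 + ipc_gain)

FURY_X_CLOCK_MHZ = 1050   # Fury X clock from the post
VEGA_CLOCK_MHZ = 1525     # Vega clock assumed in the post

# Fury X is ~75% of a GTX 1080 at 4K (TPU figure quoted in the post).
vs_1080 = scaled_perf(0.75, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ)
vs_1080_ipc = scaled_perf(0.75, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ, ipc_gain=0.10)

# Fury X is ~58% of a 1080 Ti at 4K (figure used in the post).
vs_1080ti_ipc = scaled_perf(0.58, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ, ipc_gain=0.10)

print(f"vs 1080, clock only:       {vs_1080:.3f}")
print(f"vs 1080, clock + 10% IPC:  {vs_1080_ipc:.3f}")
print(f"vs 1080 Ti, clock + 10% IPC: {vs_1080ti_ipc:.3f}")
```

The three results come out at roughly 1.09, 1.20 and 0.93, which matches the numbers in the post to within rounding.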

For Fun:
Assume Vega can hit a maximum clock of 1680Mhz, 10% overclock
Assume Vega's new features give ~20% improvement
.58 (FuryX relative perf to a 1080ti at 4K) * 1.68 * 1.2 = 1.169
In other words, I imagine a best case scenario Vega stomping on a stock 1080Ti by 17%

If my dreams turn reality, man it sure is a good thing I don't have a wife or kids

Samwell · Apr 7, 2017

TerionX6 said:
For Fun:
Assume Vega can hit a maximum clock of 1680Mhz, 10% overclock
Assume Vega's new features give ~20% improvement
.58 (FuryX relative perf to a 1080ti at 4K) * 1.68 * 1.2 = 1.169
In other words, I imagine a best case scenario Vega stomping on a stock 1080Ti by 17%

Small fault here, you forgot 1680/1050 this time. So it's 0.58 x 1.6 x 1.2 = 1.11. Neck and neck with OCed 1080Tis.

But actually your calculations are the same as what I expect. Worst case 10% slower than 1080Ti, best case a fight against the custom 1080Ti cards. I think 1080Ti speed should be possible, but AMD will need 300W instead of Nvidia's 250W for the same speed. But at the high end these 50W don't matter anyway.
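A quick Python sketch of the correction, for reference. It only restates the numbers already in the two posts: the quoted best case multiplied by 1.68 directly, while the clock ratio relative to the Fury X is 1680/1050 = 1.6. Variable names are illustrative.

```python
# Best-case Vega vs. stock 1080 Ti, using only figures from the posts above.
FURY_X_VS_1080TI = 0.58   # Fury X relative perf to a 1080 Ti at 4K
FURY_X_CLOCK_MHZ = 1050
VEGA_OC_CLOCK_MHZ = 1680  # assumed 10% overclock from the quoted post
FEATURE_GAIN = 1.2        # assumed ~20% gain from Vega's new features

# As originally posted: 1.68 was used in place of the clock ratio.
as_posted = FURY_X_VS_1080TI * 1.68 * FEATURE_GAIN

# Corrected: scale by the actual clock ratio, 1680 / 1050 = 1.6.
clock_ratio = VEGA_OC_CLOCK_MHZ / FURY_X_CLOCK_MHZ
corrected = FURY_X_VS_1080TI * clock_ratio * FEATURE_GAIN

print(f"as posted: {as_posted:.3f}")   # about 1.17
print(f"corrected: {corrected:.3f}")   # about 1.11
```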

zinfamous · Apr 7, 2017

Samwell said:
Small fault here, you forgot 1680/1050 this time. So it's 0.58 x 1.6 x 1.2 = 1.11. Neck and neck with OCed 1080Tis.

But actually your calculations are the same as what I expect. Worst case 10% slower than 1080Ti, best case a fight against the custom 1080Ti cards. I think 1080Ti speed should be possible, but AMD will need 300W instead of Nvidia's 250W for the same speed. But at the high end these 50W don't matter anyway.

Does that take into account HBM2 efficiency gains in Vega over HBM efficiency in Fiji (though relatively minor, right?), compared with GDDR5X in Pascal? Also, it's hard to say at this point what kind of general power improvements Vega will bring since so much of it is new to AMD and more in line with nVidia's improvements over the last couple of years, no?

Krteq · Apr 7, 2017

New tile-based rasterizer can also save a great amount of power

Samwell · Apr 7, 2017

zinfamous said:
Does that take into account HBM2 efficiency gains in Vega over HBM efficiency in Fiji (though relatively minor, right?), compared with GDDR5X in Pascal? Also, it's hard to say at this point what kind of general power improvements Vega will bring since so much of it is new to AMD and more in line with nVidia's improvements over the last couple of years, no?

Yes, I'm just thinking about what is realistic. 780Ti to 980Ti was a 55% Perf/W improvement on the same node: https://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/24.html
For AMD to equal 1080Ti they need a 65% Perf/W improvement for Vega over P10: https://www.techpowerup.com/reviews/Gigabyte/GTX_1080_Ti_Xtreme_Gaming/31.html

I don't think that Vega will have bigger improvements in Perf/W than Maxwell, because 980Ti was a pure gaming chip compared to the mixed gpu 780Ti. With P10 and Vega it's the other way around, going from a pure gaming gpu to a mixed Gaming/HPC chip. Small Vega therefore might beat Pascal in efficiency.

Krteq · Apr 7, 2017

Samwell said:
Yes, I'm just thinking about what is realistic. 780Ti to 980Ti was a 55% Perf/W improvement on the same node: https://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/24.html

Yes, most of this was possible due to TBR I mentioned in last post.

Samwell said:
For AMD to equal 1080Ti they need a 65% Perf/W improvement for Vega over P10: https://www.techpowerup.com/reviews/Gigabyte/GTX_1080_Ti_Xtreme_Gaming/31.html

I don't think that Vega will have bigger improvements in Perf/W than Maxwell, because 980Ti was a pure gaming chip compared to the mixed gpu 780Ti.

Well, Vega will have an NCU with better efficiency, a new cache subsystem and the TBR which is tied to it, HBM can save some energy, and we have a new manufacturing process... so yes, I think they can achieve results similar to or better than nV's with the Kepler -> Maxwell transition.

Samwell said:
With P10 and Vega it's the other way around, going from a pure gaming gpu to a mixed Gaming/HPC chip.

Hmm, both Polaris 10 and Vega 10 have the same DPFP rate - 1/16 (FP64/FP32). What exactly is different about Vega that makes you call it a "mixed Gaming/HPC chip"?

Samwell · Apr 7, 2017

Krteq said:
Yes, most of this was possible due to TBR I mentioned in last post.

Well, Vega will have an NCU with better efficiency, a new cache subsystem and the TBR which is tied to it, HBM can save some energy, and we have a new manufacturing process... so yes, I think they can achieve results similar to or better than nV's with the Kepler -> Maxwell transition.

There is no new manufacturing process between Polaris and Vega, both are 14LPP. Talk of P10 being 14LPE is just wrong as far as I know. The step from Kepler to Maxwell had quite a few similarities: TBR and new shaders for Maxwell brought the 55% efficiency gain. Vega will additionally have HBM, which might yield even higher efficiency, but other stuff might cost Perf/W.

Krteq said:
Hmm, both Polaris 10 and Vega 10 have the same DPFP rate - 1/16 (FP64/FP32). What exactly is different about Vega that makes you call it a "mixed Gaming/HPC chip"?

DP rate doesn't matter so much nowadays in HPC. Most of the market growth is coming from AI. Vega has 4x Int8 rate, Infinity Fabric and other HPC stuff of which I have no idea. There is a reason why they teased Vega first in Radeon Instinct. AMD wants to gain share there and, from what I read, HPC people expect a lot from Vega. I'm pretty sure there is a lot of stuff which they haven't presented yet.

Puffnstuff · Apr 7, 2017

DisEnchantment said:
http://wccftech.com/vega-teaser-slides-leak-nda/

Dunno if for real...

Now they have to contend with a Titan Xp.

Krteq · Apr 8, 2017

Puffnstuff said:
Now they have to contend with a Titan Xp.

Why? Titan Xp is in quite different price segment.

Puffnstuff · Apr 8, 2017

They've already implied that their Vega product will have the highest performance and the Titan Xp sets the prerelease performance high bar.

Krteq · Apr 8, 2017

Samwell said:
Vega has 4x Int8 rate, Infinity Fabric and other hpc stuff of which i have no idea.

Where is this "4x INT8 rate" info from? Vega can't do INT8 on NCU, only FP16/FP32/FP64 in 4/2/1 rate and FP16 ops can be "packed" and run on a FP32 SIMD unit with minimal transistor cost.

Anyway, according to AMD materials Infinity Fabric is just a 256-bit bi-directional crossbar, so it can't consume that many transistors either.

Samwell · Apr 8, 2017

Krteq said:
Where is this "4x INT8 rate" info from? Vega can't do INT8 on NCU, only FP16/FP32/FP64 in 4/2/1 rate and FP16 ops can be "packed" and run on a FP32 SIMD unit with minimal transistor cost.

Anyway, according to AMD materials Infinity Fabric is just a 256-bit bi-directional crossbar, so it can't consume that many transistors either.

Directly from AMD

4:2:1 is right, but it's 4 Int8: 2 FP16: 1 FP32. DP should be weak, probably like 1/16 rate on Fury. But there is other hpc stuff, which should be in there. We will see in a few months.

Krteq · Apr 9, 2017

Oh, missed that, thx for info.

tamz_msc · Apr 10, 2017

I've been wondering about the details of the HBCC - is it some custom IP (derived from ARM perhaps?) that serves a similar purpose to the controllers in SSDs?
