Radeon Vega Architecture Preview Thread


Puffnstuff

Lifer
Mar 9, 2005
16,030
4,798
136
I'm going to have to wait for it to appear in the wild, in the hands of normal people, before I pass judgement on this technology. They have a bad habit of stretching the truth over there at AMD, so until we can see how it stacks up in the real world it's just hype.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,556
136
I will repost this from Sebbbi, so that everyone can read it:
https://forum.beyond3d.com/posts/1973875/

Just wanted to clarify that I meant AMD GCN2 (consoles) vs Nvidia's latest (Maxwell/Pascal). AMD PC GPUs have also improved since GCN2.

Improvements for general performance:
- GCN3 introduced delta color compression. Including ability to sample/load compressed textures without decompress step.
- GCN3 improved geometry tessellation performance
- GCN4 improved geometry performance in general (including fast strips, primitive discard, etc).
- GCN4 improved delta color compression.
- GCN4 added instruction prefetch (reduces pipeline latency, again helps with geom bottleneck).
- GCN4 improved async compute scheduling (GPU side)

GCN5 (Vega) adds these general performance improvements:
- L2 cache includes L2 ROP cache (L1 ROP caches under L2). Don't need to flush caches between pixel shader passes.
- Tiled rasterizer. Reduces overdraw, bandwidth and makes ROPs more efficient in general.
- Improved geometry pipeline (including proper load balancing, up to 2x higher peak throughput)
- General purpose memory paging system

(I didn't list features that don't bring performance improvements without programmer intervention)

All of these improvements mean that GCN5 should run general purpose pixel/vertex shader code much better than GCN2. GCN5 has most of the same tricks that are seen in modern Nvidia GPUs. There are nice compute improvements as well, but they need special programmer support (DPP, SDWA, FP16). We will see the real impact of these improvements when DX12 SM 6.0 becomes available. Doom is already using these features with Vulkan, resulting in nice gains.
 

french toast

Senior member
Feb 22, 2017
988
825
136
What does SM 6 entail? I heard it was delayed but I know nothing of its content, is it a big upgrade? Any links someone could provide would be great.
 

TerionX6

Junior Member
Jun 29, 2015
14
20
46
Disregarding unknowns and assuming performance scales linearly with clock speed, we can extrapolate a very basic performance level for Big Vega.

Fury X @ 1050 MHz = 75% relative performance of a GTX1080 at 4K res, as per TPU's GTX1080 review: https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080/26.html
Vega is known to be capable of at least (or at most) 1525 MHz
1525/1050=1.452

.75 * 1.452 = 1.089, or about 9% faster than a stock 1080 assuming all other things are equal.
So, throwing out everything but the redesigned-for-clockspeed NCUs, at a bare minimum Vega will be on average faster than a 1080.
If we assume that Vega will offer a barebones-lowballed ~10% IPC over Fiji due to the L2$-tied ROPs, better compression, tiled rasterizer:
1.089 * 1.1 = 1.198 or ~20% better performance than a 1080.

I would think that in general this is the lowest level of performance we will see from Vega unless/until optimized FP16 support and the new shader intrinsics are utilized.

If we extrapolate Fury X 4K performance with respect to the TPU 1080Ti Founder's Edition review, assuming a general 10% IPC improvement over Fiji, then Vega would come out to ~92.5% of a stock 1080Ti's performance.
Either that or my math is woefully incorrect. Feel free to fix it up.
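
For anyone who wants to check the numbers, here is a minimal Python sketch of the arithmetic above. It assumes, as the post does, that performance scales linearly with core clock. The helper name `scaled_perf` is made up for illustration. The inputs are the figures from this post: 75% of a 1080 and 58% of a 1080Ti at 4K, 1050 and 1525 MHz, and a 10% IPC gain.

```python
def scaled_perf(base_rel_perf, base_clock_mhz, new_clock_mhz, ipc_gain=0.0):
    """Scale a relative-performance figure by a clock ratio and an assumed IPC gain.

    Assumes performance scales linearly with core clock (a simplification).
    """
    return base_rel_perf * (new_clock_mhz / base_clock_mhz) * (1.0 + ipc_gain)


FURY_X_CLOCK_MHZ = 1050  # Fury X clock used in the post
VEGA_CLOCK_MHZ = 1525    # Vega clock quoted in the post

# Fury X is taken as 75% of a GTX 1080 and 58% of a 1080Ti at 4K (figures from the post).
vs_1080 = scaled_perf(0.75, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ)
vs_1080_ipc = scaled_perf(0.75, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ, ipc_gain=0.10)
vs_1080ti_ipc = scaled_perf(0.58, FURY_X_CLOCK_MHZ, VEGA_CLOCK_MHZ, ipc_gain=0.10)

print(f"vs 1080, clock only:  {vs_1080:.3f}")
print(f"vs 1080, +10% IPC:    {vs_1080_ipc:.3f}")
print(f"vs 1080Ti, +10% IPC:  {vs_1080ti_ipc:.3f}")
```

This prints roughly 1.089, 1.198 and 0.927, so the ~9%, ~20% and ~92.5% figures hold up under the stated assumptions.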

For Fun:
Assume Vega can hit a maximum clock of 1680Mhz, 10% overclock
Assume Vega's new features give ~20% improvement
.58 (FuryX relative perf to a 1080ti at 4K) * 1.68 * 1.2 = 1.169
In other words, I imagine a best case scenario Vega stomping on a stock 1080Ti by 17%

If my dreams turn reality, man it sure is a good thing I don't have a wife or kids :D
 

Samwell

Senior member
May 10, 2015
225
47
101
For Fun:
Assume Vega can hit a maximum clock of 1680Mhz, 10% overclock
Assume Vega's new features give ~20% improvement
.58 (FuryX relative perf to a 1080ti at 4K) * 1.68 * 1.2 = 1.169
In other words, I imagine a best case scenario Vega stomping on a stock 1080Ti by 17%

Small fault here: you forgot 1680/1050 this time. So it's 0.58 x 1.6 x 1.2 = 1.11. Neck and neck with overclocked 1080Tis.

But actually your calculations match what I expect: worst case 10% slower than the 1080Ti, best case a fight against the custom 1080Ti cards. I think 1080Ti speed should be possible, but AMD will need 300W instead of Nvidia's 250W for the same speed. At the high end, though, those 50W don't matter anyway.
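
A quick sketch of the corrected best case, using only the numbers already quoted (0.58 relative performance, a 1680 MHz overclock against the Fury X's 1050 MHz, and an assumed 20% gain from new features). The variable names are illustrative, and linear clock scaling is still an assumption.

```python
FURY_X_CLOCK_MHZ = 1050
overclock_mhz = 1680

# The clock term has to be a ratio against the Fury X clock, not the clock in GHz.
clock_ratio = overclock_mhz / FURY_X_CLOCK_MHZ

best_case = 0.58 * clock_ratio * 1.2   # corrected estimate
mistaken = 0.58 * 1.68 * 1.2           # original slip: 1.68 used in place of the ratio

print(f"clock ratio: {clock_ratio:.2f}")
print(f"corrected:   {best_case:.3f}")
print(f"mistaken:    {mistaken:.3f}")
```

The corrected figure comes out at about 1.11, against about 1.17 for the original.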
 

zinfamous

No Lifer
Jul 12, 2006
110,592
29,221
146
Small fault here: you forgot 1680/1050 this time. So it's 0.58 x 1.6 x 1.2 = 1.11. Neck and neck with overclocked 1080Tis.

But actually your calculations match what I expect: worst case 10% slower than the 1080Ti, best case a fight against the custom 1080Ti cards. I think 1080Ti speed should be possible, but AMD will need 300W instead of Nvidia's 250W for the same speed. At the high end, though, those 50W don't matter anyway.

Does that take into account HBM2 efficiency gains in Vega over HBM efficiency in Fiji (though relatively minor, right?), with respect to the comparative GDDR5X in Pascal? Also, it's hard to say at this point what kind of general power improvements Vega will bring, since so much of it is new to AMD and more in line with Nvidia's improvements over the last couple of years, no?
 

Samwell

Senior member
May 10, 2015
225
47
101
Does that take into account HBM2 efficiency gains in Vega over HBM efficiency in Fiji (though relatively minor, right?), with respect to the comparative GDDR5X in Pascal? Also, it's hard to say at this point what kind of general power improvements Vega will bring, since so much of it is new to AMD and more in line with Nvidia's improvements over the last couple of years, no?

Yes, it's just thinking about what is realistic. 780Ti to 980Ti was a 55% Perf/W improvement on the same node: https://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/24.html
For AMD to equal the 1080Ti they need a 65% Perf/W improvement for Vega over P10: https://www.techpowerup.com/reviews/Gigabyte/GTX_1080_Ti_Xtreme_Gaming/31.html

I don't think that Vega will have bigger improvements in Perf/W than Maxwell, because 980Ti was a pure gaming chip compared to the mixed gpu 780Ti. With P10 and Vega it's the other way around, going from a pure gaming gpu to a mixed Gaming/HPC chip. Small Vega therefore might beat Pascal in efficiency.
 

Krteq

Senior member
May 22, 2015
991
671
136
Yes, it's just thinking about what is realistic. 780Ti to 980Ti was a 55% Perf/W improvement on the same node: https://www.techpowerup.com/reviews/ASUS/GTX_980_Ti_Matrix/24.html
Yes, most of this was possible thanks to the TBR I mentioned in my last post.

For AMD to equal the 1080Ti they need a 65% Perf/W improvement for Vega over P10: https://www.techpowerup.com/reviews/Gigabyte/GTX_1080_Ti_Xtreme_Gaming/31.html

I don't think that Vega will have bigger improvements in Perf/W than Maxwell, because 980Ti was a pure gaming chip compared to the mixed gpu 780Ti.
Well, Vega will have an NCU with better efficiency, a new cache subsystem and the TBR that is tied to it, HBM can save some energy, and we have a new manufacturing process... so yes, I think they can achieve results similar to or better than Nvidia's with the Kepler -> Maxwell transition.

With P10 and Vega it's the other way around, going from a pure gaming gpu to a mixed Gaming/HPC chip.
Hmm, both Polaris 10 and Vega 10 have the same DPFP rate - 1/16 (FP64/FP32). What exactly is different about Vega that makes you call it a "mixed Gaming/HPC chip"?
 

Samwell

Senior member
May 10, 2015
225
47
101
Yes, most of this was possible thanks to the TBR I mentioned in my last post.

Well, Vega will have an NCU with better efficiency, a new cache subsystem and the TBR that is tied to it, HBM can save some energy, and we have a new manufacturing process... so yes, I think they can achieve results similar to or better than Nvidia's with the Kepler -> Maxwell transition.

There is no new manufacturing process between Polaris and Vega; both are 14LPP. Talk of P10 being 14LPE is just wrong as far as I know. The step from Kepler to Maxwell had quite a few similarities: TBR and new shaders for Maxwell brought the 55% efficiency gain. Vega will additionally have HBM, which might yield even higher efficiency, but other stuff might cost Perf/W.

Hmm, both Polaris 10 and Vega 10 have the same DPFP rate - 1/16 (FP64/FP32). What exactly is different about Vega that makes you call it a "mixed Gaming/HPC chip"?

DP rate doesn't matter so much nowadays in HPC. Most of the market growth is coming from AI. Vega has a 4x Int8 rate, Infinity Fabric and other HPC stuff of which I have no idea. There is a reason why they teased Vega first in Radeon Instinct. AMD wants to gain share there and, from what I read, HPC people expect a lot from Vega. I'm pretty sure there is a lot of stuff which they haven't presented yet.
 

Puffnstuff

Lifer
Mar 9, 2005
16,030
4,798
136
They've already implied that their Vega product will have the highest performance and the Titan Xp sets the prerelease performance high bar.
 

Krteq

Senior member
May 22, 2015
991
671
136
Vega has a 4x Int8 rate, Infinity Fabric and other HPC stuff of which I have no idea.
Where is this "4x INT8 rate" info from? Vega can't do INT8 on NCU, only FP16/FP32/FP64 in 4/2/1 rate and FP16 ops can be "packed" and run on a FP32 SIMD unit with minimal transistor cost.

Anyway, according to AMD materials Infinity Fabric is just a 256-bit bi-directional crossbar, so it can't consume that many transistors either.
 

Samwell

Senior member
May 10, 2015
225
47
101
Where is this "4x INT8 rate" info from? Vega can't do INT8 on NCU, only FP16/FP32/FP64 in 4/2/1 rate and FP16 ops can be "packed" and run on a FP32 SIMD unit with minimal transistor cost.

Anyway, according to AMD materials Infinity Fabric is just a 256-bit bi-directional crossbar, so it can't consume that many transistors either.

Directly from AMD:)

[Image: Vega Final Presentation-27.png]


4:2:1 is right, but it's 4 Int8 : 2 FP16 : 1 FP32. DP should be weak, probably 1/16 rate like on Fury. But there is other HPC stuff which should be in there. We will see in a few months.
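
To make the ratio concrete, here is a small sketch that turns a given FP32 peak into the other rates, taking the 4:2:1 claim and the guessed 1/16 DP rate at face value. The function name `peak_rates` and the 12.5 TFLOPS input are placeholders for illustration only, not figures from this thread.

```python
def peak_rates(fp32_tflops, fp64_ratio=1 / 16):
    """Derive peak rates from an FP32 figure using the 4:2:1 (Int8:FP16:FP32) ratio.

    fp64_ratio defaults to the 1/16 rate guessed in the post; it is an assumption.
    """
    return {
        "int8_tops": 4 * fp32_tflops,
        "fp16_tflops": 2 * fp32_tflops,
        "fp32_tflops": fp32_tflops,
        "fp64_tflops": fp32_tflops * fp64_ratio,
    }


# Hypothetical FP32 peak, chosen only to illustrate the ratios.
rates = peak_rates(12.5)
for name, value in rates.items():
    print(f"{name}: {value:g}")
```

Whatever the real FP32 number turns out to be, the other rates follow from it by the same multipliers.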
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,641
136
I've been wondering about the details of the HBCC - is it some custom IP (derived from ARM perhaps?) that serves a similar purpose to the controllers in SSDs?