Gen8: Broadwell GPU Architecture

witeken

Diamond Member
Broadwell GPU Architecture

From a high-level overview, Broadwell's GPU is a continuation of the Intel Gen7 architecture first introduced in Ivy Bridge and further refined as Gen7.5 in Haswell. While there are some important underlying changes that we'll get to in a moment, at a fundamental level this is still the same GPU architecture that we've seen from Intel for the last two generations, just with more features, more polish, and more optimizations than ever before.
[Image: GPUCompute.png]


The ramification of this is that not only is the total number of EUs increased by 20%, from 20 to 24, but Intel has greatly increased the ratio of L1 cache and samplers relative to EUs. There is now 25% more sampling throughput per EU, for a total increase in sampler throughput (at identical clockspeeds) of 50%. By PC GPU standards, increases in the ratio of samplers to EUs are very rare, with most designs decreasing that ratio over the years. The fact that Intel is increasing this ratio is a strong sign that Haswell's balance may have been suboptimal for modern workloads, lacking enough sampler throughput to keep up with its shaders.

Moving on, along with the sub-slices, the front end and common slice are also receiving their own improvements. The common slice – responsible for housing the ROPs, rasterizer, and a port for the L3 cache – is receiving some microarchitectural improvements to further increase pixel and Z fill rates. Meanwhile the front end's geometry units are also being beefed up to increase geometry throughput at that end.

Much like overall CPU performance, Intel isn't talking about overall GPU performance at this time. Between the 20% increase in shading resources and the 50% increase in sampling resources, Broadwell's GPU should deliver some strong performance gains, though it seems unlikely that it will be on the order of a full generational gain (e.g. catching up to Haswell GT3). What Intel is doing, however, is reiterating the benefits of their 14nm process: because 14nm significantly reduces GPU power consumption, it will allow for more thermal headroom, which should further improve both burst and sustained GPU performance in TDP-limited scenarios relative to Haswell.
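As an aside (my own sketch, not part of the article), the quoted figures are internally consistent: a 20% EU increase combined with 25% more sampler throughput per EU works out to the stated 50% total sampler gain at equal clocks.

```python
# Back-of-the-envelope check of the Broadwell GT2 figures quoted above.
haswell_eus, broadwell_eus = 20, 24

eu_scaling = broadwell_eus / haswell_eus        # 1.20x -> "20% more EUs"
sampler_per_eu_scaling = 1.25                   # "25% more sampling throughput per EU"
total_sampler_scaling = eu_scaling * sampler_per_eu_scaling

print(f"EU count:           {eu_scaling:.2f}x")
print(f"Sampler throughput: {total_sampler_scaling:.2f}x at identical clocks")  # 1.50x
```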

[Image: GPUMedia.png]



So here's what I wonder: Gen8 was claimed to be a very substantial improvement, but Ryan Smith says there are basically no changes to the Gen7 architecture (mainly a re-balancing of the slices). What do you think about Gen8, and what performance and power improvements do you expect?
 

Exophase

Diamond Member
I don't think Ryan Smith's comments - which are really just repeating the slides - are saying there are basically no changes to the Gen7 uarch. Changes in geometry, Z, and pixel fill can be considered significant. The problem is that "substantial improvement" isn't exactly quantifiable, and a lot of the claims were along these lines. ExtremeTech's specific estimate of a 40% improvement seems possible under some scenarios though.

Another quantifiable claim was that the GPU leap from Haswell to Broadwell would exceed that from IB to Haswell: http://www.reddit.com/r/IAmA/comments/15iaet/iama_cpu_architect_and_designer_at_intel_ama/c7mpg8v

If we're looking at high end to high end, Iris Pro 5200 routinely (if not uniformly) performed at 2x or better relative to HD 4000 (http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/9), and I can say with pretty high confidence that the highest-end Broadwell GPU will not have such a leap over Iris Pro 5200.
 

witeken

Diamond Member
Comparing GT2 to GT3 doesn't seem fair to me and is probably not what he meant. He probably meant a bigger improvement than from HD 4000 to HD 4600.

And there's also power. Nvidia claims Maxwell is 35% faster per core but delivers a 2X improvement in efficiency. We don't know if Gen8 will achieve something similar, which could be very substantial when combined with the 2X improvement from the 14nm process.
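If both of those factors materialized and stacked multiplicatively (a big if, and purely my own illustration of the reasoning above):

```python
# Purely illustrative: stacking a hypothetical Maxwell-like 2x architectural
# efficiency gain with the claimed 2x gain from the 14nm process.
arch_efficiency_gain = 2.0     # assumed, not confirmed for Gen8
process_efficiency_gain = 2.0  # Intel's 14nm claim as cited above

combined = arch_efficiency_gain * process_efficiency_gain
print(f"Hypothetical combined perf/W improvement: {combined:.0f}x")
```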
 

firewolfsm

Golden Member
Unless there's a three-slice configuration for GT4. That would be 72 EUs; multiplied by 1.2 for the architecture improvements, it would more than double GT3 performance.
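A rough sketch of that estimate, assuming Haswell GT3's 40 EUs as the baseline and perfect scaling with EU count (both assumptions are mine, and the later posts question the scaling part):

```python
# Speculative GT4 estimate: three Broadwell slices of 24 EUs each,
# plus a ~20% per-EU architectural gain, against Haswell GT3 (40 EUs).
haswell_gt3_eus = 40
gt4_eus = 3 * 24            # hypothetical three-slice configuration
arch_gain = 1.2             # assumed Gen8 architectural uplift

relative_perf = (gt4_eus / haswell_gt3_eus) * arch_gain
print(f"Hypothetical GT4 vs Haswell GT3: {relative_perf:.2f}x")  # ~2.16x
```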
 

jpiniero

Lifer
You only have to look at the Iris Pro vs Kaveri results to see that they really needed to pump up the fill rate.
 

witeken

Diamond Member
So Intel doesn't seem to have changed much with the shaders, but they do say that they significantly changed the architectural things they mention in the first slide.
 

Enigmoid

Platinum Member
witeken said:
So Intel doesn't seem to have changed much with the shaders, but they do say that they significantly changed the architectural things they mention in the first slide.

If you look at the review of Iris Pro, you immediately notice that compute is decent (between Nvidia and AMD), ROP and geometry performance is strong, and texturing and AA are extremely weak.
 

MisterLilBig

Senior member
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.
 

ShintaiDK

Lifer
MisterLilBig said:
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.

The bottleneck for Intel IGPs is not the compute power; they are actually stronger than others there. Note the 50% sampler increase; that's where the issue is.
 

witeken

Diamond Member
MisterLilBig said:
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.

There should be a Skylake Gen9 GT4 72 EU SKU at around the same time, somewhere in 2015.

BTW, we've already known for months that it would be 20% more EUs for the GT parts.
 

MisterLilBig

Senior member
But, how will this actually affect game performance?

Iris Pro GT3e was usually just 80% to 90% faster than HD 4600, and it has twice the EUs and 128MB of L4 cache; I expected more.

Can this really be called Gen8? Slides didn't mention it.
 

ShintaiDK

Lifer
MisterLilBig said:
But, how will this actually affect game performance?

Iris Pro GT3e was usually just 80% to 90% faster than HD 4600, and it has twice the EUs and 128MB of L4 cache; I expected more.

Can this really be called Gen8? Slides didn't mention it.

The sampler output is the bottleneck for gaming.
 

MisterLilBig

Senior member
So should we expect the mobile HQ Broadwell Iris Pro GT3e (the 5200's successor) to run the Battlefield 3 benchmark at 1680 x 1050 High Quality at around 42.3 FPS, if it's 70% higher performance than mobile Haswell's HQ GT3e? Or 49.8 FPS if it's a 100% improvement?

By the slides, this is not Gen8.
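For reference, working back from the FPS figures above (my own arithmetic; the ~24.9 FPS Haswell GT3e baseline is implied by the 42.3 and 49.8 FPS numbers, not stated anywhere):

```python
# Projecting BF3 1680x1050 High Quality results from an assumed
# ~24.9 FPS Haswell Iris Pro GT3e baseline (implied by the figures above).
haswell_gt3e_fps = 24.9

for uplift in (1.7, 2.0):
    print(f"{(uplift - 1) * 100:.0f}% faster -> {haswell_gt3e_fps * uplift:.1f} FPS")
# 70% faster -> 42.3 FPS
# 100% faster -> 49.8 FPS
```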
 

Enigmoid

Platinum Member
ShintaiDK said:
The sampler output is the bottleneck for gaming.

Most definitely. What you will also notice is that the Haswell IGP scales extremely poorly with EU count.

For instance, HD 4600 (4770K) runs 20 EUs @ 1250 MHz, while Bay Trail runs 4 EUs @ 667 MHz. Theoretically, HD 4600 should be about 9.4x faster assuming perfect scaling (not entirely unreasonable, considering the embarrassingly parallel graphics workload and that neither is significantly memory-bandwidth bound).

However, in real-life benchmarks, Bay Trail is never that much slower, generally only about 3-6 times slower. For example, T-Rex offscreen is about 15 fps on Bay Trail and ~80 fps on the 4770K (roughly 5.3x as fast).

Something is messing up scaling; hopefully Broadwell and Skylake address this.
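A quick check of that scaling argument (the efficiency percentage at the end is my own derived figure, not from the post):

```python
# Theoretical shader throughput ratio, assuming perfect scaling:
# HD 4600 (20 EU @ 1250 MHz) vs Bay Trail (4 EU @ 667 MHz).
theoretical = (20 * 1250) / (4 * 667)   # ~9.4x

# Observed GFXBench T-Rex offscreen: ~80 fps on the 4770K vs ~15 fps on Bay Trail.
observed = 80 / 15                      # ~5.3x

print(f"Theoretical scaling: {theoretical:.1f}x")
print(f"Observed scaling:    {observed:.1f}x")
print(f"Scaling efficiency:  {observed / theoretical:.0%}")  # ~57%
```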