Gen8: Broadwell GPU Architecture

witeken

Diamond Member
Broadwell GPU Architecture

From a high-level overview, Broadwell's GPU is a continuation of the Intel Gen7 architecture first introduced in Ivy Bridge and further refined as Gen7.5 in Haswell. While there are some important underlying changes that we'll get to in a moment, at a fundamental level this is still the same GPU architecture that we've seen from Intel for the last two generations, just with more features, more polish, and more optimizations than ever before.
[Image: GPUCompute.png]


The ramification of this is that not only is the total number of EUs increased by 20%, from 20 to 24, but Intel has greatly increased the ratio of L1 cache and samplers relative to EUs. There is now 25% more sampling throughput per EU, for a total increase in sampler throughput (at identical clockspeeds) of 50%. By PC GPU standards, increases in the ratio of samplers to EUs are very rare, with most designs decreasing that ratio over the years. The fact that Intel is increasing this ratio is a strong sign that Haswell's balance may have been suboptimal for modern workloads, lacking enough sampler throughput to keep up with its shaders.

Moving on, along with the sub-slices, the front end and common slice are also receiving their own improvements. The common slice – responsible for housing the ROPs, rasterizer, and a port for the L3 cache – is receiving some microarchitectural improvements to further increase pixel and Z fill rates. Meanwhile the front end's geometry units are also being beefed up to increase geometry throughput at that end.

Much like overall CPU performance, Intel isn't talking about overall GPU performance at this time. Between the 20% increase in shading resources and the 50% increase in sampling resources, Broadwell's GPU should deliver some strong performance gains, though it seems unlikely that it will be on the order of a full generational gain (e.g. catching up to Haswell GT3). What Intel is doing, however, is reiterating the benefits of their 14nm process: because 14nm significantly reduces GPU power consumption, it will allow for more thermal headroom, which should further improve both burst and sustained GPU performance in TDP-limited scenarios relative to Haswell.
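As an aside (my own sketch, not part of the article), the quoted figures are internally consistent: a 20% EU increase combined with 25% more sampler throughput per EU works out to the stated 50% total sampler gain at equal clocks.

```python
# Back-of-the-envelope check of the Broadwell GT2 figures quoted above.
haswell_eus, broadwell_eus = 20, 24

eu_scaling = broadwell_eus / haswell_eus        # 1.20x -> "20% more EUs"
sampler_per_eu_scaling = 1.25                   # "25% more sampling throughput per EU"
total_sampler_scaling = eu_scaling * sampler_per_eu_scaling

print(f"EU count:           {eu_scaling:.2f}x")
print(f"Sampler throughput: {total_sampler_scaling:.2f}x at identical clocks")  # 1.50x
```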

[Image: GPUMedia.png]



So here's what I wonder: Gen8 was claimed to be a very substantial improvement, but Ryan Smith says there are basically no changes to the Gen7 architecture (mainly a re-balancing of the slices). What do you think about Gen8, and what performance and power improvements do you expect?
 

Exophase

Diamond Member
I don't think Ryan Smith's comments - which are really just repeating the slides - are saying there are basically no changes to the Gen7 uarch. Changes in geometry, Z, and pixel fill can be considered significant. The problem is that "substantial improvement" isn't exactly quantifiable, and a lot of the claims were along these lines. ExtremeTech's specific estimate of a 40% improvement seems possible under some scenarios though.

Another quantifiable claim was that the GPU leap from Haswell to Broadwell would exceed that from IB to Haswell: http://www.reddit.com/r/IAmA/comments/15iaet/iama_cpu_architect_and_designer_at_intel_ama/c7mpg8v

If we're looking at high end to high end, Iris Pro 5200 routinely (if not uniformly) performed at 2x or better relative to HD 4000 (http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/9), and I can say with pretty high confidence that the highest-end Broadwell GPU will not have such a leap over Iris Pro 5200.
 

witeken

Diamond Member
Comparing GT2 to GT3 doesn't seem fair to me and is probably not what he meant. He probably meant a bigger improvement than from HD 4000 to HD 4600.

And there's also power. Nvidia claims Maxwell is 35% faster per core but delivers a 2X improvement in efficiency. We don't know if Gen8 will achieve something similar, which could be very substantial when combined with the 2X improvement from the 14nm process.
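If both of those factors materialized and stacked multiplicatively (a big if, and purely my own illustration of the reasoning above):

```python
# Purely illustrative: stacking a hypothetical Maxwell-like 2x architectural
# efficiency gain with the claimed 2x gain from the 14nm process.
arch_efficiency_gain = 2.0     # assumed, not confirmed for Gen8
process_efficiency_gain = 2.0  # Intel's 14nm claim as cited above

combined = arch_efficiency_gain * process_efficiency_gain
print(f"Hypothetical combined perf/W improvement: {combined:.0f}x")
```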
 

firewolfsm

Golden Member
Unless there's a three-slice configuration for GT4. That would be 72 EUs; multiplied by 1.2 for the architecture improvements, it would more than double GT3 performance.
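A rough sketch of that estimate, assuming Haswell GT3's 40 EUs as the baseline and perfect scaling with EU count (both assumptions are mine, and the later posts question the scaling part):

```python
# Speculative GT4 estimate: three Broadwell slices of 24 EUs each,
# plus a ~20% per-EU architectural gain, against Haswell GT3 (40 EUs).
haswell_gt3_eus = 40
gt4_eus = 3 * 24            # hypothetical three-slice configuration
arch_gain = 1.2             # assumed Gen8 architectural uplift

relative_perf = (gt4_eus / haswell_gt3_eus) * arch_gain
print(f"Hypothetical GT4 vs Haswell GT3: {relative_perf:.2f}x")  # ~2.16x
```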
 

jpiniero

Lifer
You only have to look at the Iris Pro vs Kaveri results to see that they really needed to pump up the fill rate.
 

witeken

Diamond Member
So Intel doesn't seem to have changed much with the shaders, but they do say that they significantly changed the architectural things they mention in the first slide.
 

Enigmoid

Platinum Member
witeken said:
So Intel doesn't seem to have changed much with the shaders, but they do say that they significantly changed the architectural things they mention in the first slide.

If you look at the review of Iris Pro, you immediately notice that compute is decent (between Nvidia and AMD), ROP and geometry performance is strong, and texturing and AA are extremely weak.
 

MisterLilBig

Senior member
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.
 

ShintaiDK

Lifer
MisterLilBig said:
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.

The bottleneck for Intel IGPs is not the compute power; they are actually stronger than others there. Note the 50% sampler increase; that's where the issue is.
 

witeken

Diamond Member
MisterLilBig said:
So the next GT3 Iris Pro will only have 48 EUs!? Just "20% more compute".

If there is no "GT4", I'm not happy with this.

There should be a Skylake Gen9 GT4 72 EU SKU at around the same time, somewhere in 2015.

BTW, we've already known for months that it would be 20% more EUs for the GT parts.
 

MisterLilBig

Senior member
But, how will this actually affect game performance?

Iris Pro GT3e was usually just 80% to 90% faster than HD 4600, and it has twice the EUs and 128MB of L4 cache; I expected more.

Can this really be called Gen8? Slides didn't mention it.
 

ShintaiDK

Lifer
MisterLilBig said:
But, how will this actually affect game performance?

Iris Pro GT3e was usually just 80% to 90% faster than HD 4600, and it has twice the EUs and 128MB of L4 cache; I expected more.

Can this really be called Gen8? Slides didn't mention it.

The sampler output is the bottleneck for gaming.
 

MisterLilBig

Senior member
So should we expect the mobile HQ Broadwell Iris Pro GT3e (the 5200's successor) to run the Battlefield 3 benchmark at 1680 x 1050 High Quality at around 42.3 FPS, if it's 70% higher performance than mobile Haswell's HQ GT3e? Or 49.8 FPS if it's a 100% improvement?

By the slides, this is not Gen8.
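For reference, working back from the FPS figures above (my own arithmetic; the ~24.9 FPS Haswell GT3e baseline is implied by the 42.3 and 49.8 FPS numbers, not stated anywhere):

```python
# Projecting BF3 1680x1050 High Quality results from an assumed
# ~24.9 FPS Haswell Iris Pro GT3e baseline (implied by the figures above).
haswell_gt3e_fps = 24.9

for uplift in (1.7, 2.0):
    print(f"{(uplift - 1) * 100:.0f}% faster -> {haswell_gt3e_fps * uplift:.1f} FPS")
# 70% faster -> 42.3 FPS
# 100% faster -> 49.8 FPS
```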
 

Enigmoid

Platinum Member
ShintaiDK said:
The sampler output is the bottleneck for gaming.

Most definitely. What you will also notice is that the Haswell IGP scales extremely poorly with EU count.

For instance, HD 4600 (4770K) runs 20 EUs @ 1250 MHz, while Bay Trail runs 4 EUs @ 667 MHz. Theoretically, HD 4600 should be about 9.4x faster assuming perfect scaling (not entirely unreasonable, considering the embarrassingly parallel graphics workload and that neither is significantly memory-bandwidth bound).

However, in real-life benchmarks, Bay Trail is never that much slower, generally only about 3-6 times slower. For example, T-Rex offscreen is about 15 fps on Bay Trail and ~80 fps on the 4770K (roughly 5.3x as fast).

Something is messing up scaling; hopefully Broadwell and Skylake address this.
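A quick check of that scaling argument (the efficiency percentage at the end is my own derived figure, not from the post):

```python
# Theoretical shader throughput ratio, assuming perfect scaling:
# HD 4600 (20 EU @ 1250 MHz) vs Bay Trail (4 EU @ 667 MHz).
theoretical = (20 * 1250) / (4 * 667)   # ~9.4x

# Observed GFXBench T-Rex offscreen: ~80 fps on the 4770K vs ~15 fps on Bay Trail.
observed = 80 / 15                      # ~5.3x

print(f"Theoretical scaling: {theoretical:.1f}x")
print(f"Observed scaling:    {observed:.1f}x")
print(f"Scaling efficiency:  {observed / theoretical:.0%}")  # ~57%
```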