Skylake SKU

jpiniero · Nov 22, 2014

I don't think you can say if the L3 is going to be cut with Broadwell and/or Skylake GT3/4 parts.

rootheday3 · Nov 22, 2014

Enigmoid is correct - a gt3 GPU is less than 2x of a gt2. The reason is as follows:
Gt = non-slice (media stuff + 3d geometry pipeline + misc) + 1-N slices.

The slices contain slice common + # subslices (2-3 in current designs) + some amount of cache allocation.

Slice common= the pixel backends/z handling, scoreboards
Subslice = EUs + sampler + some misc stuff

In a gt3, there is one more slice (slice common + subslices) than in a gt2, but the other non-slice stuff (media, 3d geometry, ...) is not duplicated. Caveat: to help media workloads scale, gt3 HSW does duplicate some of the media assets.
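To put the "less than 2x" point in numbers, here is a minimal sketch of the area model described above. The relative areas are made-up placeholders, not measured die data, and the media duplication caveat is ignored.

```python
def gt_area(non_slice: float, slice_area: float, n_slices: int) -> float:
    """Total GPU area: one shared non-slice block plus n_slices identical slices."""
    return non_slice + n_slices * slice_area


# Hypothetical relative areas, chosen only for illustration.
NON_SLICE = 1.0
SLICE = 2.0

gt2 = gt_area(NON_SLICE, SLICE, n_slices=1)
gt3 = gt_area(NON_SLICE, SLICE, n_slices=2)
ratio = gt3 / gt2

# The ratio stays below 2 whenever the non-slice area is greater than zero.
print(f"GT3/GT2 area ratio: {ratio:.2f}")
```

The bigger the non-slice block is relative to a slice, the further the ratio drops below 2.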

People keep talking about adding more EUs or making the EUs faster. It should be pretty clear that HSW is already compute heavy, with a 10:1 ratio of EU:sampler per subslice.

BDW moves to an 8:1 ratio so as not to be sampler bound so often. Subslices per slice go from 2 to 3, so total EUs per slice go from 20 to 24 (20% more) but samplers go from 2 to 3 (50% more).
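The per-slice arithmetic can be checked with a short sketch. The class and field names are hypothetical; the counts are the ones quoted in this post, with one sampler per subslice as implied by the 10:1 and 8:1 ratios.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceConfig:
    subslices: int
    eus_per_subslice: int
    samplers_per_subslice: int = 1  # implied by the EU:sampler ratios above

    @property
    def eus(self) -> int:
        return self.subslices * self.eus_per_subslice

    @property
    def samplers(self) -> int:
        return self.subslices * self.samplers_per_subslice


def pct_increase(old: int, new: int) -> float:
    """Percentage growth from old to new."""
    return 100 * (new - old) / old


hsw = SliceConfig(subslices=2, eus_per_subslice=10)
bdw = SliceConfig(subslices=3, eus_per_subslice=8)

print(f"EUs per slice:      {hsw.eus} -> {bdw.eus} "
      f"(+{pct_increase(hsw.eus, bdw.eus):.0f}%)")
print(f"Samplers per slice: {hsw.samplers} -> {bdw.samplers} "
      f"(+{pct_increase(hsw.samplers, bdw.samplers):.0f}%)")
```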

It is true that going wider but slower is good for power efficiency, but it is also expensive in die area. Since the slice/subslice logic (mostly EUs + samplers + added cache) is the biggest part of the GT die area, it would make sense for Intel to go after architectural things that get more perf out of the existing EUs by addressing bottlenecks elsewhere that cause the EUs (or samplers) to stall.

In some workloads, HSW is competitive in perf/clk with an AMD or NVidia design if you normalize by theoretical FLOPs (1 EU = max 16 flops/clk) and clock speed. Other workloads, not so much. The latter include MSAA, certain non-promoted Z cases, depth resolves, and, as an integrated part, overall memory bandwidth per frame that hits the memory controller. Look for Intel to work on these latter areas rather than on raw EU count increases alone.
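A rough sketch of that normalization, for anyone who wants to try it on their own numbers. Only the 16 flops/clk per EU figure comes from this post; the function names, the 1.0 GHz clock and the frame rate are placeholders.

```python
FLOPS_PER_EU_PER_CLK = 16  # peak figure quoted in the post


def peak_gflops(eus: int, clock_ghz: float) -> float:
    """Theoretical peak throughput in GFLOPS for a given EU count and clock."""
    return eus * FLOPS_PER_EU_PER_CLK * clock_ghz


def fps_per_peak_gflop(fps: float, eus: int, clock_ghz: float) -> float:
    """Measured frame rate normalized by theoretical peak throughput."""
    return fps / peak_gflops(eus, clock_ghz)


# 40 EUs = 2 HSW slices of 20 EUs each; clock and fps are placeholders.
print(peak_gflops(eus=40, clock_ghz=1.0))
print(fps_per_peak_gflop(fps=32.0, eus=40, clock_ghz=1.0))
```

Comparing that last figure across vendors only makes sense if each vendor's peak is computed from its own per-unit flops/clk, which this sketch does not model.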

witeken · Nov 22, 2014

rootheday3 said:
Enigmoid is correct - a gt3 GPU is less than 2x of a gt2. The reason is as follows:
Gt = non-slice (media stuff + 3d geometry pipeline + misc) + 1-N slices.

It's quite disappointing, honestly, that for a company that talks so much about the exponential Moore's law, the equation is plus N-1 slices instead of plus N*1 slices; GT5 at 10nm would only have 33% more shaders despite a doubling in density. I guess Intel only wants to go after the low-end of the dGPU market, instead of AMD's more aggressive approach.

Other workloads, not so much. The latter include MSAA, certain non-promoted Z cases, depth resolves, and, as an integrated part, overall memory bandwidth per frame that hits the memory controller. Look for Intel to work on these latter areas rather than on raw EU count increases alone.

Yes, that was clear from the presentations at IDF. Do you think Intel will seek higher clock speeds for the GPU to mitigate die area costs?

III-V · Nov 22, 2014

I think you mean N * 2

Intel already runs at pretty high clocks. I'd expect more of the same in that regard. But really, they are very good in the compute department. They need to focus on other things in the graphics pipeline over increasing EU count per slice, which is what they're doing.

witeken · Nov 22, 2014

III-V said:
I think you mean N * 2

No no, I failed a lot there with that N*1, but exponential means in this case 2^(N-2) slices, so GT2 would have 1 slice, GT3 has 2, GT4 has 4, etc. Every node you add 2X the slices so you can increment the GT number by 1, just like AMD and Nvidia are doing with their SMMs and CUs.
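The two scaling rules being argued about can be tabulated quickly. This is only a sketch of the formulas in this exchange; the function names are made up, and GT5 is the hypothetical part from the earlier post.

```python
def slices_linear(gt: int) -> int:
    """One extra slice per GT step: GT2 has 1 slice, GT3 has 2, GT4 has 3."""
    return gt - 1


def slices_exponential(gt: int) -> int:
    """Slices double with every GT step: 2^(N-2)."""
    return 2 ** (gt - 2)


for gt in range(2, 6):
    print(f"GT{gt}: linear={slices_linear(gt)}  "
          f"exponential={slices_exponential(gt)}")

# Growth from GT4 to a hypothetical GT5 under the linear rule, in percent.
growth = 100 * (slices_linear(5) - slices_linear(4)) / slices_linear(4)
print(f"GT4 -> GT5 (linear): +{growth:.0f}% slices")
```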

mikk · Nov 22, 2014

I doubt Skylake has the same CPU and GPU size. Any size calculation based on Broadwell is flawed.

AtenRa · Nov 22, 2014

Enigmoid said:
Great job!

Don't forget that not all of the slice is replicated.

It looks like die sizes will be more around
117 mm^2 for 2 + 3
134 mm^2 for 4 + 3
172 mm^2 for 4 + 4

Yeah, it seems you are right. I'll have a look at them again tomorrow and make new drawings.

IntelUser2000 · Jan 14, 2015

AtenRa said:
Yeah, it seems you are right. I'll have a look at them again tomorrow and make new drawings.

The way you get there can be better, but I think your numbers are probably right. GT3e dies are bigger than GT3 dies because GT3e dies need tags for the eDRAM, since it uses it as a cache. Otherwise, they would be pretty similar. On Haswell, it can be about 20mm2.

240mm2 at 28nm is way cheaper today than 200mm2 at 14nm

Actually, you need to consider the difference in shipment volume between Intel and AMD, and Intel probably gets a little bit of savings. Going from 200mm2 to 240mm2 is also more than a 20% increase in die cost, because of how the number of available dies per wafer works.
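As a rough check on the "more than 20%" point, here is a sketch using a common textbook approximation for gross dies per wafer. It assumes a 300mm wafer and the same wafer cost for both die sizes, and it ignores yield, scribe lines and edge exclusion, so treat the result as illustrative only.

```python
import math


def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """Approximate gross dies per wafer.

    First term: wafer area divided by die area.
    Second term: correction for partial dies lost around the wafer edge.
    """
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return wafer_area / die_area_mm2 - edge_loss


small = dies_per_wafer(200.0)
large = dies_per_wafer(240.0)

# With equal wafer cost, cost per die scales inversely with dies per wafer.
cost_ratio = small / large
print(f"200mm2: {small:.0f} dies, 240mm2: {large:.0f} dies")
print(f"Cost per die ratio (240 vs 200): {cost_ratio:.3f}")
```

The ratio comes out a little above 1.2 because the edge-loss term shrinks more slowly than the area term as dies get bigger. Adding a defect-density yield model would widen the gap further.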

Also, the important factor is the average die size for the company. What's the average latest-gen die size in mm2 for AMD? What about Intel? If Intel sells a lot of 80mm2 dies, then that would skew it a lot.
