Skylake SKU

jpiniero

Lifer
Oct 1, 2010
I don't think you can say whether the L3 is going to be cut on Broadwell and/or Skylake GT3/GT4 parts.

rootheday3

Member
Sep 5, 2013
Enigmoid is correct: a GT3 GPU is less than 2x a GT2. The reason is as follows:
GT = non-slice (media stuff + 3D geometry pipeline + misc.) + 1 to N slices.

Each slice contains the slice common logic + some number of subslices (2-3 in current designs) + some amount of cache allocation.

Slice common = the pixel back ends/Z handling, scoreboards
Subslice = EUs + sampler + some misc. stuff

In a GT3, there is one more slice (including slice common + subslices) than in a GT2, but the other non-slice stuff (media, 3D geometry, ...) is not duplicated. Caveat: to help media workloads scale, HSW GT3 does duplicate some of the media assets.
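A toy area model of the composition described above shows why adding one slice gives less than a 2x larger GPU. The area values for the shared non-slice block and for one slice are made-up illustrative units, not Intel data, and the model ignores the extra media assets that HSW GT3 duplicates.

```python
# Toy GT area model: GT = non-slice + N slices.
# NON_SLICE_AREA and SLICE_AREA are hypothetical, illustrative values.
NON_SLICE_AREA = 30.0  # media + 3D geometry pipeline + misc (shared, not duplicated)
SLICE_AREA = 50.0      # slice common + subslices + cache allocation


def gt_area(num_slices: int) -> float:
    """Area of a GT configuration with one shared non-slice block."""
    return NON_SLICE_AREA + num_slices * SLICE_AREA


gt2 = gt_area(1)  # GT2 = 1 slice
gt3 = gt_area(2)  # GT3 = 2 slices
ratio = gt3 / gt2
print(f"GT2 = {gt2}, GT3 = {gt3}, GT3/GT2 = {ratio:.3f}")  # GT3/GT2 = 1.625
```

Any positive non-slice area keeps the ratio strictly below 2, whatever numbers you plug in.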

People keep talking about adding more EUs or making the EUs faster. It should be pretty clear that HSW is already compute-heavy, with a 10:1 ratio of EUs to samplers per subslice.

BDW moves to an 8:1 ratio so as not to be sampler-bound so often. Subslices per slice go from 2 to 3, so total EUs per slice go from 20 to 24 (20% more) but samplers from 2 to 3 (50% more).
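The ratio arithmetic above, spelled out as a small sketch. It assumes one sampler per subslice, which is what a per-subslice EU:sampler ratio of 10:1 or 8:1 implies.

```python
# Per-slice EU and sampler counts for HSW vs BDW, using the figures above.
# Assumption: one sampler per subslice.
CONFIGS = {
    "HSW": {"eus_per_subslice": 10, "subslices_per_slice": 2},
    "BDW": {"eus_per_subslice": 8, "subslices_per_slice": 3},
}


def per_slice(cfg: dict) -> tuple:
    """Return (EUs per slice, samplers per slice)."""
    subslices = cfg["subslices_per_slice"]
    return cfg["eus_per_subslice"] * subslices, subslices


hsw_eus, hsw_samplers = per_slice(CONFIGS["HSW"])
bdw_eus, bdw_samplers = per_slice(CONFIGS["BDW"])

eu_growth = bdw_eus / hsw_eus - 1                  # 24 / 20 - 1
sampler_growth = bdw_samplers / hsw_samplers - 1   # 3 / 2 - 1
print(f"EUs/slice: {hsw_eus} -> {bdw_eus} (+{eu_growth:.0%})")
print(f"Samplers/slice: {hsw_samplers} -> {bdw_samplers} (+{sampler_growth:.0%})")
```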

It is true that going wider but slower is good for power efficiency, but it is also expensive in die area. Since the slice/subslice logic (mostly EUs + samplers + added cache) is the biggest part of the GT die area, it would make sense for Intel to go after architectural changes that get more perf out of the existing EUs by addressing bottlenecks elsewhere that cause the EUs (or samplers) to stall.

In some workloads, HSW is competitive in perf/clk with an AMD or NVIDIA design if you normalize by theoretical FLOPs (1 EU = max 16 FLOPs/clk) and clock speed. Other workloads, not so much. The latter include MSAA, certain non-promoted Z cases, depth resolves, and, as an integrated part, overall memory bandwidth per frame that hits the memory controller. Look for Intel to work on these latter areas rather than on raw EU count increases alone.
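A minimal sketch of the normalization mentioned above: peak FLOPs = EUs × 16 FLOPs/clk × clock, and a measured score divided by that peak gives a per-FLOP efficiency that can be compared across designs. The 16 FLOPs/clk figure is from the post (two 4-wide FMA units per EU, each FMA counting as 2 FLOPs); the EU count, clock, and score in the example are hypothetical.

```python
FLOPS_PER_EU_PER_CLK = 16  # 2 FPUs x 4 lanes x 2 (FMA), per the figure above


def peak_gflops(num_eus: int, clock_ghz: float) -> float:
    """Theoretical peak single-precision GFLOPS of an Intel Gen GPU."""
    return num_eus * FLOPS_PER_EU_PER_CLK * clock_ghz


def score_per_gflop(score: float, peak: float) -> float:
    """Normalize a benchmark score by theoretical peak throughput."""
    return score / peak


# Hypothetical example: a 40-EU part (2 slices x 20 EUs) at 1.3 GHz.
peak = peak_gflops(40, 1.3)
print(f"peak: {peak:.1f} GFLOPS")
# Hypothetical benchmark score of 50 fps.
efficiency = score_per_gflop(50.0, peak)
print(f"fps per GFLOP: {efficiency:.4f}")
```

Computing the same number for another vendor's GPU (with its own peak FLOPs) is what makes the perf/clk comparison fair; workloads where this efficiency drops are the bottlenecks the post lists.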

witeken

Diamond Member
Dec 25, 2013
rootheday3 said:
Enigmoid is correct: a GT3 GPU is less than 2x a GT2. The reason is as follows:
GT = non-slice (media stuff + 3D geometry pipeline + misc.) + 1 to N slices.
It's quite disappointing, honestly, that for a company that talks so much about the exponential Moore's law, the equation is plus N-1 slices instead of plus N*1 slices; a GT5 at 10nm would only have 33% more shaders than GT4, despite a doubling in density. I guess Intel only wants to go after the low end of the dGPU market, rather than taking AMD's more aggressive approach.

rootheday3 said:
Other workloads, not so much. The latter include MSAA, certain non-promoted Z cases, depth resolves, and, as an integrated part, overall memory bandwidth per frame that hits the memory controller. Look for Intel to work on these latter areas rather than on raw EU count increases alone.
Yes, that was clear from the presentations at IDF. Do you think Intel will seek higher GPU clock speeds to mitigate die-area costs?

III-V

Senior member
Oct 12, 2014
I think you mean N * 2 :p

Intel already runs at pretty high clocks; I'd expect more of the same in that regard. But really, they are very good in the compute department. They need to focus on other parts of the graphics pipeline rather than on increasing EU count per slice, which is what they're doing.

witeken

Diamond Member
Dec 25, 2013
III-V said:
I think you mean N * 2 :p
No no, I failed a lot there with that N*1, but exponential in this case means 2^(N-2) slices, so GT2 would have 1 slice, GT3 2, GT4 4, etc. Every node you double the slices so you can increment the GT number by 1, just like AMD and Nvidia are doing with their CUs and SMMs.
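The two scaling schemes being argued about, as a small sketch. "Linear" is the current pattern of GT2 = 1 slice, GT3 = 2 slices (so N - 1 slices for GTN); "exponential" is the 2^(N-2) proposal above. GT levels beyond those that exist are hypothetical, and shader counts are assumed proportional to slices (same EUs per slice).

```python
def linear_slices(gt: int) -> int:
    """Current pattern: one more slice per GT level (GT2 = 1, GT3 = 2, ...)."""
    return gt - 1


def exponential_slices(gt: int) -> int:
    """Proposed pattern: double the slices per GT level, 2^(N-2)."""
    return 2 ** (gt - 2)


for gt in range(2, 6):
    print(f"GT{gt}: linear {linear_slices(gt)}, exponential {exponential_slices(gt)}")

# Linear growth from GT4 to GT5: 4 / 3 - 1, i.e. about 33% more slices
# (and shaders, if EUs per slice stay constant).
gt5_gain = linear_slices(5) / linear_slices(4) - 1
print(f"GT4 -> GT5 (linear): +{gt5_gain:.0%}")
```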

mikk

Diamond Member
May 15, 2012
I doubt Skylake has the same CPU and GPU sizes as Broadwell. Any size calculation based on Broadwell is flawed.

AtenRa

Lifer
Feb 2, 2009
Great job!

Don't forget that not all of the slice is replicated.


It looks like die sizes will be closer to:
117 mm^2 for 2 + 3
134 mm^2 for 4 + 3
172 mm^2 for 4 + 4
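Differencing the three estimates gives rough per-block area costs. This sketch assumes "2 + 3" means 2 CPU cores + GT3 (Intel's usual cores + GT notation) and that nothing else changes between the dies; the results are derived from the estimates above, not measured.

```python
# Estimated die sizes (mm^2) from the post, keyed by (CPU cores, GT level).
# Assumption: "2 + 3" = 2 cores + GT3, etc.
ESTIMATES = {(2, 3): 117, (4, 3): 134, (4, 4): 172}

two_extra_cores = ESTIMATES[(4, 3)] - ESTIMATES[(2, 3)]  # 134 - 117
gt3_to_gt4 = ESTIMATES[(4, 4)] - ESTIMATES[(4, 3)]       # 172 - 134
print(f"2 extra cores: ~{two_extra_cores} mm^2")
print(f"GT3 -> GT4: ~{gt3_to_gt4} mm^2")
```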

Yeah, it seems you are right; I'll have a look at them again tomorrow and make new drawings. ;)

IntelUser2000

Elite Member
Oct 14, 2003
AtenRa said:
Yeah, it seems you are right; I'll have a look at them again tomorrow and make new drawings. ;)

The way you got there could be better, but I think your numbers are probably right. GT3e dies are bigger than GT3 dies because GT3e dies need tags for the eDRAM, since the eDRAM is used as a cache. Otherwise, they would be pretty similar. On Haswell, the difference can be about 20mm2.

240mm2 at 28nm is way cheaper today than 200mm2 at 14nm ;)

Actually, you need to consider the difference in shipment volume between Intel and AMD; Intel probably gets a little bit of savings there. Also, because of how the number of available dies per wafer works, a 240mm2 die costs more than 20% more than a 200mm2 die.
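The dies-per-wafer point can be checked with the common gross-die approximation DPW ≈ π(d/2)²/A − πd/√(2A), for wafer diameter d and die area A. This sketch assumes a 300 mm wafer and the same wafer cost for both dies, so it isolates the area effect only (a 28nm and a 14nm wafer do not cost the same); it also ignores defects, which would widen the gap further because larger dies yield worse.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: 300 mm wafers


def dies_per_wafer(die_area_mm2: float, d: float = WAFER_DIAMETER_MM) -> float:
    """Gross dies per wafer: wafer area / die area, minus an edge-loss term.

    Ignores scribe lines, edge exclusion, and defect yield.
    """
    return (math.pi * (d / 2) ** 2 / die_area_mm2
            - math.pi * d / math.sqrt(2 * die_area_mm2))


dpw_200 = dies_per_wafer(200.0)
dpw_240 = dies_per_wafer(240.0)
# At a fixed wafer cost, cost per die scales as 1 / dies per wafer.
cost_ratio = dpw_200 / dpw_240
print(f"dies/wafer: 200mm2 -> {dpw_200:.1f}, 240mm2 -> {dpw_240:.1f}")
print(f"cost per die, 240mm2 vs 200mm2: {cost_ratio:.3f}x")
```

The area ratio is exactly 1.2, but the edge-loss term grows with die size, so the per-die cost ratio comes out above 1.2.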

Also, the important factor is the average die size for the company. What's the average latest-gen die size (mm2) for AMD? What about Intel? If Intel sells a lot of 80mm2 dies, then that would skew it a lot.
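A shipment-weighted average die size makes the skew concrete. The product mix below is entirely hypothetical; it only shows how a large volume of small dies pulls the average well below the simple mean of the die sizes on offer.

```python
def weighted_avg_die_size(mix: list) -> float:
    """Average die area weighted by units shipped; mix holds (area_mm2, units)."""
    total_units = sum(units for _, units in mix)
    return sum(area * units for area, units in mix) / total_units


# Hypothetical mix: (die area in mm2, relative units shipped).
mix = [(80.0, 60), (130.0, 30), (240.0, 10)]
avg = weighted_avg_die_size(mix)
simple_mean = sum(area for area, _ in mix) / len(mix)
print(f"weighted average die size: {avg:.1f} mm2")   # 111.0 mm2
print(f"simple mean of die sizes: {simple_mean:.1f} mm2")  # 150.0 mm2
```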