Enigmoid is correct - a gt3 GPU is less than 2x of a gt2. The reason is as follows:
Gt = non-slice (media stuff + 3d geometry pipeline + misc) + 1-N slices.
The slices contain slice common + # subslices (2-3 in current designs) + some amount of cache allocation.
Slice common = the pixel backends/z handling, scoreboards
Subslice = EUs + sampler + some misc stuff
In a gt3, there is one more slice (including slice common + subslices) vs a gt2, but the other non-slice stuff (media, 3d geometry, ...) is not duplicated. Caveat: to help media workloads scale, gt3 HSW does duplicate some of the media assets.
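To make the "less than 2x" point concrete, here is a rough sketch of the breakdown above. The function names and the cost numbers are made up for illustration (they are not real die areas or measured figures), and the sketch ignores the media duplication caveat for gt3 HSW.

```python
def gt_cost(non_slice: float, per_slice: float, num_slices: int) -> float:
    """Gt = non-slice + N slices, in arbitrary relative units."""
    return non_slice + num_slices * per_slice


def gt3_over_gt2(non_slice: float, per_slice: float) -> float:
    """Ratio of a 2-slice part (gt3) to a 1-slice part (gt2)."""
    return gt_cost(non_slice, per_slice, 2) / gt_cost(non_slice, per_slice, 1)


# Hypothetical units: non-slice = 1.0, one slice = 3.0 (placeholders only).
print(gt3_over_gt2(1.0, 3.0))  # 1.75, i.e. less than 2x

# Only if the non-slice part were zero would gt3 be exactly 2x a gt2.
print(gt3_over_gt2(0.0, 3.0))  # 2.0
```

Whatever numbers you plug in, the ratio stays below 2 as long as the non-slice part is bigger than zero.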
People keep talking about adding more EUs or making the EUs faster. It should be pretty clear that HSW is already compute heavy with a 10:1 ratio of EU:sampler per subslice.
BDW moves to an 8:1 ratio so as to not be sampler bound so often. Subslices per slice go from 2 to 3, so total EUs per slice go from 20 to 24 (20% more) but samplers go from 2 to 3 (50% more).
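The per-slice arithmetic can be checked with a quick sketch. The class and field names are mine, and it assumes one sampler per subslice, which is what the 10:1 and 8:1 ratios imply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SliceConfig:
    name: str
    subslices_per_slice: int
    eus_per_subslice: int
    samplers_per_subslice: int = 1  # assumed from the EU:sampler ratios

    @property
    def eus_per_slice(self) -> int:
        return self.subslices_per_slice * self.eus_per_subslice

    @property
    def samplers_per_slice(self) -> int:
        return self.subslices_per_slice * self.samplers_per_subslice


def pct_increase(old: float, new: float) -> float:
    """Percentage growth from old to new."""
    return 100.0 * (new - old) / old


hsw = SliceConfig("HSW", subslices_per_slice=2, eus_per_subslice=10)
bdw = SliceConfig("BDW", subslices_per_slice=3, eus_per_subslice=8)

print(hsw.eus_per_slice, bdw.eus_per_slice)            # 20 24
print(pct_increase(hsw.eus_per_slice, bdw.eus_per_slice))            # 20.0
print(pct_increase(hsw.samplers_per_slice, bdw.samplers_per_slice))  # 50.0
```

So sampler throughput per slice grows faster than EU count, which is the point of the ratio change.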
It is true that going wider but slower is good for power efficiency, but it is also expensive in die area. Since the slice/subslice logic (mostly EUs + samplers + added cache) is the biggest part of the gt die area, it would make sense for Intel to go after architectural things that get more perf out of the existing EUs by addressing bottlenecks elsewhere that cause the EUs (or samplers) to stall.
In some workloads, HSW is competitive in perf/clk with an AMD or NVidia design if you normalize by theoretical FLOPs (1 EU = max 16 flops/clk) and clock speed. Other workloads, not so much. The latter include MSAA, certain non-promoted z cases, depth resolves and, since this is an integrated part, the overall memory bandwidth per frame that hits the memory controller. Look for Intel to work on these latter areas vs raw EU count increase alone.
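For anyone who wants to do the normalization themselves, here is a minimal sketch. Only the 16 flops/clk per EU figure comes from the above; the function names, the 1.0 GHz clock and the benchmark score are hypothetical placeholders, not measurements.

```python
FLOPS_PER_EU_PER_CLK = 16  # max per EU, as stated above


def peak_gflops(num_eus: int, clock_ghz: float) -> float:
    """Theoretical peak: EUs x flops/clk x clock (GHz gives GFLOPS)."""
    return num_eus * FLOPS_PER_EU_PER_CLK * clock_ghz


def score_per_gflop(score: float, num_eus: int, clock_ghz: float) -> float:
    """Benchmark score normalized by theoretical peak GFLOPS."""
    return score / peak_gflops(num_eus, clock_ghz)


# A 2-slice HSW part has 40 EUs; the 1.0 GHz clock is a made-up example.
print(peak_gflops(40, 1.0))             # 640.0
# Made-up score of 1280 on that part, purely to show the normalization.
print(score_per_gflop(1280.0, 40, 1.0))  # 2.0
```

Run the same normalization on another vendor's part using its own flops/clk figure, and the workloads where the normalized numbers diverge are the ones limited by something other than raw EU throughput.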