What will 14nm GPUs look like?

NTMBK · Jun 7, 2015

Pretty simple- what do we think the next generation of GPUs will look like? Here are my predictions:

1. HBM for all

HBM will be getting cheaper, with HBM2 providing 2-4GB per stack and higher frequencies. I expect the entire GPU stack to move to HBM, with 1 stack for low end chips, 2 stacks for mid range and 4 stacks for the high end. Probably 2GB stacks for consumer cards, and 4GB stacks for workstation cards.

2. Dies will shrink again

I suspect that within each price bracket, we will see die sizes close to the early days of 28nm- top end part roughly the size of the 7970, not R9 290 or Fiji. 14nm is not meant to bring any significant improvements in $/transistor, so don't expect considerably more transistors than a chip like GM200.

3. Return to smaller caches

A higher speed, lower latency and more efficient memory bus will reduce the need for the large on-chip caches we saw in Maxwell. More efficient transistors means that even with a barely increased transistor budget, more can be devoted to power hungry logic like shaders and texture units without blowing out the power budget.

4. Slightly higher clocks

Again, as in 3, more efficient transistors should enable higher clock speeds- and squeezing more frequency out of a smaller chip instead of going to a larger, lower clocked chip makes sense with expensive 14nm transistors.

5. Simpler, smaller PCBs

As you can tell from looking at a picture of the Titan X PCB, a lot of PCB space is dedicated to RAM chips and the wiring to connect them to the GPU. HBM removes all of that and replaces it with a GPU and a couple of DRAM stacks on an interposer, so boards should be a lot shorter.

So what do you all think? What do you expect to see?

[P.S: Please don't turn this into yet another AMD/NVidia fanboy pissing match. I want to discuss the fundamental tech which will affect both GPU companies, not bitch about driver quality or Gameworks shenanigans. If you want to do that, pick one of the 5 other threads which are covering that argument at any given time.]

Sweepr · Jun 7, 2015

Wrong forum?

Looks like the right forum to me. -DrPizza

NTMBK · Jun 7, 2015

Sweepr said:
Wrong forum?

Wrong forum D: If any mod wants to move this I'd appreciate it...

Moved.

ShintaiDK · Jun 7, 2015

1. I doubt HBM for all due to cost.

2. Absolutely. 300mm2 will be a big die. We will see at least 4 generations on 14/16nm. And we may never get any GPU below 14/16nm either.

3. No chance. HBM isnt any better than GDDR5 there.

4. Perhaps 10%.

5. Absolutely. But we gonna see the cards being close to same size with air coolers hanging out.

What I do expect is price increases and the death spiral of the GPU continue. The TDP will also have to go down significantly, since people outright reject high TDP cards outside the "enthusiast" segment.

Borealis7 · Jun 7, 2015

to quote myself from another thread:
you can buy 14nm FinFET GPUs today...they're called "Iris" and they come with an integrated Intel CPU!

NTMBK · Jun 7, 2015

Borealis7 said:
to quote myself from another thread:
you can buy 14nm FinFET GPUs today...they're called "Iris" and they come with an integrated Intel CPU!

You know I meant dGPUs, not APUs

ShintaiDK · Jun 7, 2015

NTMBK said:
You know I meant dGPUs, not APUs

He didnt talk about APUs either

NTMBK · Jun 7, 2015

ShintaiDK said:
He didnt talk about APUs either

APU, CPU with iGPU, potato, potatoe

maddie · Jun 7, 2015

1) HBM for all
Probably not for next all gen cards. Most GDDR5 memory is used for GPUs so HBM could cost roughly the same if used in equal volumes. In fact with future APUs also migrating to HBM we might see greater production of HBM vs GDDR5 and thus lower prices for memory. Then we get all.
I also see more than 4 stacks for very high end professional products.

2) Dies will shrink again
At first due to yield problems. Over time the need for higher performance will push die sizes to maximum values. The actual die production costs is only part of the entire cost, so even if $/transistor does not improve, double transistors will not mean a doubling of selling price.

3) Return to smaller caches
I see no change. HBM latency is worse than on die caches and it will still take more energy to transfer off die than to stay on die.

4) Slightly higher clocks
Power has to scale with the inverse of transister count or better to allow faster clocks for the same die area. Meaning that if for example a max die now draws 250W, then for twice the transistors it will draw more than 250W @14nm unless the transistors are twice as efficient. Of course Nvidia showed us that more transistors do not necessarily mean more power. Up in the air.

5) Simpler, smaller PCBs
Yes, cards will be smaller.

I will add

6) PIM
One big unknown is the use of PIM memory stacks. This could allow some processing to offload to the memory itself using HSA techniques. Someone with a lot of deep coding knowledge will have to estimate the potential benefits here.

JDG1980 · Jun 7, 2015

NTMBK said:
1. HBM for all

HBM will be getting cheaper, with HBM2 providing 2-4GB per stack and higher frequencies. I expect the entire GPU stack to move to HBM, with 1 stack for low end chips, 2 stacks for mid range and 4 stacks for the high end. Probably 2GB stacks for consumer cards, and 4GB stacks for workstation cards.

Eventually, but I don't think this will happen on the smaller GPUs at first. As I said in another thread, I wouldn't be surprised if the first FinFET+ GPU is the "GP107", with about the same die size as the GM107 (148 sq. mm.) but double the transistor count, putting it comfortably ahead of the GTX 960 with the power budget of the GTX 750 Ti. I think such a chip, because of its price point, would have GDDR5. HBM2 is going to be more expensive than GDDR5 at first, and Nvidia may be worried about biting off more than they can chew. Tackling a new architecture, new node, and new memory subsystem all at once is a bit intimidating. It would make more sense to test a new architecture with an existing memory subsystem, and minimize the new node transition hassles by going with a small die size. And a chip like this would sell well both in AIB cards and to OEMs.

AMD tends to be more aggressive with new memory technologies than Nvidia, and there's a good chance that Arctic Islands will skip the low end entirely. (AMD would prefer to pitch APUs to users with less demanding graphical needs.) Therefore, it wouldn't surprise me to see HBM2 across the whole 400 series lineup, at least all the new products. (They may keep some rebranded trash at the low end and for budget OEMs, just like Nvidia does with tiny Fermi chips now.)

Eventually it will all be HBM2, even APUs, once pricing levels off. And it will - early adopters always pay a premium.

NTMBK said:
2. Dies will shrink again

I suspect that within each price bracket, we will see die sizes close to the early days of 28nm- top end part roughly the size of the 7970, not R9 290 or Fiji. 14nm is not meant to bring any significant improvements in $/transistor, so don't expect considerably more transistors than a chip like GM200.

I don't see this as being much different than 28nm. The first GK110 chips (561 sq. mm.) had very low yields and were colossally expensive. There weren't even enough chips for Tesla cards at first, much less anything consumer-focused; the initial allocation all went to the Oak Ridge supercomputer. It wasn't until November 2012 that Tesla GK110 cards were available, and this was at multi-thousand-dollar price tags. The Titan didn't come along until February 2013, and even at its $999 price point, it still used partially-disabled chips, indicating that TSMC was still having yield issues with dice of this size. This was over a year after the first 28nm GPUs hit the market. It wasn't until late 2013, when the process was about 18 months old, that big dice started to yield well enough to sell large chips in $500-$750 cards.

We'll probably see the same here. Early chips will be somewhere around 300-350 sq. mm. (Tahiti was 352 sq. mm., GK104 was 294). These chips will have 8 to 10 billion transistors, and will provide performance moderately better than 28nm big chips (GM200/Fiji) at a power budget of 150W-175W or so. The top consumer SKUs will probably sell for about $600, with FirePro and Quadros of course fetching much more. Nvidia will probably be working on their big chip at the same time, but I wouldn't expect the Pascal Tesla card until early 2017 at the soonest, and consumer-level variants of "GP100" likely won't happen until late 2017 to early 2018.

<snip>

NTMBK said:
5. Simpler, smaller PCBs

As you can tell from looking at a picture of the Titan X PCB, a lot of PCB space is dedicated to RAM chips and the wiring to connect them to the GPU. HBM removes all of that and replaces it with a GPU and a couple of DRAM stacks on an interposer, so boards should be a lot shorter.

Thermal density could be an issue. You can probably evacuate 175W or so with a well-designed air cooler on a short PCB, but once you get up to 250W+ for the big chips, you are going to start to run into issues. There are several possible ways this could be resolved. The cooler could simply be much bigger than the PCB, or the PCB could be oversized with unused space to support cooler mounting (we already see this with some GTX 960 models). Closed-loop water cooling is another possible solution, though I don't know how the professional market would react to this. An even better solution, though a heavier lift in terms of engineering and coordination costs, would be to finally ditch the ridiculously outdated ATX form factor (and its derivatives) in favor of something designed for compactness and thermal efficiency. The Mac Pro does a great job of cooling two GPUs not known for their energy efficiency (Tahiti) despite its small size and the fact that it doesn't use any exotic cooling tech, just a big triangular aluminum heatsink. The reason it gets away with this is that it has a custom form factor that puts all the heat-generating components in direct contact with the giant heatsink, rather than trying to manage thermal output from many different places in the case as a standard ATX case does. I'd really like to see a form factor that is similar to the Mac Pro, but based on an open standard and designed for DIY PC components. The standard would probably use a huge square heatsink, 180mm or so (need to avoid Apple's patents) with the motherboard on one surface, video cards on 1 or 2 others, and the final surface perhaps used to dissipate heat from the PSU. Cooling could be done with one large fan at the bottom, blowing up through the heatsink, thermally controlled. You'd need new form factors for motherboards, graphics cards, and GPUs to do this, but the results would be far superior to what can be obtained today with ATX and its derivatives.

raghu78 · Jun 7, 2015

1. I think HBM2 will launch at the high end in 2016. So do not expect HBM2 cards to sell for below USD 500. I predict the GP204 based GTX 1080 or whatever its called to launch at USD 650-700 and the GTX 1070 to be priced at USD 500 - 550. Same for AMD.

Later in 2017 when HBM2 production volumes increase and costs go down we might see the USD 300 - 500 cards with HBM2. I doubt we will see USD 200 and lower cards with HBM2 till late 2017 or early 2018.

2. Definitely. I also expect both AMD and Nvidia to be conservative with the die sizes of their first 16/14nm flagship FINFET GPUs for yield reasons. I am pretty much sure that both of them will stay at <= 300 sq mm for their first FINFET flagships.

3. Sure. With HBM2 providing 256 GB/s per stack definitely cache sizes will reduce. Nvidia will be able to reduce the huge cache sizes found in Maxwell.

4. The 16/14nm transistors will be vastly more efficient. But the big question is what is the state of the process wrt yields, defect density and other parameters. I think we should not expect Maxwell like OC from the first FINFET chips from Nvidia.

5. Surely. But you have to understand that for air cooling even with shorter PCB the cooler will have to be larger to dissipate the heat effectively. For AIO designs we will see lot of mini-ITX designs.

el etro · Jun 7, 2015

NTMBK said:
Pretty simple- what do we think the next generation of GPUs will look like? Here are my predictions:

1. HBM for all

HBM will be getting cheaper, with HBM2 providing 2-4GB per stack and higher frequencies. I expect the entire GPU stack to move to HBM, with 1 stack for low end chips, 2 stacks for mid range and 4 stacks for the high end. Probably 2GB stacks for consumer cards, and 4GB stacks for workstation cards.

The only bad thing of this tendency is that many ones will get out of business due to higher integration of components into the GPU die. Every other thing is a win for GPUs, especially on PCB/die area(this last due to much smaller size of the PHYs). HBM is so damn good that will even make videocards cheaper to manufacture in the future.

NTMBK said:
2. Dies will shrink again

I suspect that within each price bracket, we will see die sizes close to the early days of 28nm- top end part roughly the size of the 7970, not R9 290 or Fiji. 14nm is not meant to bring any significant improvements in $/transistor, so don't expect considerably more transistors than a chip like GM200.

The yields of making bigger chips will be much worse at the infancy of 14nm. When making 14nm chips gets cheaper and process get more mature, we will see big dies.

NTMBK said:
3. Return to smaller caches

A higher speed, lower latency and more efficient memory bus will reduce the need for the large on-chip caches we saw in Maxwell. More efficient transistors means that even with a barely increased transistor budget, more can be devoted to power hungry logic like shaders and texture units without blowing out the power budget.

I think no. Cache sizes only will get bigger with time, IMO. Die size cost is the only considerably penalty i think. Remember the using of HBM saving large die space. And sRAM caches are still much lower latency and much faster than what HBM can offers.
Putting it simply: Is more power efficiency and performance over maximum die area tradeoff.

NTMBK said:
4. Slightly higher clocks

Again, as in 3, more efficient transistors should enable higher clock speeds- and squeezing more frequency out of a smaller chip instead of going to a larger, lower clocked chip makes sense with expensive 14nm transistors.

Surely but not only this. Density of the chips also will go up very much in the newer process. "Economy" of die area(like the Hawaii "economy" of die area) will be bigger with 14nm processes.

NTMBK said:
5. Simpler, smaller PCBs

As you can tell from looking at a picture of the Titan X PCB, a lot of PCB space is dedicated to RAM chips and the wiring to connect them to the GPU. HBM removes all of that and replaces it with a GPU and a couple of DRAM stacks on an interposer, so boards should be a lot shorter.

Godsend, and also will be a tendency on future GPUs.

Kenmitch · Jun 7, 2015

Looking at the current nodes unexpected lifespan it's probably safe to say the next will be milked from day one.

1st release geared towards perf/watt maybe 10-20% bump for Xtra temptation.
Downhill from there probably. 10-15% gains?
Depending on willingness of competition to provide larger gains of course.

ThatBuzzkiller · Jun 7, 2015

I agree with just about everything OP mentioned ...

1. HBM and other competing technologies will gradually get a price reduction so this makes sense ...

2. This conjecture is also reasonable for the near term future going by the troublesome $/transistor scaling ...

3. GPUs never needed a significant amount of cache to begin with since latency hiding was mostly done by launching multiple in-flight wavefronts/warps to occupy the shader groups ...

4. This is probably the best idea for increasing the perf/$ ratio ...

5. Agreed ...

NTMBK · Jun 17, 2015

Well, at least 5) has come true- this little thing has higher performance than a 290X, apparently:

Also, I'd like to clarify 3: I am not predicting that we will see a reduction in total cache size, but in cache/shader.

PrincessFrosty · Jun 18, 2015

Sadly it seems that node shrinks are increasingly coming with longer periods of maturity, just how far we've pushed 28nm is an example of that. I think with the manufacturing defects, leaky chips, addtional voltage and temps that it will be initially disappointing and it wont be until at least the 2nd generation of GPUs at this size we'll see anything really jaw dropping.

JDG1980 · Jun 18, 2015

PrincessFrosty said:
Sadly it seems that node shrinks are increasingly coming with longer periods of maturity, just how far we've pushed 28nm is an example of that. I think with the manufacturing defects, leaky chips, addtional voltage and temps that it will be initially disappointing and it wont be until at least the 2nd generation of GPUs at this size we'll see anything really jaw dropping.

I think this may actually be one reason why FinFET has seen so many delays from both TSMC and Samsung/GloFo, and why the 20nm high performance node was cancelled. The 28nm process is mature enough that an immature FinFET process would look bad in comparison, and the foundry customers wouldn't be interested in it. Therefore, the foundries have to get FinFET fairly well optimized before anyone will buy it, unlike with 28nm where the early adopters paid for the privilege of beta testing the new node. TSMC dug this hole themselves with four years of stagnation.

ShintaiDK · Jun 19, 2015

JDG1980 said:
I think this may actually be one reason why FinFET has seen so many delays from both TSMC and Samsung/GloFo, and why the 20nm high performance node was cancelled. The 28nm process is mature enough that an immature FinFET process would look bad in comparison, and the foundry customers wouldn't be interested in it. Therefore, the foundries have to get FinFET fairly well optimized before anyone will buy it, unlike with 28nm where the early adopters paid for the privilege of beta testing the new node. TSMC dug this hole themselves with four years of stagnation.

20nm high performance was never cancelled. Its just AMD/nVidia not using it. 16FF/16FF+ is exactly the same.

NTMBK · Jun 19, 2015

ShintaiDK said:
20nm high performance was never cancelled. Its just AMD/nVidia not using it. 16FF/16FF+ is exactly the same.

Wasn't GloFo's 20nm process canned? (Along with their internal 14nm process.)

What will 14nm GPUs look like?

Lifer

Diamond Member

Lifer

Lifer

Platinum Member

Lifer

Lifer

Lifer

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Golden Member

Lifer

Platinum Member

Golden Member

Lifer

Lifer