ASML slide describing the difficulties facing semiconductor scaling

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Let's put into perspective why enhancements that were relatively trivial a decade ago are extremely difficult today.

From ASML:
[ASML slide: "EUV and its business opportunity", Investor Day 2016]


PC Watch was also saying that Intel's 10nm sits at the difficulty level the slide labels 7nm, because Intel is aiming for foundry-7nm densities with its 10nm process.

Whoever shrinks first reaches those hard-to-reach walls first. Eventually we'll reach a point where, except in truly exotic scenarios, the smallest achievable process simply won't be used.

That's also why CPU architectures are starting to converge in performance per clock, also known as IPC (instructions per clock).

Physics is starting to normalize differences that used to be huge (driven by how many resources each company could pour in). The gaps will still exist, but they will shrink.

EUV might delay the end a bit, but there's no magic solution: the companies are talking about using EUV selectively, like in critical circuits. You don't need to be an expert to see a "practical" end coming in the near future.
 

DrMrLordX

Lifer
Apr 27, 2000
22,692
12,637
136
They did say that process shrinks would come to an end. Looks like we're up against the wall.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,305
2,885
136
From all that I've read, 7nm should make it into widespread, volume production. 5nm may be possible; there is definitely development work being done there, but volume production looks to be a long way off. Anything smaller and you're significantly under ten atoms of device width, and just handling that is beyond the scope of anything that can be industrialized at the moment.
 

DA CPU WIZARD

Member
Aug 26, 2013
117
7
81
So is it possible that we are moving towards a world in which we are stuck at 5nm or even 7nm for closer to a decade than not, resulting in a very mature "modern" process in which die size inflates to compensate?
 

scannall

Golden Member
Jan 1, 2012
1,960
1,678
136
So is it possible that we are moving towards a world in which we are stuck at 5nm or even 7nm for closer to a decade than not, resulting in a very mature "modern" process in which die size inflates to compensate?
Different materials could get us into the 3nm range. Maybe. But larger caches and more cores should have been pushed starting back at 32nm.
 

DA CPU WIZARD

Member
Aug 26, 2013
117
7
81
Different materials could get us into the 3nm range. Maybe. But larger caches and more cores should have been pushed starting back at 32nm.

I agree. I am imagining a theoretical world in which we hit 5nm and *boom*, we are stuck for many years. What market forces would be the most significant, drive the most change, and in which ways would this result in actual products changing? Or would we not see *any* change? I think it has become clear that Intel is either only willing, or only able, to let each generation's IPC grow by a few percent at the same core count. TL;DR: might we see a world in which process tech, the only real driver of performance growth, becomes a non-factor, resulting in little to no growth?
 
Mar 11, 2004
23,444
5,849
146
I think that's why there's work on sorting out other bottlenecks: the memory and storage subsystems (memristors, for instance), and what will likely become co-processors (like the Tensor processors Google developed). Likewise, they're looking at interconnects (since they're hitting a wall in process, they'll have issues making larger and larger chips), where I think they're exploring fiber optics or other optical links. They're allegedly looking into that even for intra-chip use, although I'd guess that would be more between processor blocks (so maybe the future of an Infinity Fabric type of deal).

I wonder if this will put an extra focus on quantum chip development.

Also, what about going 3D with chips so you could make them larger? What I mean is: take a chip, say a GPU, and instead of mounting it on a substrate on one side of the card, put silicon on both sides. Then you could put heat-transfer systems on both sides of the card, enabling a bigger chip and a way of dissipating the extra heat it produces. It wouldn't necessarily have to be a single chip either; the close proximity would keep the wiring between the pieces short. And you could layer things like memory (cache, for instance) between the logic layers.
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
Let's put into perspective why enhancements that were relatively trivial a decade ago are extremely difficult today.

PC Watch was also saying that Intel's 10nm sits at the difficulty level the slide labels 7nm, because Intel is aiming for foundry-7nm densities with its 10nm process.

Whoever shrinks first reaches those hard-to-reach walls first. Eventually we'll reach a point where, except in truly exotic scenarios, the smallest achievable process simply won't be used.

That's also why CPU architectures are starting to converge in performance per clock, also known as IPC (instructions per clock).

Physics is starting to normalize differences that used to be huge (driven by how many resources each company could pour in). The gaps will still exist, but they will shrink.

EUV might delay the end a bit, but there's no magic solution: the companies are talking about using EUV selectively, like in critical circuits. You don't need to be an expert to see a "practical" end coming in the near future.
Sure about that? What's the connection between IPC and node size?
 

LightningZ71

Platinum Member
Mar 10, 2017
2,305
2,885
136
Going 3D with processors will present a LOT of thermal management issues. I can see them settling EVENTUALLY on 5nm, with maybe a 3-layer design that has a large number of cores, modest L3 caches, and a prodigious L4 cache that takes up at least one full layer. The layers themselves will be space-inefficient, as they will require extensive thermal vias to carry heat from the lower layers up to the heatsink. I don't see MCMs ever being popular in the consumer space, as they require several extra manufacturing steps and will likely never be price-efficient. That said, in the prosumer and server/workstation space you may see MCMs that combine a multi-layer CPU with several memory modules of advanced design (perhaps an actually worthy version of Optane). You could also see MCMs combining CPUs or APUs with custom logic for some markets, such as machine learning, rendering, encryption, etc.

Software is going to have to evolve as well. It will need to make different assumptions about the memory hierarchy and CPU arrangement of the host hardware that reflect high core counts and much faster memory access all the way down the hierarchy.

As for Intel's (lack of) progress in IPC and overall system throughput over the last half decade, it increasingly looks like they deliberately slow-walked performance increases. They knew process tech was nearing its physical limits, and realized their competitors would struggle to match them on process nodes for a while; there's no need to push hard for nearly impossible levels of miniaturization if your competitors will take years to catch up. Now things have caught up with them. That's why they pushed so hard to diversify recently, only to learn the hard way that you can't just buy expertise and competitiveness in markets with mature incumbents (they should have known that from their own CPU division).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Anything smaller and you're significantly under ten atoms of device width, and just handling that is beyond the scope of anything that can be industrialized at the moment.

It's not that small at all.

Node names are often placeholders and really only useful for generational density comparisons.

The smallest pitch on Intel's 10nm is 34nm, which is the fin pitch. A full node shrink scales linear dimensions by about 0.7x, so each halving of the node name below corresponds to two full nodes. To get that fin pitch down to 10nm it would take:

10nm node = 34nm pitch
5nm node = 17nm pitch
2.5nm node = 8.5nm pitch
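That scaling arithmetic can be sketched in a few lines (the 0.7x-per-full-node factor and the 34nm starting fin pitch are from the post; everything else is just arithmetic):

```python
# Count how many full-node shrinks (~0.7x linear each) it takes
# to bring Intel's 34nm fin pitch down to 10nm or below.
start_pitch_nm = 34.0  # Intel 10nm fin pitch
shrink = 0.7           # conventional linear scaling per full node

pitch = start_pitch_nm
nodes = 0
while pitch > 10.0:
    pitch *= shrink
    nodes += 1

print(nodes)            # 4 full nodes
print(round(pitch, 2))  # 8.16 nm, close to the 8.5nm pitch in the list
```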

Sure about that? What's the connection between IPC and node size?

Nodes bring three noticeable benefits: density, performance, and power reduction.

The reason Intel adopted high-k dielectrics at 45nm is that at 65nm the gate dielectric was only 1.2nm thick. Since it starts getting leaky at that thickness, the high-k material allowed a "1.0nm equivalent thickness" at 45nm, bringing the performance benefits; the real physical thickness at 45nm is said to be about 3nm. Had they stuck with the normal dielectric material and scaled it from 1.2nm down to 1.0nm, the leakage would have increased tremendously.

The traditional metric behind node names was gate length. At 22nm, Intel's gate length is said to be 25nm. Without a reduction in gate length, though, you lose some performance; hence why they said you need FinFET/Tri-Gate to get the expected performance. The new structure lets gate length stop shrinking while retaining the performance benefits of a smaller gate.

So you see, when the dimensions were far larger, you didn't need material innovations to achieve the gains. Needing them now just adds complexity and difficulty.

And this impacts both performance and power. A better transistor doesn't just mean higher clocks; certain circuits could run at the same clock but with lower latency.
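The "equivalent thickness" above follows the standard equivalent-oxide-thickness (EOT) relation: a high-k film of physical thickness t behaves like SiO2 of thickness t x (3.9 / k). A minimal sketch; the k value of 11.7 is back-solved here purely for illustration, not a number from the post:

```python
K_SIO2 = 3.9  # relative permittivity of SiO2

def eot_nm(t_phys_nm: float, k: float) -> float:
    """Equivalent oxide thickness: the SiO2 thickness that gives the
    same gate capacitance as a high-k film of thickness t_phys_nm."""
    return t_phys_nm * K_SIO2 / k

# With the ~3nm physical thickness quoted above, a dielectric with
# k around 11.7 (hypothetical) yields a ~1.0nm equivalent thickness:
print(round(eot_nm(3.0, 11.7), 2))  # 1.0
```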

So is it possible that we are moving towards a world in which we are stuck at 5nm or even 7nm for closer to a decade than not, resulting in a very mature "modern" process in which die size inflates to compensate?

Yes. Or use multiple small dies.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
might we see a world in which process tech, the only real driver of growth in performance, becomes a non-factor, resulting in little to no growth?

To extrapolate from process to performance and say no change in one equates to no change in the other is a bit strong!

There are various avenues which have not been taken to market (at least, in the x86 space)
- Heterogeneous cores on the same die
- 3D layups for cache
- 3D pathways for localised cooling loops within the CPU
- Use of optics for data pathways
 

maddie

Diamond Member
Jul 18, 2010
5,147
5,523
136
It's not that small at all.

Node names are often placeholders and really only useful for generational density comparisons.

The smallest pitch on Intel's 10nm is 34nm, which is the fin pitch. A full node shrink scales linear dimensions by about 0.7x, so each halving of the node name below corresponds to two full nodes. To get that fin pitch down to 10nm it would take:

10nm node = 34nm pitch
5nm node = 17nm pitch
2.5nm node = 8.5nm pitch



Nodes bring three noticeable benefits: density, performance, and power reduction.

The reason Intel adopted high-k dielectrics at 45nm is that at 65nm the gate dielectric was only 1.2nm thick. Since it starts getting leaky at that thickness, the high-k material allowed a "1.0nm equivalent thickness" at 45nm, bringing the performance benefits; the real physical thickness at 45nm is said to be about 3nm. Had they stuck with the normal dielectric material and scaled it from 1.2nm down to 1.0nm, the leakage would have increased tremendously.

The traditional metric behind node names was gate length. At 22nm, Intel's gate length is said to be 25nm. Without a reduction in gate length, though, you lose some performance; hence why they said you need FinFET/Tri-Gate to get the expected performance. The new structure lets gate length stop shrinking while retaining the performance benefits of a smaller gate.

So you see, when the dimensions were far larger, you didn't need material innovations to achieve the gains. Needing them now just adds complexity and difficulty.

And this impacts both performance and power. A better transistor doesn't just mean higher clocks; certain circuits could run at the same clock but with lower latency.



Yes. Or use multiple small dies.
You said:
"That's why we also see CPU architectures starting to converge mostly in terms of performance per clock, or also known as IPC(Instructions Per Clock).

I questioned:
"Sure about that? What's the connection with IPC and node size?"

You answered:
"Nodes bring three noticeable benefits: Density, Performance, or Power reduction

The reason Intel used Hi-K dielectrics on 45nm is because on 65nm the thickness of the dielectric material was only 1.2nm. Since it starts getting leaky, Hi-K material allowed "1.0nm equivalent thickness" to happen at 45nm, so it brings the performance benefits. The real thickness is said to be 3nm for 45nm. If they stuck with the normal material for the dielectric and they scaled it down to 1nm from 1.2nm the leakage would have increased tremendously.

The general metric of node names were Gate Length. At 22nm Intel's Gate Length is said to be 25nm. Without the reduction in gate length though, you lose some performance. Hence why they said you need FinFET/Trigate to have expected performance. The new technology allows Gate length size to not become smaller but retain the performance benefits of having a smaller gate.

So you see, when the sizes were far larger, you didn't need material innovations to achieve the gains. That just adds to the complexity and difficulty.

And this impacts performance, and power used. Having a better transistor doesn't just mean clock speeds. It could have same clocks for certain circuits but have lower latency."


Frankly, I don't see where node size is relevant to IPC, at least from a theoretical point of view. I can see where certain complex designs couldn't be manufactured on a low-density node because of the number of transistors needed, but we're way past that limitation. The fact that we have had multi-core CPUs for a long time now implies that node size is no longer a factor for IPC improvement; the extra available transistors were used to add cores instead of improving per-core IPC.

As for latency, expressed in cycles there is no change; the only absolute latency improvement, in nanoseconds, comes from increased clocks. A circuit might have an inverse relationship between latency and the number of electronic elements: more transistors means less latency, and the reverse. Cycle time and clock are a fixed relationship: ns = 1/clock (in GHz).
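The cycles-versus-nanoseconds point can be sketched as follows (the 4-cycle latency and clock figures are illustrative, not from the thread):

```python
def latency_ns(cycles: int, clock_ghz: float) -> float:
    """Convert a latency measured in clock cycles to nanoseconds."""
    return cycles / clock_ghz

# A 4-cycle operation stays 4 cycles across nodes; only a higher
# clock shrinks the absolute time:
print(latency_ns(4, 2.0))  # 2.0 ns at 2 GHz
print(latency_ns(4, 4.0))  # 1.0 ns at 4 GHz
```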

In other words, where are you getting that IPC is converging due to process node? This is wrong.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,305
2,885
136
Well, while shrinking the same exact design from one node to a smaller one may not directly affect IPC, a smaller node enables things like larger L1 and L2 caches, larger L3 caches, and throwing more circuitry at specialized compute units like the AVX vector units. All of those can produce a higher APPARENT IPC from outside the core. The actual instruction latency between issue and retirement, measured in clock cycles, may not change for non-optimized instructions, but if you can keep more instructions in flight thanks to a wider issue unit, and that issue unit can be kept better fed by larger high-speed caches, you'll measure an improvement in IPC. That's what I understand "IPC improvement from smaller nodes" to mean. Of course, at this point you're already in an era of diminishing returns on those things, but you can look back at the last few nodes of the Core architecture and see some of them being used to improve IPC.
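A toy model of that argument; the stall formula and every number here are illustrative assumptions, not measurements of any real core:

```python
def effective_ipc(issue_width: int, code_ilp: int,
                  miss_rate: float, miss_penalty: int) -> float:
    """Toy throughput model: sustained IPC is capped by the narrower
    of issue width and the code's inherent ILP, then degraded by
    cache-miss stall cycles folded into cycles-per-instruction."""
    cpi = 1.0 / min(issue_width, code_ilp) + miss_rate * miss_penalty
    return 1.0 / cpi

# Narrower core with smaller caches (higher miss rate):
print(round(effective_ipc(3, 4, 0.02, 100), 2))  # 0.43
# Wider issue plus bigger caches (lower miss rate):
print(round(effective_ipc(4, 4, 0.01, 100), 2))  # 0.8
```

Spending a node's extra transistors on width and cache raises measured IPC even though each instruction's cycle latency is unchanged, which is exactly the "apparent IPC" effect described above.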
 

Ajay

Lifer
Jan 8, 2001
16,094
8,112
136
Well, there is a reason for MCM and EMIB: CPU SoCs will be broken out again and expand in size for higher performance as time goes on. AMD is a bit ahead of Intel in this regard (because of cost constraints, they needed a non-reticle-limited chip and high core-count scalability).
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Frankly, I don't see where node size is relevant to IPC, at least from a theoretical point of view. I can see where certain complex designs couldn't be manufactured on a low-density node because of the number of transistors needed, but we're way past that limitation. The fact that we have had multi-core CPUs for a long time now implies that node size is no longer a factor for IPC improvement; the extra available transistors were used to add cores instead of improving per-core IPC.

You are right in that:
-It doesn't impact it directly
-Architectural improvement has diminishing returns; code has inherent limits on "IPC"

Chips nowadays are mostly bound by thermals and power usage. In the heyday of scaling it was said you needed 4x the core circuitry to improve IPC by 2x (assuming nothing else was a bottleneck, like inherent code-level parallelism or the memory subsystem).

Why do you think Intel went with the mantra of no more than "1% of power for 1% of performance"? Because they are power limited. Process scaling gave you great benefits: if core A used 50% more power than core B, it didn't matter as long as core A moved to the newer process while core B stayed on the older one. Who cares if you were a bit behind on architectural efficiency? Just move to a new process faster and that gap persists perpetually. If you look at benchmarks of older chips, you could delete every detail from the articles and I could still tell you which chips were on a new process just by looking at the bars.

You still got 1.5 x 0.5 = 0.75, i.e. 25% lower power than core B, while having 15% more IPC.
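Spelling that arithmetic out (the 1.5x power gap, 15% IPC edge, and 0.5x per-node power scaling are the post's hypothetical figures):

```python
a_power_vs_b = 1.5      # core A: 50% more power than core B, same process
a_ipc_vs_b = 1.15       # ...and 15% more IPC (the post's figures)
node_power_scale = 0.5  # a new process node roughly halves power

a_after_shrink = a_power_vs_b * node_power_scale
print(a_after_shrink)                       # 0.75: 25% less power than B
print(round(a_ipc_vs_b / a_after_shrink, 2))  # ~1.53x perf-per-watt edge
```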

With Pascal, Nvidia spent a significant amount of time on improving clocks. So did AMD with Vega. If the process allowed easy gains, they could have spent those resources (transistors/time/money) on improving the architecture instead. Resources and time are finite: attention paid to one thing is attention taken from another.

Connect all of this and you come to the conclusion that the lack of process scaling has become one of the big factors in why IPC improvement is stagnating. You cannot dissociate the two.

The fact that we have had multi-core CPUs for a long time now implies that node size is no longer a factor for IPC improvement.

With multi-core chips you can simply turn cores off, so the power cost can be minimized; they don't have to be active all the time, and you can clock them lower too, even individually. You can't really power-gate parts of a core; it's very hard to do without incurring some latency and performance loss.