What factors determine the yield for a new chip?

Smartazz

Diamond Member
Dec 29, 2005
If you increase the transistor count on a particular chip, will the yield go down for that chip, and what are the different factors that determine yield? I would guess that a big one is the process used to make the chip. I've also read that if you make the caches bigger in a chip the yields go down; why would that be? Thanks in advance.
 

CTho9305

Elite Member
Jul 26, 2000
pm wrote a post about yield a while back. Unfortunately, I don't remember most of it, but I do remember this example:
Take a few grains of sand and toss them onto a wafer. The sand represents defects. If you have bigger dies, there's a higher chance of a given die being defective; with smaller dies, any individual die is less likely to be hit by a defect.
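
To put rough numbers on that picture, here's a quick sketch using the simplest yield model (a Poisson model, yield = exp(-area × defect density)); the defect density and die areas below are made-up numbers, just to show the trend:

```python
import math

# Simplest "grain of sand" yield model (Poisson): Y = exp(-A * D),
# where A is the die area and D is the defect density.
# Both numbers below are made up purely for illustration.
defect_density = 0.5  # defects per cm^2 (assumed)

for area_cm2 in (0.5, 1.0, 2.0, 4.0):
    expected_yield = math.exp(-area_cm2 * defect_density)
    print(f"die area {area_cm2:4.1f} cm^2 -> expected yield {expected_yield:.1%}")
```

Doubling the die area squares the yield in this model (exp(-2AD) = exp(-AD)^2), which is exactly the "bigger dies are more likely to get hit" effect.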

Now, to address the rest of your question...

Caches are sort of a special case, because there are things you can do (like building in spare rows and columns that get swapped in for defective ones) so that even with a defect in the cache, it still works - and you don't even have to do something like disabling half of it.

There are actually a lot of factors that affect yield. There's a somewhat complicated interplay between the manufacturing process and the chip's design. I'm making some simplifications here to keep this readable.

A given manufacturing process can produce some things more reliably than others (for example, the polysilicon transistor gates usually print much better if they're all running in one direction rather than the other). Using the shapes that print well helps yield.

There's also a lot of variation in the process. The transistors on one die will be a little different from the transistors on another die on the same wafer; they may be even more different on a different wafer. The fab will usually tell the chip designers what the variation is - for example, they might say, "90% of transistors will be within 10% of the nominal speed, 95% will be within 20%, 99% will be within 50%, 99.9% will be within 75%." Now, it's easy to build a small, fast circuit that works with nominal transistors, but if you want any decent yield, you have to make sure it works "most of the time." You have to pick some cutoff where you say, "if the transistors are more than x% off the target, the circuit won't work." If you do a very aggressive design, it might not work when the transistors are more than 10% off nominal, which means you're going to have to throw out a lot of dies. A more conservative design will get better yields - but you can't design too conservatively either, because that tends to make things bigger (and the grain-of-sand example tells you why that's bad) and/or slower.
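
To make the cutoff idea concrete, here's a toy sketch; the variation is modeled as a normal distribution, and the sigma and cutoff values are invented numbers, not anything a real fab quotes:

```python
from statistics import NormalDist

# Toy model: each die's worst-case transistor speed offset from nominal is
# drawn from a normal distribution. The 12% sigma is an invented number.
variation = NormalDist(mu=0.0, sigma=0.12)

for cutoff in (0.10, 0.20, 0.30, 0.50):
    # Fraction of dies whose offset lands within +/- cutoff of nominal,
    # i.e. the dies an "x% cutoff" design would still work on.
    passing = variation.cdf(cutoff) - variation.cdf(-cutoff)
    print(f"design tolerates +/-{cutoff:.0%} of nominal -> ~{passing:.1%} of dies work")
```

The aggressive ±10% design throws away a big chunk of the dies; the conservative ±30% design keeps nearly all of them, at the cost of being bigger and/or slower.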

It's very tempting to use super-aggressive circuits, because they're just so fast/small (both good attributes). One example is the "pulsed latch", a design for a storage element that has great properties - until the manufacturing process changes too much. Pulsed latches are very sensitive to the relative delays of things, and as the fab gets better it can make things faster - but not everything improves by the same amount. If the important ratios change too much, the pulsed latch won't store data properly. So as the fab makes the "nominal" process faster, the variation discussed above means more and more dies end up nonfunctional.
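
Here's a toy illustration of that ratio problem; all the delays and speedup factors are invented, it's just meant to show how a mismatch in scaling eats the margin:

```python
# Toy model: the pulse width comes from one delay chain, and the latch needs
# the pulse to stay wider than its capture window. If a process improvement
# speeds up the chain more than it shrinks the window, the margin disappears.
# All numbers are invented for illustration.
pulse_chain_delay = 100.0  # ps: pulse width at the original process (assumed)
capture_window = 70.0      # ps: minimum pulse width the latch needs (assumed)

for chain_speedup, window_speedup in [(1.0, 1.0), (1.4, 1.1), (2.0, 1.1)]:
    pulse = pulse_chain_delay / chain_speedup
    window = capture_window / window_speedup
    margin = pulse - window
    status = "OK" if margin > 0 else "FAILS"
    print(f"pulse {pulse:5.1f} ps vs needed {window:5.1f} ps -> margin {margin:+6.1f} ps ({status})")
```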

Getting back to the part about some things being produced more reliably than others, "vias" tend to be flaky. They're the connections between the layers of metal, and they're just difficult to create. To address this, a given connection is often made with multiple vias, so that if one of them fails, the others still conduct current. A design that uses just one via for each connection is likely to have lower yield than a design that uses multiple vias for backups.
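
A rough back-of-the-envelope version of why the backup vias matter so much (the failure probability and via count are made-up numbers):

```python
# If each via independently fails with a small probability, a connection only
# fails when ALL of its vias fail. The numbers below are made up.
p_via_fail = 1e-4         # assumed per-via failure probability
connections = 1_000_000   # assumed number of via connections on the die

for vias_per_connection in (1, 2):
    p_connection_fail = p_via_fail ** vias_per_connection
    # Probability that every single connection on the die is good
    die_ok = (1 - p_connection_fail) ** connections
    print(f"{vias_per_connection} via(s) per connection -> ~{die_ok:.1%} of dies have no via failures")
```

With a single via per connection, a million connections means almost every die has at least one bad one; doubling them up makes via failures basically a non-issue in this toy model.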
 

BrownTown

Diamond Member
Dec 1, 2005
The larger the chip, the lower the yield will be on a given process node; it's pretty much that simple. You can go into the math and see why a negative binomial distribution is used instead of the more obvious Poisson distribution, but the idea is the same either way: yield drops off DRASTICALLY with increasing chip size. If you were trying to make a native quad core on a certain process, your yield would be considerably lower than for two dual cores. This is of course why Intel uses MCM - the yield on its "quad" is the same as on the dual-core dies it's built from. Of course there are tricks and stuff: AMD can take a damaged quad and sell it as a dual core, but then you aren't making as much money, so it still hurts costs.
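
If you want to play with the math, here's a quick sketch comparing the Poisson and negative binomial yield models, and a native quad die against the dual-core die an MCM quad is built from; the defect density, die areas, and clustering parameter are all made up:

```python
import math

# Defect density, die areas, and the clustering parameter are invented
# numbers, just to show the trend.
D = 0.4       # defects per cm^2 (assumed)
alpha = 2.0   # defect-clustering parameter for the negative binomial model (assumed)

def poisson_yield(area_cm2):
    return math.exp(-area_cm2 * D)

def neg_binomial_yield(area_cm2):
    return (1.0 + area_cm2 * D / alpha) ** -alpha

dual_area, quad_area = 1.0, 2.0  # cm^2 (assumed)

for name, model in (("Poisson", poisson_yield), ("neg. binomial", neg_binomial_yield)):
    print(f"{name:13s}: dual-core die {model(dual_area):.1%}, native quad die {model(quad_area):.1%}")
```

Since an MCM quad is assembled from dual-core dies that are tested (and binned) first, its effective yield is basically the dual-core yield; the native quad eats the full penalty of the bigger die.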
 

Smartazz

Diamond Member
Dec 29, 2005
Originally posted by: BrownTown
The larger the chip, the lower the yield will be on a given process node; it's pretty much that simple. You can go into the math and see why a negative binomial distribution is used instead of the more obvious Poisson distribution, but the idea is the same either way: yield drops off DRASTICALLY with increasing chip size. If you were trying to make a native quad core on a certain process, your yield would be considerably lower than for two dual cores. This is of course why Intel uses MCM - the yield on its "quad" is the same as on the dual-core dies it's built from. Of course there are tricks and stuff: AMD can take a damaged quad and sell it as a dual core, but then you aren't making as much money, so it still hurts costs.

So basically that's what they were forced to do with the Cell Broadband Engine? They disable one of the cores so that a die with a defective core can still be used, which helps the effective yield?
 

bobsmith1492

Diamond Member
Feb 21, 2004
I was talking to a professor at a grad school I was visiting. His area of research is fault-tolerant code. The premise was that, as hardware continues to shrink, the smaller transistors and caches are more susceptible to exotic radiation that can make bits randomly flip. In some areas of code this will rarely matter; others, however, may need to be made triple redundant, with a system to check whether one copy is wrong. Of course, then this checking system could have errors as well... anyway, I thought that was sort of relevant to the topic. :p
 

BrownTown

Diamond Member
Dec 1, 2005
Well, that's a completely different field, since chip yield is about defects that permanently disable a chip, whereas radiation hardness is about charges induced by radiation. Triple redundancy is the "brute force" way to do it: you just do everything three times, and if there's an error, only one copy will change, so you go with the other two, which form a majority. Another good way is to simply use a more radiation-hard process. We were talking about hardening FPGAs against radiation for space exploration, and one of them just went and used an old 180nm process, simply because the charge imparted by a radioactive particle strike wouldn't be enough to do anything - problem solved. Actually, if you want a fun exercise, look at how much a SINGLE charged particle embedded in the gate will change the threshold voltage on a 45nm process. Even the tiniest amounts of ions in the manufacturing process will destroy a chip with structures that small.
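
The brute-force majority vote is easy to sketch in software (this is just a toy model of the voting idea, with an invented upset probability, not how real rad-hard hardware is built):

```python
import random

def majority(a, b, c):
    """Return the value at least two of the three copies agree on."""
    return a if a == b or a == c else b

def maybe_flip(bit, p_upset):
    """Flip a stored bit with probability p_upset (a simulated radiation upset)."""
    return bit ^ 1 if random.random() < p_upset else bit

random.seed(0)
p_upset = 0.01       # assumed per-copy upset probability
trials = 100_000
single_wrong = tmr_wrong = 0

for _ in range(trials):
    true_bit = random.getrandbits(1)
    copies = [maybe_flip(true_bit, p_upset) for _ in range(3)]
    single_wrong += copies[0] != true_bit
    tmr_wrong += majority(*copies) != true_bit

print(f"single copy wrong: {single_wrong / trials:.3%}")
print(f"TMR vote wrong:    {tmr_wrong / trials:.3%}")
```

The vote only fails when two of the three copies get hit at once, so the error rate drops from roughly p to roughly 3p².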
 

bobsmith1492

Diamond Member
Feb 21, 2004
Yeah, this professor was actually a code guy, so he was looking at the code end rather than the hardware; redundancy was only used in situations where data integrity is critical. I agree it's a bit off-topic, but it's something that may need to be thought about as feature sizes keep shrinking...