Originally posted by: pm
In my experience, small voltage increases are worse for the long-term reliability of a microprocessor (or any other CMOS VLSI integrated circuit) than heat.
If you have a CPU that's running at 1.2V and 40 Celsius (on-die temp) and increase either the temperature or the voltage by 50% - so 1.8V or 60C - and leave everything else equal, the increase to 1.8V will have a vastly higher impact on the long-term reliability of the CPU than the increase to 60C. As long as you keep the CPU temperature lower than the maximum temperature of the design (usually higher than 70 Celsius, often above 100 Celsius), the long-term reliability impact of a small percentage increase in temperature is minimal. Even a small increase in voltage, on the other hand, can have a profound impact on long-term reliability.
To drop into the more esoteric discussion of why this is the case, let's start with what causes failures. There are numerous failure mechanisms that cause CPUs to fail over time. Among these, the most common are:
Electromigration (EM):
http://en.wikipedia.org/wiki/Electromigration
Hot-electron gate ("hot-e"):
http://siliconfareast.com/hotcarriers.htm
Time dependent dielectric breakdown (TDDB):
http://siliconfareast.com/oxidebreakdown2.htm
Bond/solder failures (including fatigue failures):
http://siliconfareast.com/relmodels3.htm
Bias Temperature Instability (BTI):
http://cobweb.ecn.purdue.edu/~...tutorial-nbti-alam.pdf
There are others, but these are the common ones nowadays. There's a good list at the siliconfareast.com site that I listed above.
This subject is a complex one, but one thing that you can quickly pick up by glancing over the links above is that temperature is not a big lever in causing many of these problems - but voltage is. The equations for failure in hot-e, TDDB and BTI don't include temperature at all - or if they do, it's a 2nd order effect - while voltage is a huge lever, and often the square of the voltage is an input. Hot-e failures are actually worse at lower temperatures than at higher ones.
Which issue on the list above is likely to kill a given CPU depends on the process technology of the company that fabricated the CPU and on the microprocessor's circuit and layout design. For one CPU, the most common failure mechanism might be electromigration; for another, the interconnect used in the manufacturing process might be thicker, or contain more copper atoms, and so it might be something else - like TDDB.
Also, as several posters mentioned above, voltage has a huge impact on temperature. The simplified formula for the dynamic power of a CPU is P = Cf(V^2) - where P is the CPU power, C is the on-die capacitance that needs to be switched, f is the frequency of the clock, and V is the voltage. Note that it's the square of the voltage that goes into the formula... so the power of the CPU (and thus the temperature, all things being equal) rises with the square of the voltage, while increasing the frequency only has a linear effect. Voltage also has a large impact on static power (i.e. leakage).
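To put rough numbers on the P = Cf(V^2) relationship, here is a minimal Python sketch. The function names are made up for illustration, and it only covers the simplified dynamic-power formula quoted above (no leakage, and no attempt to turn power into an actual die temperature). It reuses the 1.2V to 1.8V example from earlier in the post.

```python
def dynamic_power(c, f, v):
    """Simplified dynamic power: P = C * f * V^2 (leakage is ignored)."""
    return c * f * v ** 2


def power_ratio(v_old, v_new, f_old=1.0, f_new=1.0):
    """Relative change in dynamic power; the capacitance C cancels out."""
    return (f_new / f_old) * (v_new / v_old) ** 2


if __name__ == "__main__":
    # Raise only the voltage by 50% (1.2V -> 1.8V), same clock.
    print("voltage +50%:  ", round(power_ratio(1.2, 1.8), 2), "x power")
    # Raise only the frequency by 50%, same voltage.
    print("frequency +50%:", round(power_ratio(1.2, 1.2, 1.0, 1.5), 2), "x power")
```

Under this simplified model a 50% voltage bump costs about 2.25 times the dynamic power, while a 50% clock bump costs about 1.5 times. A real chip will differ because static power also rises with voltage.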
So even for something like electromigration - which depends on the current density (which in turn depends on the voltage) as well as on temperature - increasing the voltage will increase both the temperature (due to increased power) and the current density, while increasing the temperature only increases the temperature.
Above a certain temperature, however, some of the organic compounds used in the manufacturing of the CPU start to break down. For example, the polyimide layer used in passivation starts to break down between 110 and 135 Celsius. So if you start to get above 125C, you will essentially start to "burn up" the CPU. I don't know what the breakdown temperature of the resin underfill used on BGA packages is, but based on my knowledge of the composition, it should have a breakdown temperature between 120 and 150 Celsius. The OLGA packages will also have a fairly "low" breakdown temperature (relative to the silicon anyway).
As for specific examples involving CPUs like a Cedarmill running at 1.9V: I find it extremely hard to believe that someone could run a 65nm microprocessor at 1.9V for more than a couple of months of continuous use. The mean-time-to-failure (MTTF) of a 65nm microprocessor at 1.9V should be extremely short, based on my experience.
Patrick Mahoney
Senior Design Engineer
Intel Corp.