Why do we care so much about heat?


OCGuy

Lifer
Jul 12, 2000
27,224
37
91
Originally posted by: yh125d
Of course I could, but that's not the point. I'm the kind of guy who is also seriously thinking about switching all the 120mm fans in my case with 200cfm replacements, for my rig which has pretty modest heat generation

OCD much?
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,986
1,283
126
Personally I think people do get a bit anal over CPU temps. People tend to panic when they reach 60C which for a CPU is nothing to worry about.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,117
3,640
126
Originally posted by: dmens

if you're gonna take that kind of attitude, use the correct terminology. fault tolerance is something quite different to what you are describing. oh, and the temp sensors on IC's are usually completely inaccurate when measuring localized hot spots which tend to be the locations of speed limiters. so even for what you are describing, your generalization has little meaning.

if you view that as a challenge that's fine, but for the record, you don't know how long i've been overclocking, and overclocking experience doesn't matter a damn when discussing IC reliability and failure modes.

excuse me.. but you came at me with the attitude first. Telling me i'm flat out wrong, yet am i?

second, why is it that a GPU can run at 100C yet a CPU cant?

Third, read the title. It says why do we care about heat. Why do you need an aftermarket heat sink?

Maybe because you're overclocking?

Originally posted by: dmens

really? that's just flat out wrong.

me personally, i dont give a crap about temps, only voltages.

all i asked was a correction on your statement, yet too proud to admit im correct?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
firstly, you weren't wrong, whoever told you so was wrong. you're just forwarding incorrect information. and it's still wrong.

secondly, a CPU can run at and over 100C. shit i did that today on a part that had its substrate shaved down to 50um and even with shitastic heat dissipation, it was still OK.

thirdly, you said lifetime is reduced by heat and i was commenting on that. we all know cooler temps result in better overclocks.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,117
3,640
126
Originally posted by: dmens
firstly, you weren't wrong, whoever told you so was wrong. you're just forwarding incorrect information. and it's still wrong.

okay, if im wrong show me please.

If you have adequate proof i'll be more than happy to apologize to you.

I personally hate spreading misinformation, but as i said, i am taking the direct context of what i heard from an intel engineer.

Originally posted by: dmens
secondly, a CPU can run at and over 100C. shit i did that today on a part that had its substrate shaved down to 50um and even with shitastic heat dissipation, it was still OK.

thermal shutdown... been there, it cant run @ 100C.
Well im talking about the typical cpu on a regular board.

Originally posted by: dmens

yeah except im talking about vias not gates and since current density is an inverse exponential relationship, that approximation is way too simplistic. current and temperature are related properties and it is exceedingly difficult to isolate either one.

Elaborate on this... if you can show me i will admit im wrong.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,361
16,194
136
dmens, I think we all know that less heat = longer CPU life AND better OC. Why are you arguing that?
 

NXIL

Senior member
Apr 14, 2005
774
0
0
Originally posted by: aigomorla
Also i remember an intel engineer telling me each 10C you lower your cpu temps, you just effectively doubled its life.

Originally posted by: dmens
really? that's just flat out wrong.

firstly, you weren't wrong, whoever told you so was wrong. you're just forwarding incorrect information. and it's still wrong.

No, actually, aigomorla (and the Intel Engineer) are correct.

Heat and longevity relate to this:

http://en.wikipedia.org/wiki/Arrhenius_equation

From that comes:

http://electronicdesign.com/Ar...cleID/16767/16767.html
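For anyone who wants to check the "10C doubles the life" rule against the Arrhenius equation, here is a minimal sketch. It assumes the failure mechanism is a single thermally activated process, and the activation energies in the loop are illustrative guesses, not values from the linked articles. The function name is made up for this example.

```python
import math

K_BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_cool_c, t_hot_c):
    """Ratio of failure rates (hot / cool) for one thermally activated process.

    ea_ev is the activation energy in eV (an assumed input, it differs
    per failure mechanism). Temperatures are in degrees Celsius.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_cool_k - 1.0 / t_hot_k))


# Compare a 10C step (60C to 70C) for a few assumed activation energies.
for ea in (0.3, 0.5, 0.7, 1.0):
    factor = arrhenius_acceleration(ea, 60.0, 70.0)
    print(f"Ea = {ea:.1f} eV: 70C ages {factor:.2f}x faster than 60C")
```

With these assumptions the factor per 10C comes out close to two only when the activation energy is somewhere around 0.7 eV; lower activation energies give a noticeably smaller factor, so the rule of thumb depends on which mechanism dominates.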

As for fault tolerance: the CPU has to execute threads perfectly, or the entire system crashes.

A GPU can get away with some sloppy processing, and show glitches/color errors/pixel errors on the display without crashing the system....all that bogus data is flushed out rapidly and replaced. An overheated CPU can't tolerate that sort of corrupted data. Your eyes can, as can the GPU, up to the temperature where it just can't process any more.

Edit: apologies to idontcare, who clearly discussed heat/longevity much better than I did...

As for: "A cpu can run at over 100 degrees C".....not the typical CPU.

There are three CPUs in this list that can run at 100C: the rest max out at about 60-70C or so.

http://www.hardwaresecrets.com/article/143/5

Also: CPU substrate: isn't that the material that the actual CPU and pins are affixed to? (Sub: under, strate: layer, strata.) a 50 micrometer thin substrate: you could probably see through that....would be pretty darn thin. A human hair is approximately 50 um....depending on your species:

According to The Physics Factbook, the diameter of human hair ranges from 17 to 181 µm.[4] It varies slightly with ethnicity, where Europeans generally have 57-90 µm and Asians around 120 µm.[5]

Originally posted by: dmens
yeah except im talking about vias not gates and since current density is an inverse exponential relationship, that approximation is way too simplistic. current and temperature are related properties and it is exceedingly difficult to isolate either one.

? Chewbacca defense?

Chewbacca Defense (plural Chewbacca Defenses)

1. (law, satire, South Park) A satirical term for any legal strategy or propaganda strategy that seeks to overwhelm its audience with nonsensical arguments, as a way of confusing the audience and drowning out legitimate opposing arguments.

Current, voltage, resistance, temperature: difficult, but doable: that is why there are p chem aka physical chemistry classes: this is what Intel engineers do....that is why the reduction in manufacturing processes from 180nm to 90 to 45 has such an effect on temps, efficiency, speed, etc.....

HTH

 

spinejam

Diamond Member
Feb 17, 2005
3,503
1
81
? Chewbacca defense?

Chewbacca Defense (plural Chewbacca Defenses)

1. (law, satire, South Park) A satirical term for any legal strategy or propaganda strategy that seeks to overwhelm its audience with nonsensical arguments, as a way of confusing the audience and drowning out legitimate opposing arguments.




no misunderestimating this!

GWB :)
 

edplayer

Platinum Member
Sep 13, 2002
2,186
0
0
Originally posted by: ShawnD1
Shouldn't an AMD or Intel processor be able to withstand the same abuse as a video card?


Why not let yours run at 82° C and then make a thread about it?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Originally posted by: NXIL
? Chewbacca defense?

heh why don't you let IDC talk about vias if you don't know what it is?

why are you talking about hairs now? the substrate im talking about is the bulk, and on a commercial package it is a lot thicker than 50um. it is polished down to that thickness for optical observation.

anyways, my angle is that failures on wires are more likely than devices, and afaik that is related to current, which is e to the inverse kT. if IDC tells me electromigration also adheres to that rule, i stand corrected. then he can PM me with a detailed technical explanation. :)

as for thermal shutdown, that's an external trigger. there's nothing preventing the CPU from running at that temp. i guess my definition of working is different from yours lol.
 

NXIL

Senior member
Apr 14, 2005
774
0
0
heh why don't you let IDC talk about vias if you don't know what it is?

vias on the motherboard, or vias in the integrated circuit itself?

Did you modify the vias on your motherboard to 50um tolerances, at home, by hand?

anyways, my angle is that failures on wires are more likely than devices,

Failures on wires....all righty then.



 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
Originally posted by: aigomorla

Also i remember an intel engineer telling me each 10C you lower your cpu temps, you just effectively doubled its life.

u mean instead of lasting 5 years it'll last me 10 years? great, so in 2119 when applications need 32 cores & 8 ghz+ cpus minimum, i'll have my trusty dual core 8400 handy. im so stoked.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: dmens
Originally posted by: NXIL
? Chewbacca defense?

heh why don't you let IDC talk about vias if you don't know what it is?

why are you talking about hairs now? the substrate im talking about is the bulk, and on a commercial package it is a lot thicker than 50um. it is polished down to that thickness for optical observation.

anyways, my angle is that failures on wires are more likely than devices, and afaik that is related to current, which is e to the inverse kT. if IDC tells me electromigration also adheres to that rule, i stand corrected. then he can PM me with a detailed technical explanation. :)

as for thermal shutdown, that's an external trigger. there's nothing preventing the CPU from running at that temp. i guess my definition of working is different from yours lol.

There is nothing proprietary about the physics of EM so we can discuss openly for the benefit of the lurkers as well.

The interactions of temperature and current density are explicitly separable and independently measurable.

Here is an adequate online article covering the basics of the physics of EM: http://www.edadesignline.com/s...ml?articleID=192200480

The relevant part of the article as it pertains to the discussion in this thread is the section on Black's Law:

In the late sixties, Jim Black of Motorola was heavily involved in understanding the "cracked stripe" problem that was later identified as electromigration. Jim's pioneering work included the first careful systematic investigations of electromigration failure kinetics. His experiments uncovered the curious behavior that electromigration failures followed kinetics that depended not on the inverse of the current density, but on the inverse square.

t50 = A*(j^-2)*e^(DH/(kT))

where t50 is the median time to failure in an ensemble of samples, A is a constant that needs to be empirically determined and DH is the activation energy for failure. The experimental values found for the activation energy suggested grain boundary diffusion as the mass transport mechanism. For nucleation dominated failure, this equation has proven to be adequate even to the present day. Only small corrections, often too small to be detected experimentally have been needed to keep Black's Law consistent with the latest theoretical developments.

Additional references: http://www.physics.umd.edu/mfu.../publications/KB05.pdf
The rate of electromigration depends on both the electric field and the temperature, depending on electric field as a power law, and temperature exponentially (since electromigration is electric field-biased motion of thermally-activated atoms).

The median lifetime of a device which fails from electromigration is proportional to the inverse of the square of the current density (this is a power-law and is where voltage enters the equation) and to the exponential of the inverse of the temperature (this is the Arrhenius equation part).

Both increasing voltages and increasing temps are non-linearly deleterious on the operating lifetime of any integrated circuit. (CPU or GPU)
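To make the two dependencies concrete, here is a rough sketch of Black's Law as quoted above, written as a ratio of median lifetimes so the empirical constant A cancels out. The activation energy (0.7 eV) and the operating points are assumptions picked for illustration only, and the function name is hypothetical.

```python
import math

K_BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant in eV/K


def black_t50_ratio(j_ratio, t_ref_c, t_new_c, dh_ev, n=2.0):
    """Return t50(new) / t50(ref) under Black's Law.

    j_ratio is j_new / j_ref (current density ratio), dh_ev is the
    activation energy in eV (assumed), n is the current exponent
    (2 in Black's original form). The constant A cancels in the ratio.
    """
    t_ref_k = t_ref_c + 273.15
    t_new_k = t_new_c + 273.15
    current_term = j_ratio ** (-n)
    thermal_term = math.exp((dh_ev / K_BOLTZMANN_EV) * (1.0 / t_new_k - 1.0 / t_ref_k))
    return current_term * thermal_term


# Temperature alone: same current density, 60C versus 70C.
print(black_t50_ratio(1.0, 60.0, 70.0, dh_ev=0.7))
# Current density alone: 20% more current density at the same temperature.
print(black_t50_ratio(1.2, 60.0, 60.0, dh_ev=0.7))
# Both together, as in a voltage bump that also raises temperature.
print(black_t50_ratio(1.2, 60.0, 70.0, dh_ev=0.7))
```

Because the current term and the thermal term multiply, each one can be varied while holding the other fixed, which is the sense in which the two effects are separable.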

Why a GPU could operate at higher temps and/or higher current density is simply a matter of design (engineering). It was designed to have enough robustness in signal integrity and electromigration (which is but one of many failure mechanisms, TDDB for example is another) as to properly operate at 100C for a specified expected lifetime. This is not a special attribute of the GPU...this attribute could be engineered into any IC.

For whatever reasons chosen by TSMC engineers as well as Nvidia layout engineers, the combined effects of layout and process margin appear (per the OP) to have resulted in their GPUs functioning correctly at higher operating temps for the clockspeeds and Vcc's used. (I make this statement assuming the OP's claims are correct, I have not scrutinized the opening statement for validity at this time)

Intel could have done similar, but it would have come with trade-offs in terms of die-size and operating clockspeed at any given Vcc.
 

NXIL

Senior member
Apr 14, 2005
774
0
0
Idontcare wrote:

Both increasing voltages and increasing temps are non-linearly deleterious on the operating lifetime of any integrated circuit. (CPU or GPU)

QFT.

Uh oh, maybe GPU maker nVidia should have checked with some of those misinformationed Intel engineers about their 65 and 55 nm parts:

The defective parts appear to make up the entire line-up of Nvidia parts on 65nm and 55nm processes, no exceptions. The question is not whether or not these parts are defective, it is simply the failure rates of each line, with field reports on specific parts hitting up to 40 per cent early life failures.

The end result of the failures is that bumps crack between the bump and the substrate on a chip, not on the bump to die side. When this happens to a signal bump, game over for the GPU or MCP.

Once again, if you did your engineering right, this won't happen in any timeframe that matters to mere humans, if it takes ten years of on and off switching to make it happen, once a day power cycling won't matter in our lifetimes. Chip makers tend to engineer for timelines like the ten-year horizon, and are pretty safe in assuming it will live for five years of casual use.


You can continue reading here:

http://www.theinquirer.net/inq...nvidia-chips-defective

nVidia has set aside about $200 million to cover replacement GPUs....it's going to be even more costly than that by the time all these chips are replaced. They are failing in laptops first because of the increased number of hot/cold cycles.

http://www.nvidia.com/object/io_1218573794316.html

dmens, since you don't think heat is an issue, you might want to send a contribution to nVidia to help cover their costs to:

Jen-Hsun Huang
2701 San Tomas Expressway
Santa Clara, CA 95050

Give till it hurts.

 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
bumps on the die side? yeah, that applies on a flip chip package. plus we're talking about internal wires in the IC, not external.

go back to school.

i never said heat does not reduce operational lifetime, that is obvious. only that it's not as drastic as say, half the lifetime every 10 degrees.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
I've been out of QRE for a while now (since 1997), but I thought that EM (ElectroMigration) stopped being the dominant failure mechanism on CMOS when the world switched to copper - and aren't even vias made with copper nowadays too? I remember tungsten plugs in 0.13um, but I thought everything is copper nowadays. I thought the big problems nowadays are caused by PMOS NBTI (p-type Metal Oxide Semiconductor Negative Bias Temperature Instability) - which has a very strange relationship with temperature as I recall (and as described in the name). Or are interface gate/channel charge traps no longer a concern with the move to High-K dielectrics and EM is back to being the problem with shrinking wire dimensions?



* not speaking for Intel Corp.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: pm
I've been out of QRE for a while now (since 1997), but I thought that EM (ElectroMigration) stopped being the dominant failure mechanism on CMOS when the world switched to copper - and aren't even vias made with copper nowadays too? I remember tungsten plugs in 0.13um, but I thought everything is copper nowadays. I thought the big problems nowadays are caused by PMOS NBTI (p-type Metal Oxide Semiconductor Negative Bias Temperature Instability) - which has a very strange relationship with temperature as I recall (and as described in the name). Or are interface gate/channel charge traps no longer a concern with the move to High-K dielectrics and EM is back to being the problem with shrinking wire dimensions?



* not speaking for Intel Corp.

Until very recently even copper BEOL ICs were finished off with a packaging level of aluminum interconnect as the interface to packaging. The so-called "top level metal". Some process flows even held two aluminum levels at the top. So even in the era of copper BEOL, aluminum EM still required the usual engineering safeguards to be in place.

Electromigration in copper occurs but is mitigated by a metal barrier, usually Ta based or TiN based. (and many, including TI and AMD used a bi-layer approach of Ta on TaN).

Because of the tradeoffs in barrier thickness versus conductivity (every nm of Ta barrier is one less nm of cross-section available for copper, increasing line resistance) the process development cycle for optimizing barrier thickness is one in which lifetime/reliability of the copper EM is balanced so as to not be THE rate limiter on overall IC lifetime while at the same time not being so overly conservative that the metal pitch ends up being oversized as to impair product performance and cost structure.
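The barrier-versus-conductivity tradeoff can be sketched with simple geometry. This is only a back-of-envelope model under loud assumptions: a rectangular trench with the barrier lining both sidewalls and the bottom, no current carried by the barrier itself, and bulk copper resistivity (real nanoscale lines are more resistive). The dimensions and the function name are made up for illustration.

```python
def copper_line_resistance_per_um(width_nm, height_nm, barrier_nm, rho_ohm_m=1.7e-8):
    """Resistance in ohms of a 1 um long line, counting only the copper core.

    Assumes the barrier lines both sidewalls and the trench bottom and
    conducts nothing. rho_ohm_m defaults to roughly bulk copper.
    """
    cu_width_nm = width_nm - 2.0 * barrier_nm
    cu_height_nm = height_nm - barrier_nm
    if cu_width_nm <= 0 or cu_height_nm <= 0:
        raise ValueError("barrier leaves no room for copper")
    area_m2 = (cu_width_nm * 1e-9) * (cu_height_nm * 1e-9)
    length_m = 1e-6
    return rho_ohm_m * length_m / area_m2


# Hypothetical 50 nm x 100 nm trench with increasingly thick barriers.
bare = copper_line_resistance_per_um(50.0, 100.0, 0.0)
for barrier in (0.0, 2.0, 5.0, 10.0):
    r = copper_line_resistance_per_um(50.0, 100.0, barrier)
    print(f"barrier {barrier:4.1f} nm: {r:.2f} ohm/um ({r / bare:.2f}x bare copper)")
```

Even in this crude model a few nanometers of barrier eat a large share of a narrow line's cross-section, which is why barrier thickness gets optimized rather than simply maximized for EM margin.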

Is EM an issue? Yes. Is NBTI an issue? Yes. Is TDDB an issue? Yes. All are resourced during development such that none is a fatal issue precluding the shipment of product, but none are so overly resourced as to be engineered out of existence, for the simple fact that doing so would require not just more resources but also come at the expense of performance and cost of the entire process technology.

(And yes tungsten is still used for plugs up thru 32nm, but dual-damascene copper metal level 1 with copper plugs to S/D is expected to be adopted at 22nm)
 

Gillbot

Lifer
Jan 11, 2001
28,830
17
81
Ok, after reading a handful of IDC's posts, I'm tempted to move this into Highly Technical.

I'm off to take a tylenol now.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: magreen
See, I told you it's a great thread!

I don't care what you say, you still aren't dragging me into that Video forum :laugh:

Every time I peek in there I feel like I've experienced a little death, and no I don't mean the satisfying French la petite mort kind of death.
 

magreen

Golden Member
Dec 27, 2006
1,309
1
81
Originally posted by: Idontcare
Originally posted by: magreen
See, I told you it's a great thread!

I don't care what you say, you still aren't dragging me into that Video forum :laugh:

Every time I peek in there I feel like I've experienced a little death, and no I don't mean the satisfying French la petite mort kind of death.
LOL. You have to stay in this forum for that kind of satisfaction. ;)