Why is AMD CPU's max temp less than Intel's?

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Is it just because of the design/process etc it cant tolerate heat as well?

Or is it because AMD uses a different method of measuring temperature, and the temp threshold is actually physically similar to Intel's?
 

LoneNinja

Senior member
Jan 5, 2009
825
0
0
Unless I'm mistaken the location of their temp sensors and how they work do vary compared to Intel. They have completely different designs for architecture, and the manufacturing process is different too, even when comparing 32nm to 32nm.
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
81
It might have something to do with them operating at higher voltages than Intel chips. Since they run at higher volts, AMD might need to keep maximum operating temp lower to achieve similar lifetime.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
It all comes down to (1) the physics that underlie reliability and product lifetime, and (2) the physics that underlie the power-consumption of the CPU.

1. Physics that underlie reliability and product lifetime:

Thermally activated processes (remember the Arrhenius equation from your p-chem classes) will approximately double in rate for every 10°C higher the operating temp.

A cpu at 70°C will have roughly twice the operating lifetime as a cpu operated at 80°C.

During the technology node development phase, getting the intrinsic reliability (operating lifetime) of the process technology up to target is a major challenge. Out here in the public domain you mostly hear about yields, but reliability is an equally challenging issue during those 4yrs in which the node is originally developed.

Well, what can you do if you are running out of R&D time, your node needs to be put into production asap, but the one thing holding it back is that in its current form (hypothetically speaking) your Lifetime Reliability department is telling you that the chips will die in about 12 months if they are operated at 100°C?

The easiest thing to do is to just limit the max upper-temp to a value that enables you to hit your reliability target. 12 months at 100°C is too little? No problem, at 90°C that 12 months becomes 2 years, at 80°C it becomes 4 yrs, at 70°C it becomes 8 yrs.

Now if your customers can be convinced to restrict their operating environment such that the CPU doesn't exceed 70°C, and an 8yr expected lifetime conforms with your warranty model and internal targets (10yrs is actually the norm for the industry), then you could just go to production and not worry about the extra 6-8 months it would have taken you to get the intrinsic reliability of your process technology up to the point that you could expect an 8-10yr operating life at 100°C.

This is where/how Intel decides what TJMax is going to be. 98°C is based on how operating temps affect operating life; if 97°C was needed then that is what they would have spec'ed it at, or 99°C, etc.

By the way, it is this "reliability margin" that we OC'ers are using up and using to our advantage when overclocking.

A chip that is built with a process tech that can intrinsically support an estimated lifetime of 10yrs at 98°C means you get 20yrs at 88°C, or 40yrs at 78°C, or 80yrs at 68°C.
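That doubling rule is simple enough to sketch in a few lines. This is a hypothetical back-of-envelope model (my illustration, not anything from AMD or Intel), assuming a clean 2×-per-10°C Arrhenius approximation and a 10yr rating at 98°C:

```python
# Hypothetical back-of-envelope model: lifetime doubles for every
# 10C you drop below the rated max temp (clean 2x/10C Arrhenius
# approximation; real activation energies vary by failure mechanism).
def est_lifetime_years(temp_c, rated_temp_c=98.0, rated_life_years=10.0):
    return rated_life_years * 2.0 ** ((rated_temp_c - temp_c) / 10.0)

for t in (98, 88, 78, 68):
    print(f"{t}C -> ~{est_lifetime_years(t):.0f}yrs")
# 98C -> ~10yrs, 88C -> ~20yrs, 78C -> ~40yrs, 68C -> ~80yrs
```

Same caveat as above: the 2×/10°C factor is a rule of thumb, not a spec.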

Well, none of us OC'ers really care to have our 2600K's last 40yrs or 80yrs, so we instead turn up our operating voltages (increasing voltage, any voltage, decreases the lifespan of the CPU) to such an extent that we basically spend our reliability budget on increasing the Vcc.

"Hot Carrier Damage" kills the transistors by degrading them over time. This happens at any voltage, but more voltage (and more current) makes it happen even faster.

Check out this short article for a very cursory review of some of the issues at play in device reliability (TDDB, Hot Carrier Damage, Electromigration).

So that chip which could be expected to last 40yrs at 78°C with 1.3V might only last 10yrs at 78°C at 1.4V, and only 2.5yrs at 1.5V and 78°C.
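Taking those example numbers at face value, each extra 0.1V costs roughly a 4× hit to lifetime at constant temperature. A hypothetical sketch of that scaling (illustrative only; the real voltage dependence is mechanism-specific and not public):

```python
# Hypothetical voltage-lifetime tradeoff at constant temperature (78C):
# roughly a 4x lifetime penalty per +0.1V, per the example numbers above.
def est_life_vs_vcc(vcc, base_vcc=1.3, base_life_years=40.0):
    return base_life_years * 0.25 ** ((vcc - base_vcc) / 0.1)

for v in (1.3, 1.4, 1.5):
    print(f"{v}V -> ~{est_life_vs_vcc(v):.1f}yrs")
# 1.3V -> ~40.0yrs, 1.4V -> ~10.0yrs, 1.5V -> ~2.5yrs
```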

Increasing voltage AND keeping your operating temps high(er) just uses up your reliability budget all the faster.

This is why OC'ers pursue lower operating temps (although they may not know why they are doing so)...it's not just about the power-bill or the noise...and it is part of the reason why extreme OC'ers go to vapor-phase, LN2, and LHe temperatures before putting 1.8V and so on through their CPU's.

But at the end of the day the reason for the max temperature difference between AMD and Intel CPU's comes down to the intrinsic reliability of those CPU's with respect to thermally activated degradation mechanisms.

Intel spent more money/time/effort to improve the intrinsic reliability of their process integration at those nodes respectively.

2. Physics that underlie the power-consumption of the CPU:

The second reason why the temperature maximum needs to be specified is adherence to the TDP spec. Power consumption from leakage in the silicon device is fundamentally dependent on the operating temperature of the CPU.

[Chart: total power vs Vcc, temperature, and clockspeed (PtotalVccTGHz.png)]


All else held constant (Vcc, clockspeed, etc), a hotter CPU will consume more power than a cooler one (see this thread).

[Chart: temperature vs power for 2GHz at 1.290V (TempvsPowerfor2GHzat1290V.png)]


So if AMD wants to spec their CPU's as being 125W TDP, for example, at a given clockspeed and operating voltage, and the max temp is 100°C, then they have to dial down the clockspeed and/or Vcc such that when the CPU is at 100°C it is not violating its own TDP spec because of the elevated leakage current.

My 2600K has a TDP of 95W, at 98°C the chip is burning through about 45W of just leakage (static power consumption) losses alone, but if I drop the operating temps to 68°C then the leakage power drops to 30W. That gives me a 15W TDP "surplus" to raise my clockspeeds while not exceeding the spec'ed TDP.
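The arithmetic in that 2600K example works out like this (numbers taken straight from the post; the 45W/30W leakage figures are the poster's estimates, not official specs):

```python
# TDP headroom gained by cooling, using the 2600K numbers above.
TDP_W = 95.0
static_leakage_w = {98: 45.0, 68: 30.0}  # estimated leakage at each temp (C)

budget_hot = TDP_W - static_leakage_w[98]   # 50W left for dynamic (switching) power
budget_cool = TDP_W - static_leakage_w[68]  # 65W left for dynamic power
surplus_w = budget_cool - budget_hot        # 15W reclaimed by running cooler
print(f"TDP surplus from cooling 98C -> 68C: {surplus_w:.0f}W")
```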

So AMD, by lowering their max allowed operating temperature spec, makes it easier on themselves to hit their reliability spec's as well as making it easier to bin their chips for higher clockspeeds and/or Vcc's while fitting them into the desired TDP bins.
 

Maximilian

Lifer
Feb 8, 2004
12,604
15
81
Wow, very informative post IDC, I'm glad you're on these forums :thumbsup: Think I'll have to read it again for it to sink in though.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Wow, very informative post IDC, I'm glad you're on these forums :thumbsup: Think I'll have to read it again for it to sink in though.

My pleasure. But I hope you understand that in my mind the conversation is only halfway done because you haven't asked the follow-up question: "Should we care that AMD CPU's max temp is less than Intels?"

And in my humble opinion the answer to that question, strictly speaking from the layman consumer enthusiast perspective and side of the equation, is "no, it really doesn't amount to a hill of beans as far as we should care".

Do AMD OC'ers and enthusiasts really care whether or not their Thubans still function in the year 2022? Server and HPC guys will, but they aren't going to jeopardize their lifetime reliability by overclocking and/or overheating their CPU's anyways.

But I am fairly confident that 99.9% of thubans in the consumer marketplace today will not be in use in 2022. So who cares?

And I suspect it is this acknowledgement of reality that is behind AMD's willingness to spec a lower max operating temp in the first place. The engineering overhead built into Intel's CPU's may simply be a needless expense in every facet save for the few chips that will make it into a nuclear sub or something where it needs to function for 15-40yrs.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
IDC,

There are a few Thubans rated @ 71C. Why?

You take a 1055T (125W, 62C) and the Vcore is similar to the 1035T's. Yet, a 9C difference. Logically? I see no sense. Isn't it the same silicon, or is it just bad marketing (OEM)?

 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
IDC,

There are a few Thubans rated @ 71C. Why?

You take a 1055T (125W, 62C) and the Vcore is similar to the 1035T's. Yet, a 9C difference. Logically? I see no sense. Isn't it the same silicon, or is it just bad marketing (OEM)?

Not bad marketing, it all comes down to binning.

If you rate a CPU as being capable of operating at a max temp of 71C then that means at 71C it must still (1) not exceed its spec'ed TDP, and (2) meet an internal spec for reliability purposes.

Lowering this threshold 9C opens up the binning to allow for a lot more silicon that would otherwise not be able to meet either the TDP or the reliability spec were it operated at the higher temperature of 71C.
 

RavenSEAL

Diamond Member
Jan 4, 2010
8,661
3
0
IDC strikes again, great read. As someone said though, gonna have to let it sink in a bit o_O
 

386DX

Member
Feb 11, 2010
197
0
0
Unless I'm mistaken the location of their temp sensors and how they work do vary compared to Intel. They have completely different designs for architecture, and the manufacturing process is different too, even when comparing 32nm to 32nm.

You are correct that the location of the temperature diode is different. Intel has a temp diode in each core of their CPU; AMD has a single temp diode located approximately where the northbridge is.

Here's some information relevant to this topic from the Core Temp website that explains the difference.
http://www.alcpu.com/forums/viewtopic.php?f=63&t=892

I can only see a single temperature reading.
AMD processors based on the Phenom and Phenom II (Athlon II, Sempron II, Turion II, etc.) only have a single thermal sensor.
Thus Core Temp will only display a single CPU temperature reading. There is no way of getting a per-core reading on these processors.

Why is the temperature of my Phenom based processor lower than the ambient temperature?
Starting with the Phenoms, AMD's digital sensor no longer reports an absolute temperature value anymore, but a reading with a certain offset, which is unknown. It is estimated that this offset is between 10 - 15c.

The last bold is the important one. As much as the AMDers keep insisting that their CPU's run super cool... even below ambient... that's really not the case, because the temperature reading they are getting isn't the "real" temperature. Because AMD doesn't provide the TjMax value for their processors, you cannot get the proper offset for the temperature reading, hence you cannot get the "real" temperature.
 

KingFatty

Diamond Member
Dec 29, 2010
3,034
1
81
Has anyone intentionally tried to exceed the expected lifetime of a CPU? Can you overclock and overvolt, then see its reduced lifetime?

I only ask because I remember a post where some guys were intentionally writing terabytes to SSD drives to attempt to get them to exceed their lifetimes. Turned out the SSDs lasted way way longer than expected, perhaps due to over-engineering or whatever. Maybe the CPUs would be similar, where you can't really break them anymore (within a reasonable lifetime)?
 

frostedflakes

Diamond Member
Mar 1, 2005
7,925
1
81
You are correct that the location of the temperature diode is different. Intel has a temp diode in each core of their CPU; AMD has a single temp diode located approximately where the northbridge is.

Here's some information relevant to this topic from the Core Temp website that explains the difference.
http://www.alcpu.com/forums/viewtopic.php?f=63&t=892

I can only see a single temperature reading.
AMD processors based on the Phenom and Phenom II (Athlon II, Sempron II, Turion II, etc.) only have a single thermal sensor.
Thus Core Temp will only display a single CPU temperature reading. There is no way of getting a per-core reading on these processors.

Why is the temperature of my Phenom based processor lower than the ambient temperature?
Starting with the Phenoms, AMD's digital sensor no longer reports an absolute temperature value anymore, but a reading with a certain offset, which is unknown. It is estimated that this offset is between 10 - 15c.

The last bold is the important one. As much as the AMDers keep insisting that their CPU's run super cool... even below ambient... that's really not the case, because the temperature reading they are getting isn't the "real" temperature. Because AMD doesn't provide the TjMax value for their processors, you cannot get the proper offset for the temperature reading, hence you cannot get the "real" temperature.
Are you sure about that? My Phenom II reported per-core temperature in HWMonitor and other programs. Perhaps they use a single sensor and some sort of signal processing or other technique to estimate the temps in each core, though.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
233
106
Not bad marketing, it all comes down to binning.

If you rate a CPU as being capable of operating at a max temp of 71C then that means at 71C it must still (1) not exceed its spec'ed TDP, and (2) meet an internal spec for reliability purposes.

Lowering this threshold 9C opens up the binning to allow for a lot more silicon that would otherwise not be able to meet either the TDP or the reliability spec were it operated at the higher temperature of 71C.
Say, I underclock a 1090T to the 1035T's clock, is it safe to assume it would safely run @ 71c then?

And if I further drop the clock, even at 100c?

As long as it doesn't exceed TDP?

Is there an extremely simplified formula?
 
Last edited:

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Say, I underclock a 1090T to the 1035T's clock, is it safe to assume it would safely run @ 71c then?

And if I further drop the clock, even at 100c?

As long as it doesn't exceed TDP?

Is there an extremely simplified formula?

Not trying to be obtuse, but it totally depends on what you mean when you say "safely".

Exceeding TDP itself is not the issue, it's the combination of operating voltage, current, and temperature.

Exceeding TDP is a contractual (spec) issue. If AMD or Intel says their chip has a TDP of 125W but it actually has an effective TDP of 140W then they've got class-action lawsuit potential on their hands. That is why they have to be mindful of the effect of temperature on power-consumption and TDP.

There is no simplified formula. There could be, but in order to make the simplifications we would need to know a bunch of stuff that AMD and Intel know but are not about to put into the public domain.

At a fundamental device-physics level, if you want to be able to operate your CPU at higher temperatures without compromising its intrinsic lifetime reliability, you will have to markedly reduce the operating voltage and clockspeed. (Clockspeed determines current, as does voltage; that is why clockspeed is relevant to the discussion.)

The practical upper limit is likely somewhere around 130-150C with this approach; consider that in the limit of zero voltage and zero clockspeed (i.e. the state of the CPU while it is still in the fab) we routinely expose the CPU to temperatures in excess of 400°C.

So you probably could take your thuban to say 130C provided you could clock it at 100MHz and have it operate with say a scant 0.8V or some such.

Obviously I don't know the exact limits for Thuban specifically, but within the physics-induced limits of process technology itself and my own experience in developing technology nodes, I think these are reasonable guesstimates.
 
  • Like
Reactions: Magic Carpet

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Great post IDC. But in this case, at least part of the reason for the enormous temperature variance is the location of the temperature sensor. As mentioned, there is actually only one sensor for AMD cpus. You can't get a "per-core" temp on AMD cpus, which bugged me to no end when I was overclocking my work computer.

Edit: as a follow-up, is there a low-end threshold? By that I mean, if you take that hypothetical CPU with a 40 yr lifespan at 78c and 1.3v, will it really last 2560 years if you are able to run it at 1.0v and keep the 78c temperature (assuming the other parts last that long, of course)? Or do you get a law of diminishing returns as you underclock/undervolt below spec?
 
Last edited:

ehume

Golden Member
Nov 6, 2009
1,511
73
91
Thanks for the information. Like everyone else, I'll be digesting this.

One question: when I got into the OC hobby two years ago, AMD users were commenting that their CPU's began making mistakes when their temps got up into the high 50's. From this I concluded that they and Intel were using fundamentally different technology.

Does anyone else remember AMD chips making errors when they hit 57-60c? Or was I somehow reading the posts incorrectly? I'll admit that a lot more went over my head then than now.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Has anyone intentionally tried to exceed the expected lifetime of a CPU? Can you overclock and overvolt then see it's reduced lifetime?

I only ask because I remember a post where some guys were intentionally writing terabytes to SSD drives to attempt to get them to exceed their lifetimes. Turned out the SSDs lasted way way longer than expected, perhaps due to over-engineering or whatever. Maybe the CPUs would be similar, where you can't really break them anymore (within a reasonable lifetime)?

Not quite. The lifetimes can be specified such that some percentage of chips fail at end-of-life, so a 10 year lifetime might mean "1% of chips fail after 10 years". For an OEM, that may be a big deal (since they will have an enormous number of returns), but for overclockers, that's probably in the noise of other failures (e.g. bending pins, crushing de-lidded cores, etc). I don't know what the actual specification is for products, but I would expect that most chips/SSDs/cars/Mars rovers/whatever will last well beyond their rated lifetimes. To get meaningful results, you would need to put dozens or hundreds of parts through the same usage patterns - you can't get much from testing one chip.

So you probably could take your thuban to say 130C provided you could clock it at 100MHz and have it operate with say a scant 0.8V or some such.

Obviously I don't know the exact limits for Thuban specifically, but within the physics-induced limits of process technology itself and my own experience in developing technology nodes, I think these are reasonable guesstimates.
You have to be a little careful even when taking any one dimension to extremes. For example, at 0Hz and 0V, at some point temperature will destroy/damage the part (whether it's a mechanical issue, where different coefficients of expansion result in something breaking/cracking, or a silicon issue like stress migration). But you're correct that you can trade the dimensions off against each other within reasonable limits. By the way, excellent posts, IDC.
 

thilanliyan

Lifer
Jun 21, 2005
12,000
2,225
126
All this talk is making me want a Sandy Bridge!!

Sigh...being an AMD fan sucks right now lol. At least the GPU side is not putting out duds.