TheRealMrGrey

Member
Jan 20, 2007
125
0
76
So I've owned my current system for about 2 years. I installed my new GPU a few weeks ago (EVGA GTX 560 Ti SC), but the rest of the system is 2 years old.

I've OC'd the i5-750 to 3.04 GHz. I stopped there because my cooler is not up to anything higher. In Prime95 it goes to about 65 C after a few hours at 3.04 GHz, but to 71 C at 3.20 GHz, and the i5-750 has a max safe temp of 72 C. My cooler is an Arctic Freezer 7 Pro Rev.2, and yeah, not good enough apparently. I originally chose 3.04 GHz to keep the temps away from the limit, and it was stable in Prime95 for 8 hours.

Over these 2 years, I've had only three OC failures where the BIOS reported afterwards that the system was reset to defaults due to overclocking/overvoltage. One of those failures was just now.

After the 1st two failures, I was able to just restart, go into bios and reset, and away I went. Since it has happened so infrequently I figured it was just part of OCing - eventually you will have some fail. However, this time it would not start after rebooting. Well - it would power on, the fans and case light would go on, I could hear it trying to read the DVD drive, but it would not make the *beep* that it does each time on startup. (I think this means it will not "post"?) I figured it might be my GPU, since it is new, and decided to remove it and put in my old GPU (8800 GTS) and the computer started up fine. So I was like "damn, faulty GPU, now I have to RMA." But just to make sure, I put the GTX 560 Ti back in. This time it posted and started up fine.

So now I'm not sure what caused the crash - or why it wouldn't post until I swapped cards, or why the new card suddenly works again after swapping back. I went into the bios and it did tell me that the computer had crashed due to overclocking/overvoltage. But maybe it always says this? Here are the things I've changed in BIOS to OC:

CPU Clock Ratio: 19x
QPI Clock Ratio: x32 (I don't remember whether this is default)
BCLK Freq: 160 MHz
RAM timings: 8/8/8/24 (these are the default timings for my RAM)
DRAM voltage: 1.4 V
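
As a quick sanity check on the settings above, the core clock is just BCLK times the CPU multiplier. The sketch below is only illustrative: the function name is made up, and the 20x line assumes BCLK stays at 160 MHz, which is one way (not necessarily the way used here) to reach the 3.20 GHz mentioned earlier.

```python
def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core clock in MHz: base clock (BCLK) times the CPU multiplier."""
    return bclk_mhz * multiplier


def as_ghz(mhz: float) -> str:
    """Format a clock in MHz as a GHz string with two decimals."""
    return f"{mhz / 1000:.2f} GHz"


# Settings listed in the post: BCLK 160 MHz, CPU Clock Ratio 19x.
print(as_ghz(core_clock_mhz(160, 19)))

# Hypothetical: same BCLK with a 20x ratio (an assumption, not a posted setting).
print(as_ghz(core_clock_mhz(160, 20)))
```

With the posted values this works out to 3040 MHz, which matches the 3.04 GHz figure quoted for the overclock.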

I'm allowing the Gigabyte MB to automatically control the CPU voltage, and it is setting it to 1.26 V. I don't really know what the best setting is, so I figured I'd just go with the MB's setting. Also, I've got my DRAM voltage low because I'm using 1600MHz ECO RAM that's made to run at 1.35 V. I'm also allowing the MB to alter the multiplier dynamically; I suspect this saves energy when not in game.

My question is whether there is a sure way to know what caused the crash. Could the GPU be sketchy, hence the need to swap it out? Or was it a coincidence and just a random CPU fail? Perhaps the MB's CPU voltage setting is the problem: is 1.26 V too high for my OC?

I have not updated the BIOS on this MB, mainly because I've read that there is some probability it can permanently kill the board. But maybe I should update the BIOS?

Edit: I think my MB is Rev 1.
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
Sometimes I have to reseat items. My candidate for your problem is the need to reseat the vidcard, which you did.

There was even a product that helped improve slot seating.
 

Puppies04

Diamond Member
Apr 25, 2011
5,909
17
76
the i5-750 has a max safe temp of 72 C.

Where did you get this number from? TJmax is 95 degrees. That is the point at which the chip will throttle itself back to prevent damage. I would not recommend going that high, but as far as Intel is concerned it won't cause any permanent damage.

After the 1st two failures, I was able to just restart, go into bios and reset, and away I went. Since it has happened so infrequently I figured it was just part of OCing - eventually you will have some fail. However, this time it would not start after rebooting. Well - it would power on, the fans and case light would go on, I could hear it trying to read the DVD drive, but it would not make the *beep* that it does each time on startup. (I think this means it will not "post"?) I figured it might be my GPU, since it is new, and decided to remove it and put in my old GPU (8800 GTS) and the computer started up fine. So I was like "damn, faulty GPU, now I have to RMA." But just to make sure, I put the GTX 560 Ti back in. This time it posted and started up fine.

Nobody is going to pull out a magic wand and be able to tell you exactly what is the problem, however we can make suggestions to try and isolate the fault. Here is what I would do in this order.

1. Remove the overclock, put everything to stock and check if it boots.

2. Post back here
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
Where did you get this number from? TJmax is 95 degrees. That is the point at which the chip will throttle itself back to prevent damage. I would not recommend going that high, but as far as Intel is concerned it won't cause any permanent damage.

Intel lists the maximum Tcase of the Lynnfield chips at 72.7c, measured at the center of the IHS; this is what Intel calls the "case" temperature, IIRC. The core temps are about 8-10c hotter. If you have a Gigabyte board, what ET6 reports as the cpu temp is Tcase, or the temp at the IHS. Other mb's do not report this temp, but there is a signal from the cpu for it.

Tjmax, or the maximum junction temp, is 99c for Lynnfields. That is where the chip throttles itself. This would equate to a Tcase of 90c or so. It won't kill the chip for short runs, but Intel tells us not to run it over Tcase = 72.7c.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Intel lists the maximum Tcase of the Lynnfield chips at 72.7c, measured at the center of the IHS; this is what Intel calls the "case" temperature, IIRC. The core temps are about 8-10c hotter. If you have a Gigabyte board, what ET6 reports as the cpu temp is Tcase, or the temp at the IHS. Other mb's do not report this temp, but there is a signal from the cpu for it.

Tjmax, or the maximum junction temp, is 99c for Lynnfields. That is where the chip throttles itself. This would equate to a Tcase of 90c or so. It won't kill the chip for short runs, but Intel tells us not to run it over Tcase = 72.7c.

The damage done to the silicon happens within the silicon at the temperature the silicon reaches. The damage does not happen in the silicon just because some point in space located away from the silicon reaches a setpoint (Tcase).

The TCase temp is spec'ed for purposes of thermal transfer. If your cooling solution cannot keep TCase below 72.7C while the CPU itself is closing in on TJMax, then your thermal solution is incapable of meeting Intel's spec. That is who TCase is defined for: the HSF engineers and OEMs.

The damage to the CPU that we all are to be worried about is the damage that comes from thermally enabled processes which occur in the CPU, and that is why TJmax is the value it is and not a number that is higher or lower.

If the silicon in the CPU could not withstand operating at 98C on my 2600K CPU then Intel would have set the TJmax to a lower value.

TCase is a number that basically has to be defined for the purposes of safeguarding the customer of OEM builds from being handed computers that are so anemic on cooling (for cost-savings reasons for the OEM) that the system is virtually guaranteed to throttle.

If the cooling can't keep TCase below the spec'ed TCase max, then it is assured that the customer is going to be deprived of the full performance of their CPU, because the CPU itself is all the more likely to be hitting TJmax and throttling.

But in either event, exceeding TCase and/or hitting TJmax, the CPU's throttling is set to its current temperature threshold for all the right reasons.

FWIW, for us enthusiasts, worrying about TCase is pretty much a fool's errand in that you have gotten sidetracked onto something that really doesn't matter for the reasons you are probably thinking it should or would matter.
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
The damage done to the silicon happens within the silicon at the temperature the silicon reaches. The damage does not happen in the silicon just because some point in space located away from the silicon reaches a setpoint (Tcase).

The TCase temp is spec'ed for purposes of thermal transfer. If your cooling solution cannot keep TCase below 72.7C while the CPU itself is closing in on TJMax, then your thermal solution is incapable of meeting Intel's spec. That is who TCase is defined for: the HSF engineers and OEMs.

The damage to the CPU that we all are to be worried about is the damage that comes from thermally enabled processes which occur in the CPU, and that is why TJmax is the value it is and not a number that is higher or lower.

If the silicon in the CPU could not withstand operating at 98C on my 2600K CPU then Intel would have set the TJmax to a lower value.

TCase is a number that basically has to be defined for the purposes of safeguarding the customer of OEM builds from being handed computers that are so anemic on cooling (for cost-savings reasons for the OEM) that the system is virtually guaranteed to throttle.

If the cooling can't keep TCase below the spec'ed TCase max, then it is assured that the customer is going to be deprived of the full performance of their CPU, because the CPU itself is all the more likely to be hitting TJmax and throttling.

But in either event, exceeding TCase and/or hitting TJmax, the CPU's throttling is set to its current temperature threshold for all the right reasons.

FWIW, for us enthusiasts, worrying about TCase is pretty much a fool's errand in that you have gotten sidetracked onto something that really doesn't matter for the reasons you are probably thinking it should or would matter.

Yup. Every bit of that, which is why we track our core temps (Tj) when we OC. But Tcase has a steady relation to Tj, so tracking Tcase is useful for those who do not track Tj. In any . . . case (sorry) . . . Tcase is what Intel means when it lists a max temperature of 72.7c.

I believe that having knowledge about what people mean when they say something is important.
 

TheRealMrGrey

Member
Jan 20, 2007
125
0
76
Ok - first off, thanks to you folks for the help and the info regarding the core temps, I was not aware of the difference.

I'm beyond restarting with the system set back to default. The Ga MB I own automatically resets it to default if a failure occurs, so the next time you restart (assuming you don't go into bios) it will be set back to defaults. I did that and it restarted fine after installing my old GPU.

Since last night I have had two instances where the screen went momentarily black and then I received a warning:

"display driver stopped responding and recovered itself"

I'm using Nvidia driver 290.36, which I downloaded from EVGA. Now I found this recent thread where others have had a similar problem and suggested (1) switching to the new 290.53, and (2) RMA the GPU:

http://www.evga.com/forums/tm.aspx?m=1390952&mpage=1

Note that I do not OC my GPU.

Also, after one of these black screen instances the sound stopped working, but it worked again after a restart. So at the moment I'm in the process of installing the new driver to see if this works - will post and let you know. I'm a bit worried it is the GPU now, however, since everything worked just fine for 4 weeks and suddenly the driver stops working? Unlikely to be software.

Finally, regarding the CPU core temperature, I was using CPUID's Hardware Monitor to obtain the temps in the CPU, so it depends on what Hardware Monitor is actually measuring. It looks like ehume is suggesting that the temp Hardware Monitor receives is the Tcase, and thus my cores are around 75 C when I'm reading 65 C. I agree that I'm not looking to push the thing to the limit, I'm not unhappy with the performance I'm getting at these temps.
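
The arithmetic in the paragraph above can be written out as a rough sketch. Everything here is a rule of thumb taken from this thread rather than an Intel specification: the 8-10 C offset and the 99 C Tjmax are the figures ehume gave (another post says 95), the function names are made up, and it assumes the reported number really is a Tcase-like reading, which may not be true.

```python
# Rule-of-thumb numbers quoted in this thread, not Intel specifications.
CORE_OFFSET_RANGE_C = (8.0, 10.0)  # "core temps are about 8-10c hotter"
TJMAX_C = 99.0                     # Lynnfield figure from ehume; another post says 95


def estimate_core_temp_c(reported_c: float) -> tuple[float, float]:
    """Return a (low, high) core temperature estimate from a Tcase-like reading."""
    low, high = CORE_OFFSET_RANGE_C
    return (reported_c + low, reported_c + high)


def worst_case_headroom_c(reported_c: float, tjmax_c: float = TJMAX_C) -> float:
    """Margin between the hottest estimated core temperature and Tjmax."""
    _, hottest = estimate_core_temp_c(reported_c)
    return tjmax_c - hottest


# The two Prime95 readings mentioned earlier: 65 C at 3.04 GHz, 71 C at 3.20 GHz.
for reading in (65.0, 71.0):
    low, high = estimate_core_temp_c(reading)
    print(f"reported {reading:.0f} C -> cores roughly {low:.0f}-{high:.0f} C, "
          f"at least {worst_case_headroom_c(reading):.0f} C below Tjmax")
```

Reading the per-core sensors directly avoids the guesswork, so this only gives a ballpark for the case where the monitoring tool shows a single CPU temperature.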

Edit: What I'm most uncertain about regarding the CPU is whether the MB is setting the voltage too high at 1.26V. I'm uncertain how you know how high to go.
 

TheRealMrGrey

Member
Jan 20, 2007
125
0
76
Also - just so you folks know I'm with you - I work on experimental transistors in physics and am aware of the physics of how the damage is caused in the silicon. However, I do not work in the computer industry and therefore do not know all the jargon used (i.e. Tj, Tcase, etc.) or where the various thermal monitors are placed. So thanks for all this info, it is really helpful.

Regarding actually measuring Tj, if I have a Ga MB and it only reports Tcase, how am I supposed to measure Tj?
 

ehume

Golden Member
Nov 6, 2009
1,511
73
91
Also - just so you folks know I'm with you - I work on experimental transistors in physics and am aware of the physics of how the damage is caused in the silicon. However, I do not work in the computer industry and therefore do not know all the jargon used (i.e. Tj, Tcase, etc.) or where the various thermal monitors are placed. So thanks for all this info, it is really helpful.

Regarding actually measuring Tj, if I have a Ga MB and it only reports Tcase, how am I supposed to measure Tj?

I like Real Temp. Others like Core Temp. And there are other apps out there.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Regarding actually measuring Tj, if I have a Ga MB and it only reports Tcase, how am I supposed to measure Tj?

Measuring TCase is extremely difficult in reality. It requires you to carefully mill a divot in the center of your CPU's IHS to host the tip of the thermal probe, as well as a trench to host the thermal probe's wires.

Unless you have gone to these lengths, anything reporting "Tcase" to you is neither measuring nor reporting Tcase. It is reporting a temperature, but that temperature might be Tsocket, or Tmobo, etc.

Tcase has a very specific engineering specification associated with it. Unfortunately the term itself gets conflated by diagnostic software and mobo apps, leading to general misunderstanding of what it means to begin with.

See page 52 (and onwards) of the Intel Thermal Guide.

It is rare to see someone who has actually bothered to attempt to measure Tcase correctly in the enthusiast realm. It's really a specification meant for use by engineers who are designing HSF's, laptops, OEM setups, etc. (i.e. technical professionals)
 

TheRealMrGrey

Member
Jan 20, 2007
125
0
76
Ok - after updating the Nvidia drivers I have had no stability issues - so I guess it was the driver. Odd that it suddenly stopped working, though.

Thanks again for all the help.

Edit: Ha! I jinxed myself. No sooner do I write this than I have a driver failure again. Blah. Now going through EVGA support.
 