- Jul 3, 2008
I have a server (no overclocking), a few years old, with an E8400.  I just had to move it from one place to another, and for the past day and a half it has been shutting itself down randomly.  I ran some diagnostics and found machine check exceptions for thermal events, so I installed lm-sensors and checked the temps.  It's idling in the high 70s to mid 80s.  Under load it's spiking to 100C.  I rebooted and opened the BIOS health status screen to be sure I was reading the right temps.  The moment I opened it, the CPU temp was 100C and climbing fast.  Before I could even hit the power button, it was at 112C, and then the machine shut itself down.
I am about to go ahead and reseat the HSF with a fresh coating of Arctic Silver. But should I just go ahead and replace the CPU since it is already 3 years old and I may have cut its lifespan terribly?
		Code:
	
	IDLE:
it8718-isa-0290
Adapter: ISA adapter
in0:          +1.06 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +2.03 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.30 V  (min =  +0.00 V, max =  +4.08 V)
+5V:          +2.85 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +4.08 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in5:          +0.05 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +4.08 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +3.07 V  
fan1:        1717 RPM  (min =    0 RPM)
fan2:           0 RPM  (min =    0 RPM)
temp1:        +40.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +67.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
temp3:         -2.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +78.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
coretemp-isa-0001
Adapter: ISA adapter
Core 1:       +77.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
LOAD:
it8718-isa-0290
Adapter: ISA adapter
in0:          +1.09 V  (min =  +0.00 V, max =  +4.08 V)
in1:          +2.03 V  (min =  +0.00 V, max =  +4.08 V)
in2:          +3.28 V  (min =  +0.00 V, max =  +4.08 V)
+5V:          +2.85 V  (min =  +0.00 V, max =  +4.08 V)
in4:          +4.08 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in5:          +0.14 V  (min =  +0.00 V, max =  +4.08 V)
in6:          +4.08 V  (min =  +0.00 V, max =  +4.08 V)  ALARM
in7:          +3.01 V  (min =  +0.00 V, max =  +4.08 V)
Vbat:         +3.07 V  
fan1:        1708 RPM  (min =    0 RPM)
fan2:           0 RPM  (min =    0 RPM)
temp1:        +39.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:        +88.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermal diode
temp3:         -2.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +100.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
coretemp-isa-0001
Adapter: ISA adapter
Core 1:       +98.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
LOG:
kernel: [ 1499.816014] [Hardware Error]: Machine check events logged
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 0
mcelog: CPU 0 THERMAL EVENT TSC 373bbee5e1a 
mcelog: TIME 1307127492 Fri Jun  3 14:58:12 2011
mcelog: Processor 0 heated above trip temperature. Throttling enabled.
mcelog: Please check your system cooling. Performance will be impacted
mcelog: STATUS 88010023 MCGSTATUS 0
mcelog: MCGCAP 806 APICID 0 SOCKETID 0 
mcelog: CPUID Vendor Intel Family 6 Model 23
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 1
mcelog: CPU 0 THERMAL EVENT TSC 373bc002b4d 
mcelog: TIME 1307127492 Fri Jun  3 14:58:12 2011
mcelog: Processor 0 below trip temperature. Throttling disabled
mcelog: STATUS 88010022 MCGSTATUS 0
mcelog: MCGCAP 806 APICID 0 SOCKETID 0 
mcelog: CPUID Vendor Intel Family 6 Model 23
kernel: [ 1616.571112] i2c /dev entries driver
kernel: [ 1752.243219] CPU0: Core temperature above threshold, cpu clock throttled (total events = 93581)
kernel: [ 1752.243609] CPU0: Core temperature/speed normal 
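In case it helps anyone compare before and after the reseat, here is a rough sketch for pulling the per-core numbers out of `sensors` output instead of eyeballing them. It assumes the coretemp lines look exactly like the ones pasted above (other lm-sensors versions may format them differently), and `core_headroom` is just a name I made up for illustration.

```python
import re

# Matches coretemp lines in the format pasted above, e.g.
#   Core 0:      +100.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
# Assumption: other lm-sensors versions may print these lines differently.
CORE_LINE = re.compile(
    r"^Core (\d+):\s+\+?(-?[\d.]+)°C\s+"
    r"\(high = \+?(-?[\d.]+)°C, crit = \+?(-?[\d.]+)°C\)"
)


def core_headroom(sensors_text):
    """Return {core_id: (temp_c, degrees_below_crit)} from `sensors` text.

    Hypothetical helper: lines that do not match CORE_LINE are ignored.
    """
    result = {}
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            core = int(match.group(1))
            temp = float(match.group(2))
            crit = float(match.group(4))
            result[core] = (temp, crit - temp)
    return result


# The two coretemp lines from the LOAD reading above.
sample = """\
Core 0:      +100.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
Core 1:       +98.0°C  (high = +78.0°C, crit = +100.0°C)  ALARM (CRIT)
"""

for core, (temp, margin) in sorted(core_headroom(sample).items()):
    print(f"Core {core}: {temp:.1f} C, {margin:.1f} C below crit")
```

To run it against the live machine, the text could come from something like `subprocess.run(["sensors"], capture_output=True, text=True).stdout`, assuming `sensors` is on the PATH.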
				
		 
			 
 
		 
 
		 
 
		 
 
		 
 
		 
 
		 
 
		 
 
		