- Jul 3, 2008
I have a server (no overclocking), a few years old, with an E8400. I just moved it from one place to another, and for the past day and a half it has been shutting itself down at random. Diagnostics showed machine check exceptions for thermal events, so I installed lm-sensors and checked the temps. It idles in the high 70s to mid 80s and spikes to 100C under load. I rebooted and opened the BIOS health status screen to make sure I was reading the right temps. The moment I opened it, the CPU temp was 100C and climbing fast. Before I could even hit the power button, it reached 112C and the machine shut itself down.
I am about to reseat the HSF with a fresh coat of Arctic Silver. But should I just replace the CPU, since it is already 3 years old and I may have badly shortened its lifespan?
Code:
IDLE:
it8718-isa-0290
Adapter: ISA adapter
in0: +1.06 V (min = +0.00 V, max = +4.08 V)
in1: +2.03 V (min = +0.00 V, max = +4.08 V)
in2: +3.30 V (min = +0.00 V, max = +4.08 V)
+5V: +2.85 V (min = +0.00 V, max = +4.08 V)
in4: +4.08 V (min = +0.00 V, max = +4.08 V) ALARM
in5: +0.05 V (min = +0.00 V, max = +4.08 V)
in6: +4.08 V (min = +0.00 V, max = +4.08 V) ALARM
in7: +3.01 V (min = +0.00 V, max = +4.08 V)
Vbat: +3.07 V
fan1: 1717 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
temp1: +40.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +67.0°C (low = +127.0°C, high = +127.0°C) sensor = thermal diode
temp3: -2.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +78.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +77.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
LOAD:
it8718-isa-0290
Adapter: ISA adapter
in0: +1.09 V (min = +0.00 V, max = +4.08 V)
in1: +2.03 V (min = +0.00 V, max = +4.08 V)
in2: +3.28 V (min = +0.00 V, max = +4.08 V)
+5V: +2.85 V (min = +0.00 V, max = +4.08 V)
in4: +4.08 V (min = +0.00 V, max = +4.08 V) ALARM
in5: +0.14 V (min = +0.00 V, max = +4.08 V)
in6: +4.08 V (min = +0.00 V, max = +4.08 V) ALARM
in7: +3.01 V (min = +0.00 V, max = +4.08 V)
Vbat: +3.07 V
fan1: 1708 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
temp1: +39.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +88.0°C (low = +127.0°C, high = +127.0°C) sensor = thermal diode
temp3: -2.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +100.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +98.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
LOG:
kernel: [ 1499.816014] [Hardware Error]: Machine check events logged
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 0
mcelog: CPU 0 THERMAL EVENT TSC 373bbee5e1a
mcelog: TIME 1307127492 Fri Jun 3 14:58:12 2011
mcelog: Processor 0 heated above trip temperature. Throttling enabled.
mcelog: Please check your system cooling. Performance will be impacted
mcelog: STATUS 88010023 MCGSTATUS 0
mcelog: MCGCAP 806 APICID 0 SOCKETID 0
mcelog: CPUID Vendor Intel Family 6 Model 23
mcelog: HARDWARE ERROR. This is *NOT* a software problem!
mcelog: Please contact your hardware vendor
mcelog: MCE 1
mcelog: CPU 0 THERMAL EVENT TSC 373bc002b4d
mcelog: TIME 1307127492 Fri Jun 3 14:58:12 2011
mcelog: Processor 0 below trip temperature. Throttling disabled
mcelog: STATUS 88010022 MCGSTATUS 0
mcelog: MCGCAP 806 APICID 0 SOCKETID 0
mcelog: CPUID Vendor Intel Family 6 Model 23
kernel: [ 1616.571112] i2c /dev entries driver
kernel: [ 1752.243219] CPU0: Core temperature above threshold, cpu clock throttled (total events = 93581)
kernel: [ 1752.243609] CPU0: Core temperature/speed normal
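For anyone wanting to watch the core temps while testing after a reseat, here is a rough Python sketch that pulls the `Core N` lines out of `sensors` output and compares each reading with the `high` and `crit` limits printed on the same line. It assumes the coretemp line format shown in the paste above; other lm-sensors versions may print the line differently, and the function names (`parse_core_temps`, `classify`, `read_sensors`) are made up for this example.

```python
import re
import subprocess

# Matches coretemp lines in the format pasted above, e.g.
# "Core 0: +100.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)"
CORE_LINE = re.compile(
    r"^(Core \d+):\s*\+?(-?[\d.]+)°C\s*"
    r"\(high = \+?(-?[\d.]+)°C, crit = \+?(-?[\d.]+)°C\)"
)


def parse_core_temps(sensors_text):
    """Return {label: (temp, high, crit)} for each Core line in one snapshot."""
    readings = {}
    for line in sensors_text.splitlines():
        match = CORE_LINE.match(line.strip())
        if match:
            label, temp, high, crit = match.groups()
            readings[label] = (float(temp), float(high), float(crit))
    return readings


def classify(temp, high, crit):
    """Compare a reading against the limits reported on the same line."""
    if temp >= crit:
        return "critical"
    if temp >= high:
        return "high"
    return "ok"


def read_sensors():
    """Run the lm-sensors `sensors` command and return its text output."""
    result = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    )
    return result.stdout


# The two Core lines from the LOAD snapshot in the post.
LOAD_SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +100.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +98.0°C (high = +78.0°C, crit = +100.0°C) ALARM (CRIT)
"""

if __name__ == "__main__":
    for label, (temp, high, crit) in parse_core_temps(LOAD_SAMPLE).items():
        print(f"{label}: {temp:.1f}C -> {classify(temp, high, crit)}")
```

To check the live machine instead of the pasted sample, pass `read_sensors()` to `parse_core_temps`. Parse one snapshot at a time, since the idle and load pastes reuse the same `Core 0` and `Core 1` labels.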