What really happens to a CPU that locks up...?

Bartman39

Elite Member | For Sale/Trade
Jul 4, 2000
What I mean is when a PC locks up because of too high of an overclock (say when running a benchmark or...?) and it just freezes, what exactly happens...? And does it damage the processor to any degree...? (meaning if left this way for a period of time... 1 min to 1 hour...?)

Do you get a thermal runaway...? Or does it just shut down...?

A good semi-technical answer would be great...
thanks...
 

SuperTool

Lifer
Jan 25, 2000
Many scenarios are possible. It could get stuck in an infinite loop, or something like that. Basically, when you overclock the CPU, certain race conditions that resolve deterministically under normal conditions can now resolve differently. For example, a signal that used to change before the clock edge now changes after it (a setup-time violation), so some part of the chip gets out of step or latches a wrong value. If that value was your loop condition, you might now get stuck in a loop that you would otherwise get out of. Or, if you have dynamic gates in your design and run the clock too fast, the dynamic nodes might not precharge all the way up. That intermediate voltage on a dynamic node will turn on both the pull-up and pull-down networks of the gate it drives at the same time, which results in high current and power dissipation if the device is large. Usually we have keepers on these dynamic nodes that slowly pull them up to the full rail, but an incompletely precharged node can still produce a wrong value.
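
To make the "changes after the clock edge" case concrete, here is a minimal static-timing sketch in Python. It only asks whether a signal launched at one clock edge settles before the next edge minus the capturing flip-flop's setup time. The delay numbers are hypothetical, and real timing analysis also accounts for clock skew, jitter, hold time, and voltage/temperature corners, all of which this sketch ignores.

```python
def setup_slack_ns(freq_ghz: float, clk_to_q_ns: float, logic_ns: float, setup_ns: float) -> float:
    """Setup slack of one register-to-register path, in nanoseconds.

    A negative result means the data arrives too late, so the capturing
    flip-flop may latch a stale or wrong value at the next clock edge.
    """
    period_ns = 1.0 / freq_ghz            # a shorter period when overclocked
    arrival_ns = clk_to_q_ns + logic_ns   # when the signal settles after the launching edge
    return period_ns - (arrival_ns + setup_ns)


# Hypothetical path: 0.10 ns clock-to-Q + 0.75 ns of logic + 0.05 ns setup = 0.90 ns.
for freq in (1.0, 1.1, 1.2):
    slack = setup_slack_ns(freq, clk_to_q_ns=0.10, logic_ns=0.75, setup_ns=0.05)
    print(f"{freq:.1f} GHz: slack {slack:+.3f} ns ->", "OK" if slack >= 0 else "setup violation")
```

With these made-up numbers the path passes at 1.0 and 1.1 GHz and fails at 1.2 GHz; nothing in the logic changed, only the clock period shrank.
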
It is possible for your CPU to be damaged by overclocking/raising voltage, but in all likelihood, if you go back to the default voltage and clock rate and reset, it will function normally. YMMV. CPUs are tested to run across a range of temperatures, voltages, and process variations, and that range is pretty large, which is why you might be able to overclock your CPU with no problems. A lot of times the failing path will be one that is rarely used, so your machine will run for a few hours or even days and then crash. If it's all fun and games, that's no big deal, but if you are doing any sort of work on your machine, not so good.
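
One classic example of a long but rarely exercised path is the carry chain in an adder. The Python sketch below uses a plain ripple-carry adder as a simplified, hypothetical model (real CPUs use faster adder designs) and measures the longest run of bit positions a carry ripples through. The worst case, a carry crossing all 32 bits, is exactly the slow path a timing failure would hit, yet random operands almost never trigger it.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of bit positions a carry ripples through when a simplified
    (hypothetical) ripple-carry adder computes a + b with carry-in 0."""
    longest = run = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai and bi:             # generate: a fresh carry starts at this bit
            run = 1
        elif (ai ^ bi) and run:   # propagate: an incoming carry ripples onward
            run += 1
        else:                     # kill (0 + 0), or nothing to propagate
            run = 0
        longest = max(longest, run)
    return longest


def fraction_with_long_chain(threshold: int, trials: int = 5_000, width: int = 32, seed: int = 1) -> float:
    """Fraction of random operand pairs whose carry chain reaches `threshold` bits."""
    rng = random.Random(seed)
    hits = sum(
        longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width) >= threshold
        for _ in range(trials)
    )
    return hits / trials


print(longest_carry_chain(0xFFFFFFFF, 1))  # 32: the worst case, the carry crosses every bit
print(fraction_with_long_chain(24))        # how often random additions even come close
```

A test that only feeds random data could run for a long time before hitting the full-length chain, which matches the "runs for hours or days, then crashes" pattern described above.
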
 

Bartman39

Elite Member | For Sale/Trade
Jul 4, 2000
SuperTool, thanks for the reply, but does this also apply to a non-OC'd machine...? (your answer, that is...)

Another question on this... since the CPU is no longer "running," basically, or going through its processes, will it quit generating as much heat as it would during normal operation...?
 

MrDudeMan

Lifer
Jan 15, 2001
I don't really know, but I would say no. It probably doesn't get any cooler than it would at normal idle (it may even run a little warmer), because it is still on and still looping.
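
Whether a hung machine runs cool depends on what the CPU is actually doing. The Python sketch below only compares two ways a user program can wait: spinning in a loop (the kind of infinite loop described above) versus blocking in the OS. It measures this process's CPU time with `time.process_time()`; it says nothing directly about temperature, and a hardware lockup where the core halts or its clock stops could behave quite differently. The helper names are illustrative.

```python
import time


def cpu_seconds_used(wait, wall_seconds: float = 0.2) -> float:
    """Run `wait` for about `wall_seconds` of wall-clock time and return the
    CPU time this process consumed meanwhile (illustrative measurement only)."""
    start_cpu = time.process_time()
    wait(wall_seconds)
    return time.process_time() - start_cpu


def spin(seconds: float) -> None:
    """Busy-wait: keep executing instructions, like a program stuck in a loop."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass


def block(seconds: float) -> None:
    """Block in the OS; the scheduler can run other work or idle the core."""
    time.sleep(seconds)


print(f"spin : {cpu_seconds_used(spin):.3f} s of CPU time")
print(f"sleep: {cpu_seconds_used(block):.3f} s of CPU time")
```

The spinning version burns roughly its whole wall-clock time as CPU time, while the sleeping version uses almost none, which is the difference between "stuck but busy" and "stuck and idle".
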
 

capybara

Senior member
Jan 18, 2001
But wait... according to Task Manager, when the system freezes, the CPU is at 100%, not 0%.
...But to answer your question about possible damage, I've seen systems stay frozen (CPU at 100%) for hours with no apparent damage.

 

mundane

Diamond Member
Jun 7, 2002
[Digging back into my memory .. ]

Here's what I recall from my introductory Operating Systems class. I can't apply it directly to your overclocking scenario (although, as mentioned, overclocking could cause minor errors that would create some of the following conditions...).

There are four conditions that must all hold for deadlock, which is when processes are effectively locked against each other, preventing any work from being done. Depending on the scheduling policy, this may cripple or disable the system.

The first condition is hold and wait, i.e., a process "holds" a resource while waiting to acquire another one. The resource may be a space in memory, or access to some other shared resource. The second is no preemption: a process may not forcibly take a resource away from another process (imagine if any running process could grab ANYTHING from any other at any time it wished; the system would cease to function). The third condition is circular wait, i.e., Process A is waiting for a resource Process B holds, while Process B is waiting for something that Process A holds. This can extend to N processes, as long as they form a cycle, each waiting on the next. The last is mutual exclusion, i.e., [certain] resources may only be held by a single process at a time (picture a single variable: you can't have two threads modifying/reading it simultaneously and expect deterministic behavior).
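
The four conditions can be reproduced with two threads and two locks. In this illustrative Python sketch, `threading.Lock` provides mutual exclusion and cannot be taken away from its holder (no preemption); each thread keeps one lock while requesting the other (hold and wait); and the two threads request the locks in opposite order (circular wait). The timed `acquire()` and the two barriers are only there so the demo reports the deadlock instead of hanging forever. Breaking any one condition prevents deadlock; the second function removes circular wait by making every thread take the locks in the same order, a common technique but not the only one.

```python
import threading


def opposite_order(timeout: float = 0.2) -> list:
    """Illustrative deadlock: A takes lock_a then lock_b; B takes lock_b then lock_a."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)   # both threads now hold their first lock
    done_trying = threading.Barrier(2)     # keep holding until both attempts are over
    results = []

    def worker(name, first, second):
        with first:                                  # hold...
            holding_first.wait()
            got = second.acquire(timeout=timeout)    # ...and wait, in a cycle
            done_trying.wait()
            if got:
                second.release()
        results.append((name, got))

    threads = [threading.Thread(target=worker, args=("A", lock_a, lock_b)),
               threading.Thread(target=worker, args=("B", lock_b, lock_a))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


def same_order(timeout: float = 0.2) -> list:
    """Both threads take lock_a before lock_b, so no wait cycle can form."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    results = []

    def worker(name):
        with lock_a:
            got = lock_b.acquire(timeout=timeout)
            if got:
                lock_b.release()
        results.append((name, got))

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)


print(opposite_order())   # both acquires time out: a deadlock
print(same_order())       # both succeed
```
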

Those are the standard conditions that must all exist. I cannot speak for what happens when a machine is overclocked (all bets are off). But following this logic, as capybara said, the CPU will be running full bore, trying to do "something". It may be in an infinite loop (SuperTool's message), or it may just be spending all its time on context switches (switching from one process to the next, trying to get some work done), or doing something else equally useless.

I hope this makes some sense (and is correct =) ).

-Josh
 
Peacekeeper100

Jun 26, 2002
Usually I would say it is a software problem more than anything. When you OC the chip, one of the transistors may not have time to switch (among many other things that can go wrong), so it sends out a 1 instead of a 0. This screws up the program, and the program doesn't know what to do. Then it goes to what diegoalcatraz wrote. If you don't OC, the same thing can happen if there is a bug in the software: something happens that it doesn't expect. It's not really that the CPU freezes; it's more that the software freezes.
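
Here is a toy Python sketch of a single wrong bit turning into a "frozen" program. The loop counts up by 2 and exits only when the counter equals the limit. `count_to`, the flip position, and the `max_iters` guard are all hypothetical; the guard exists only so the demo terminates, whereas a real hung program has no such guard.

```python
def count_to(limit: int, step: int = 2, flip_bit_at=None, bit: int = 0, max_iters: int = 1000):
    """Count up by `step` until the counter equals `limit` (hypothetical example).

    If `flip_bit_at` is set, flip bit `bit` of the counter on that iteration to
    stand in for a hardware error. Returns (finished, iterations).
    """
    i = 0
    for n in range(max_iters):
        if i == limit:              # the loop's only exit condition
            return True, n
        if n == flip_bit_at:
            i ^= 1 << bit           # simulated single-bit error in the counter
        i += step
    return False, max_iters         # exit never reached: this is the "freeze"


print(count_to(10))                 # (True, 5)
print(count_to(10, flip_bit_at=1))  # (False, 1000): the counter is now odd and skips 10
```

An exit test like `i < limit` would escape this particular fault, but the program would still continue with a wrong value.
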

This is why with server software you can usually just end the program that is frozen and the rest of the computer recovers. Windows is getting close to being able to do this. If a program freezes you can usually recover the computer, but the problem is when Windows itself (or a driver) freezes, and then you are SOL.
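
A minimal Python sketch of the "just end the frozen program" idea: the risky work runs in a separate process, and the parent kills it if it runs past a deadline. This only helps when the hang is confined to that process; it cannot help when the OS kernel or a driver is the part that froze. The function name and timeout values are illustrative.

```python
import subprocess
import sys


def run_with_watchdog(code: str, timeout_s: float = 1.0) -> str:
    """Illustrative watchdog: run a Python snippet in a child process and
    kill the child if it runs longer than timeout_s seconds."""
    try:
        subprocess.run([sys.executable, "-c", code], timeout=timeout_s,
                       capture_output=True, check=True)
        return "finished"
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and waited for the child here.
        return "killed after timeout"


print(run_with_watchdog("print('hello')", timeout_s=10))
print(run_with_watchdog("while True: pass", timeout_s=1))  # a "frozen" program
print("parent is still responsive")
```
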
 

CTho9305

Elite Member
Jul 26, 2000
That would be a hardware problem. The CPU doesn't do what it should, so sure, the end result is that the program crashes, but the fault lies in the CPU (hardware). You cannot reasonably expect an operating system to recover from a hardware fault, since on faulty hardware, the operating system cannot even trust its own information.
 
Peacekeeper100

Jun 26, 2002
Yes, I said that wrong. I meant that it's more the software that freezes, but the CPU originally causes the problem. It's not the CPU that freezes.
 

Hazer

Member
Feb 16, 2003
I used to do test and verification on processors for a short time. Here's my take on this: a lock-up is when the data gets corrupted. The CPU is a logical machine (duh), so if the instructions being sent to it (or inside it) get garbled, then all hell breaks loose. Example: your software is written in a very specific sequential order. Say it's trying to send a 32-bit word from system part A to system part B. The software will have a specific sequence of binary instructions to accomplish this. The CPU is just the mule that carries out this task. If any one of the bits in any of the instructions gets garbled, then the CPU runs off the intended track of the software's instruction sequence. The operations get all mixed up, and the result is garbled. Say the instruction to read that 32-bit word is changed by one bit. The new instruction (by being off one bit) could be to take a cache word and XOR it with another 32-bit word. The result is completely different from what the next instruction was expecting. The next instruction could be to store that 32-bit word to system part B. System part B would get completely wrong data, and any further operations that were meant to follow sequentially get worse and worse. The software no longer understands what is happening and "locks up". CPU utilization could be anything. The results become totally random.
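
The one-bit example above can be played out on a made-up machine. The Python sketch below defines a hypothetical accumulator machine with 2-bit opcodes (not any real instruction set) in which LOAD and XOR differ by a single bit, then flips that bit in one instruction word.

```python
# Hypothetical 2-bit opcodes for a toy machine; LOAD (0b00) and XOR (0b01) differ in one bit.
LOAD, XOR, ADD, STORE = 0b00, 0b01, 0b10, 0b11


def encode(op: int, operand: int = 0) -> int:
    """Instruction word: opcode in bits 8-9, 8-bit operand in bits 0-7."""
    return ((op & 0b11) << 8) | (operand & 0xFF)


def run(program: list) -> list:
    """Execute the program and return every value written by STORE."""
    acc, stored = 0, []
    for word in program:
        op, operand = (word >> 8) & 0b11, word & 0xFF
        if op == LOAD:
            acc = operand
        elif op == XOR:
            acc ^= operand
        elif op == ADD:
            acc = (acc + operand) & 0xFF
        else:  # STORE: send the accumulator to "system part B"
            stored.append(acc)
    return stored


# The first LOAD leaves a leftover value in the accumulator.
program = [encode(LOAD, 0x0F), encode(LOAD, 0x3C), encode(ADD, 0x01), encode(STORE)]
corrupted = list(program)
corrupted[1] ^= 1 << 8          # one flipped bit turns LOAD 0x3C into XOR 0x3C

print([hex(v) for v in run(program)])     # ['0x3d']
print([hex(v) for v in run(corrupted)])   # ['0x34']
```

Nothing crashes here; the machine simply stores a wrong value, and every later step that depends on it inherits the error.
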

So how does the data get corrupted? Maybe it was corrupted on the HDD. Maybe the software has bugs in it. Does it hurt your CPU? No. The only thing happening is that the CPU is now running random commands that make no sense, but under its normal operating conditions.
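
When the corruption happens in storage or in transit rather than inside the CPU, a checksum can at least catch it before the data is used. As a small Python illustration (the byte values are arbitrary and `is_intact` is a hypothetical helper), `zlib.crc32` from the standard library detects a single flipped bit; CRC-32 is guaranteed to catch every single-bit error. This does not help against a CPU that computes wrong answers, since the check itself runs on that CPU.

```python
import zlib


def is_intact(data: bytes, expected_crc: int) -> bool:
    """Compare the CRC-32 of `data` with a checksum recorded when it was written."""
    return zlib.crc32(data) == expected_crc


original = bytes([0x55, 0x89, 0xE5, 0x83, 0xEC, 0x10])   # arbitrary example bytes
recorded_crc = zlib.crc32(original)

damaged = bytearray(original)
damaged[2] ^= 0x01                                      # flip one bit, as a bad disk might

print(is_intact(original, recorded_crc))        # True
print(is_intact(bytes(damaged), recorded_crc))  # False
```
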

But here's how overclocking comes into play: CPUs are made from doped silicon with very small transistor gates. These gates operate within small tolerances of voltage and speed. When you increase the speed, the temperature rises. When you increase the voltage, the temperature rises. When the temperature rises, the transistors can start behaving randomly and erratically. Instead of producing a binary 1 when it's supposed to, a transistor could start sending out a binary 0. Once bits start to change from what they are supposed to be, eventually your system "locks up". This is why cooling is so important to overclocking. CPU architecture is highly dependent on timing, voltage, and temperature. It's designed to have tolerance, and when you overclock, you push those tolerances. If you have a system lock-up while overclocking, this means that one (or many) transistor(s) are now operating outside of their rated tolerance. Is this harmful? Usually no. If you are stepping up your speeds and voltage very moderately and have adequate cooling, there is no damage to the transistor. If you change the voltage in extreme steps, then you can easily burn out that transistor. If you don't have adequate cooling, then you can burn out the transistor, because it will still be switching millions of times while the system is locked up, increasing the heat on that part of the chip (if the corruption is due solely to inadequate cooling).
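
The link between clock speed, voltage, and heat can be put in rough numbers with the standard first-order estimate for dynamic (switching) CMOS power, P ≈ αCV²f, where α is switching activity, C is switched capacitance, V is supply voltage, and f is clock frequency. The Python sketch below assumes α and C stay fixed and ignores leakage power (which also grows with voltage and temperature), so it is only a ballpark.

```python
def relative_dynamic_power(freq_scale: float, volt_scale: float) -> float:
    """First-order ratio of new to old dynamic power, assuming P ~ alpha * C * V**2 * f
    with alpha and C unchanged and leakage ignored."""
    return freq_scale * volt_scale ** 2


# Hypothetical overclocks relative to stock settings.
print(f"{relative_dynamic_power(1.10, 1.00):.3f}")   # +10% clock, stock voltage
print(f"{relative_dynamic_power(1.10, 1.05):.3f}")   # +10% clock, +5% voltage
print(f"{relative_dynamic_power(1.10, 1.10):.3f}")   # +10% clock, +10% voltage
```

So +10% clock with +10% voltage means roughly 33% more switching power (1.1 × 1.1² ≈ 1.33), which is why voltage bumps heat a chip up so quickly.
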

Ever wonder why overclocking is possible, and why certain chips overclock better than others? Whenever a new CPU architecture is made, the chip is produced on a production line. After the chip has gone through all of its processing steps, it's tested for speed. Because these transistors are so small they can only be viewed with an electron microscope, the manufacturing process has certain imperfections. The silicon can be slightly impure in a small location, and other process steps vary too. So each chip is tested. A small fraction of the yield will be bad, non-operating chips (if the design is right). A large fraction will work at the standard speed rating for the design, and a small fraction will operate above the design's rated speed. These will then be sold as 2 GHz chips (highest yield), 2.2 GHz chips (smaller yield), and 2.4 GHz chips (even smaller yield). So within the same CPU family, the slowest parts are the ones with the most imperfections that still run. The tests used are supposed to cover every conceivable operation of the CPU, but an operating system may never even use the part of the chip whose imperfection was caught during testing at higher speeds. The normal manufacturing process is to take the highest non-failing speed and package the chip at 90% of that speed. So in reality, every processor should be able to run at about 111% (1/0.9) of its packaged speed.
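
The binning process described above can be sketched in a few lines of Python. The bins (2.0/2.2/2.4 GHz) come from the post's example, the 90% derating factor is the post's claim rather than a verified industry figure, and `assign_bin` is a hypothetical helper. The sketch also shows where the headroom comes from: a chip packaged at exactly 90% of its maximum has 1/0.9 ≈ 1.11 times its rated speed available, and rounding down to a discrete bin can leave even more.

```python
BINS_GHZ = (2.0, 2.2, 2.4)   # example bins from the post (hypothetical product line)
DERATE = 0.9                 # "package the chip at 90% of that speed" (the post's figure)


def assign_bin(max_stable_ghz: float, bins=BINS_GHZ, derate: float = DERATE):
    """Highest bin at or below derate * max_stable_ghz, or None if the chip is rejected."""
    target = max_stable_ghz * derate
    eligible = [b for b in bins if b <= target + 1e-9]   # small tolerance for float error
    return max(eligible) if eligible else None


for max_ghz in (2.1, 2.3, 2.5, 2.7):
    rated = assign_bin(max_ghz)
    if rated is None:
        print(f"max {max_ghz} GHz -> rejected")
    else:
        print(f"max {max_ghz} GHz -> sold as {rated} GHz, can run at {max_ghz / rated:.0%} of rated")
```
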