I'm overclocking a Q6600 to 3.2 GHz on air, for scientific computation. This is an arbitrary stopping point, with max core temps of 60 to 62 °C depending on ambient.
The title of this thread is taken from the "Erlang" parallel programming language, developed by Ericsson, a phone company. Like a pair of LL Bean boots where one replaces the uppers m times and the lowers n times and still calls them the same boots, one can continually hot-swap code and hardware into an Erlang program and call it the same program. Their primary concurrency goal is not to harness extra speed from multiple processors, but to have a self-healing system, e.g., one that keeps running if an avalanche takes out a village where half their machines live. They tout a 99.9999999% reliability rate ("nine nines"), which is otherwise unheard of in telecom. None of our overclocked boxes come anywhere close to this reliability, not that we keep any of them long enough to find out!
My hardest stress test is not Prime95 (mprime on Linux) but rather daisy-chaining builds of the GHC Haskell compiler from source, with several chained builds running on each core. A friend has a $10K, 8-core, 64 GB server that has been living half the time in the shop because of this stress test, which once caused his power supply to smoke and threaten fire in front of various amused witnesses. The shop has now taken this stress test in-house, to save delivery cycles. It appears, with a new Tyan motherboard, that all is nearly OK: 99% of the builds ("two nines") succeed, and no other test is capable of revealing any hardware issues.
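For anyone who wants to reproduce the torture test, here is a minimal sketch of how it could be scripted: one build chain per core, each chain rebuilding a GHC source tree over and over and logging any failure. The ghc-src directory, the make targets, and the chain and iteration counts are placeholders rather than our exact setup; compile with ghc -threaded so waiting on one build doesn't hold up the others.

```haskell
-- Sketch of a daisy-chained build stress test: N chains, each repeatedly
-- rebuilding a source tree; any non-zero exit from make counts as a failure.
-- Paths, targets, and counts below are illustrative placeholders.
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

workers :: Int
workers = 4                -- one chain per core on a Q6600

buildsPerChain :: Int
buildsPerChain = 10        -- how many times each chain rebuilds the tree

-- Run one full rebuild and report whether it succeeded.
buildOnce :: Int -> Int -> IO Bool
buildOnce chain n = do
  (code, _out, _err) <- readProcessWithExitCode "make" ["-C", "ghc-src", "clean", "all"] ""
  case code of
    ExitSuccess -> return True
    _           -> do
      putStrLn ("chain " ++ show chain ++ ", build " ++ show n ++ ": FAILED")
      return False

main :: IO ()
main = do
  dones <- forM [1 .. workers] $ \chain -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      results <- mapM (buildOnce chain) [1 .. buildsPerChain]
      putMVar done (length (filter id results))
    return done
  passed <- mapM takeMVar dones
  putStrLn (show (sum passed) ++ " of " ++ show (workers * buildsPerChain)
            ++ " builds succeeded")
```

For our purposes, anything short of 100% out of a harness like this is a failed stability test.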
In contrast, we're amused to read on overclocking forums that it is someone's "policy" to accept less than 24 hours of Prime95 as stable. We'd actually like as many "nines" of stability as we can get. This is not an argument against overclocking; one simply has to understand overclocking differently. As I'm new to overclocking, others may have much better insights into how to balance these goals; I'm contributing what I know.
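To put rough numbers on the "nines", here is the arithmetic I have in mind, reading a figure like 99% or 99.9999999% as the fraction of time (or of builds) that must come out clean. The year length and the example values are only for illustration.

```haskell
-- What n "nines" allow per year, assuming the figure is an availability
-- (fraction of time, or of builds, that must succeed). Illustrative only.
secondsPerYear :: Double
secondsPerYear = 365.25 * 24 * 3600

failureBudget :: Int -> Double
failureBudget n = 0.1 ^ n          -- 2 nines -> 1%, 9 nines -> 1e-9

main :: IO ()
main = mapM_ report [2, 3, 6, 9]
  where
    report n = putStrLn (show n ++ " nines: roughly "
                         ++ show (failureBudget n * secondsPerYear)
                         ++ " seconds of failure allowed per year")
```

Two nines is roughly 3.7 days of failure a year; nine nines is roughly 30 milliseconds.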
I dislike "panel discussions" where the moderator talks half the time. I'll pay rapt attention to any responses here; I'm just not planning to goal-tend unless I'm asked direct questions.
I'm not being judgemental here; were I a gamer, I would certainly have moved to phase change cooling by now, and I'd accept a system freeze every day or two if it bought me more thrills in between. Without gamers there simply wouldn't be a market for the motherboards I'm buying, or the overclocking expertise I'm relying on. For this I'm grateful.
What got me over the hump in learning to overclock were the articles here by Kris Boughton, which I thought were brilliant: hard to read, but ultimately a complete education in overclocking issues. Alas, I've come full circle, and for my purposes I'm in disagreement with some of the conclusions.
Load Line Calibration: I'm in complete agreement; monkeying with this is an unwarranted risk, costing various "nines" of stability.
tRD: I see at most a 0.5% effect on execution times for practical scientific computations when adjusting tRD between, say, 6 and 7. At 7 I get more "nines" of reliability. This choice seems a no-brainer.
CPU voltage: A few weeks ago, setting 1.28125 V in my BIOS was enough to keep my Q6600 stable at 3.2 GHz. That was, however, the minimum value I could use. After a few weeks away, with the box powered off, I could not boot into Linux at this voltage. The minimum voltage crept up over the next few days, now settling at 1.30625 V. I'd heard of aging but didn't expect to see it so soon, as part of "breaking in". In any case, more "nines" of stability requires a margin of error here, unless there's a "glasses will only make your eyes worse" argument, i.e., that every voltage increase feeds an unstable equilibrium requiring further increases. I don't know of such an argument.
In short, I'm arbitrarily settling on 3.20 GHz for 24/7 use, often at full load, though many reasonable people would accept the temps and voltages required for 3.30 GHz. In realistic benchmarks, the cost of relaxing my memory timings could be covered by a mere increase to 3.24 GHz, an arbitrary compensation I don't actually need to make.
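For what it's worth, the compensation arithmetic, assuming compute-bound work whose runtime scales inversely with core clock and taking the roughly 0.5% timing penalty from the tRD observation above as the cost to be covered (a simplification, not a measurement):

```haskell
-- Back-of-the-envelope check: how much does a small clock bump buy, relative
-- to the ~0.5% memory-timing penalty? Assumes purely compute-bound scaling.
speedupFromClock :: Double -> Double -> Double
speedupFromClock oldGHz newGHz = newGHz / oldGHz - 1   -- fractional gain

main :: IO ()
main = do
  let gain = speedupFromClock 3.20 3.24
  putStrLn ("3.20 -> 3.24 GHz is about a " ++ show (100 * gain)
            ++ "% clock increase, comfortably more than ~0.5%")
```

In other words, a 1.25% clock bump more than covers the relaxed-timing penalty, which is why I don't bother making it.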
The ideal balance between overclocking and stability would appear to be to relax memory timings and be a bit generous with voltages, picking an arbitrary overclocking target that itself offers a decent margin of error.
I can see the intoxicating appeal of playing with memory timings for speed: Basic overclocking goes from hard to utterly trivial, once one learns what to do. Like windsurfers picking a trickier board, or base jumpers wearing flying suits to soar parallel to the cliff, there has to be something harder one learns next, right? I'd say, use this knowledge in reverse to gain greater stability, and move on. For me, that's parallel software that can actually use my cores.
