A Proposal for a Standard Approach in the Comparison and Measurement of Core Temperatures

BonzaiDuck

Lifer

We typically check reviews about heatsinks and water-cooling products, looking for real, objective bench-tests, and measures or indexes that allow us to accurately determine performance from these reviews against other products -- independent of other important considerations, like "ease of installation."

Enthusiasts often quote their peak temperatures as an exclusive, final measure of cooling performance. We measure "Idle" temperature and "Load" temperature using programs such as Prime95, S&M, ORTHOS and Super-Pi. Everest Ultimate Edition has its own stress tests, and there are several other programs or program-components that provide them. Here, most temperature comparisons are done on the basis of ORTHOS and Super-Pi.

A statistician doesn't want only a single number or observation-value as a reliable measurement. In the measurement of temperatures, it would be better to report a distribution of sampled values -- a "frequency distribution" -- over a timed period, such as one hour.

Statisticians often refer to the concept of an "outlier" in a statistical distribution. An outlier falls outside the three-sigma or four-sigma limits of statistical distributions, and often has an "assignable cause," or it arises as a spurious anomaly due to measurement error, or some miscellaneous causes that have little to do with the conditions governing the remaining statistical observations.

Let me construct an example. Suppose someone measures their peak load temperature at 51C. Suppose also that you measure your peak load temperature at 51C. But suppose the entire distribution of observed load values has an average of 46C for the first forum member (call him or her "A"), while the second member, "B," has an average load value of 41C. And also suppose that A has a standard deviation from the mean of 3C degrees, while B's standard deviation is 6C degrees. Then B's temperatures vary more between low and high values than A's -- which is a significant description of cooling performance. Further, if B's standard deviation is less -- assume this time it is only 3C degrees like A's measure of variance -- then B most likely has significantly lower temperatures than A, and B's temperatures are equally stable. And this is again a more complete empirical description of what is going on in both individuals' tests, as opposed to simply comparing a peak of 51C with someone else's peak value of 51C.
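For anyone who would rather see this in a few lines of code than in a spreadsheet, here is a minimal Python sketch of the A-versus-B example. The two lists of readings are invented for illustration (they are not measured data); they were chosen only so that both share a 51C peak while their averages differ.

```python
import statistics

def summarize(readings):
    """Return (peak, mean, sample standard deviation) for readings in degrees C."""
    return max(readings), statistics.mean(readings), statistics.stdev(readings)

# Hypothetical load-temperature samples (degrees C), invented for illustration.
system_a = [44, 45, 46, 46, 47, 47, 48, 51]
system_b = [36, 38, 39, 40, 41, 42, 43, 51]

for name, readings in (("A", system_a), ("B", system_b)):
    peak, mean, sd = summarize(readings)
    print(f"{name}: peak={peak}C  mean={mean:.1f}C  stdev={sd:.1f}C")
```

Both systems report the same peak, but the mean and standard deviation show that they behave quite differently under load.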

In other words, reporting peak values doesn't give us any indication of these things.

Some may argue that some peak temperature value would determine the difference between stability and instability, but one might counter to observe that it depends on whether the reading resulted from the sensor itself, or some factor unrelated to the actual temperature of the component being measured. But it just makes more sense to report frequency distributions of temperatures as opposed to the upper boundary of the statistical range -- the peak "load" temperature.

With ORTHOS, the stress on the processor and other components such as RAM varies over any period of time. Given the organization of the program and the calculations it performs, this stress is not random and is determined by the battery of tests currently under way. But it would be more descriptive and more accurate to report more than just "peak load temperature" -- whether the cause of the highest reading is random or non-randomly determined by the program at some point in time.

Other sources of error or variation may enter into the sampling process, whether they are due to inaccuracies in the thermal sensors or something else of a momentary nature that might contribute to outliers. Even so, comparing absolute values or single values between two different computers doesn't reveal much, but a measure such as standard deviation still provides a reliable measure even if the temperature monitors of a computer are biased one way or the other for whatever reason, because a constant offset shifts every reading equally and leaves the spread unchanged.

CORETEMP, and possibly other programs, provide a logging capability. CORETEMP will create a text-file log, sampling and logging Core#1 and Core#2 temperatures at user-chosen intervals.

This text file can be imported into a software tool like Excel, and will appear in three columns. The first column is unneeded if a forum-user or a web-site reviewer reports the "test-bench" configuration in his post, because it merely reports the processor speed at the time of sampling -- giving a column of numbers which are all the same. But the remaining two columns report the sampled temperature values for Core#1 and Core#2, respectively, for a dual-core system. And in cases where every single reading over an hour's time is identical between the two cores, only one of those columns would be needed provided the observer notes that the values are identical for the two cores in an explanatory note.
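If you prefer a script to a spreadsheet, the same three-column import can be sketched in Python. The exact layout of the log file is an assumption here, not something taken from CORETEMP's documentation: I assume each data line holds three numeric fields (speed, Core#1, Core#2) separated by whitespace, tabs, or commas, and the function name `parse_log` is made up for this example.

```python
def parse_log(lines):
    """Parse 'speed core1 core2' lines into two lists of core temperatures.

    Assumed layout (hypothetical): three numeric fields per line, separated
    by whitespace, tabs, or commas. Lines that do not fit are skipped.
    """
    core1, core2 = [], []
    for line in lines:
        fields = line.replace(",", " ").split()
        if len(fields) != 3:
            continue  # blank line or unexpected column count
        try:
            _speed, t1, t2 = (float(f) for f in fields)
        except ValueError:
            continue  # header row or malformed numbers
        core1.append(t1)
        core2.append(t2)
    return core1, core2

# Made-up sample lines standing in for a real log file.
sample = ["Speed Core#1 Core#2", "3276 41 41", "3276 42 42", "", "3276 44 44"]
core1, core2 = parse_log(sample)
print(core1, "identical cores:", core1 == core2)
```

With a real file you would pass `open(path)` instead of the sample list, and adjust the field handling to whatever the log actually contains.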

Certainly, people making these comparisons should routinely report other relevant information:

(1) Computer configuration, components and their manufacturer-of-origin, including [of course] the cooling device being tested
(2) Over-clock speed and voltage settings
(3) Memory timings
(4) Room Ambient temperature(s)
(5) [If possible] -- the actual thermal wattage or a reliable estimate of peak thermal wattage for the current over-clock setting
(6) The stock peak thermal wattage (TDP) of the component of interest in the test, such as the CPU

This is an open-ended proposal; there may be items missing in this list; and anyone else might suggest additions.

ROOM AMBIENT TEMPERATURE

Room ambient will vary over time. It may be controlled to an acceptable degree if there is a temperature-controlled air-conditioner running which makes adjustments with some accuracy and immediacy. Serious testing, meaning a laboratory-controlled benchmark, might employ a closed, temperature-controlled container for the computer, but most of us don't have the luxury or the time to build or use one.

Therefore, it might be useful to take readings of Room Ambient more than once during a test using a program like ORTHOS over a standard period -- for example, an hour's elapse, with Room Ambient readings taken every 15 minutes. A household thermostat-thermometer is a very approximate and inaccurate device; a digital thermometer can be placed close to an intake fan and gives more accurate readings.

As a rule, changes in Room Ambient should result in degree-for-degree increases or decreases in computer component temperatures, even though there may be some consistent lag between the Room Ambient change and the component temperature change. But unless TEC or phase-change is used, room temperature will affect both air and water-cooling, although, again, there might be a greater lag between the room ambient change and the temperature prevailing inside a large water reservoir.

At least for air-cooling, this makes it feasible to adjust component readings taken at different room ambients -- within the effective range of the cooler, and most heat-pipe coolers, for example, will exhibit this one-for-one variation with room ambient.
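As a rough illustration of that adjustment, here is a small Python sketch that shifts a Celsius component reading taken at one room ambient (in Fahrenheit, as a household thermometer reports it) to a common reference ambient. It assumes the one-for-one relation holds exactly, which is only an approximation; the function names and the 45C reading are invented for the example.

```python
def f_to_c_delta(delta_f):
    """Convert a temperature DIFFERENCE from Fahrenheit degrees to Celsius degrees."""
    return delta_f * 5.0 / 9.0

def normalize_to_ambient(temp_c, ambient_f, reference_ambient_f):
    """Shift a component reading (C) taken at ambient_f to reference_ambient_f.

    Assumes component temperature tracks room ambient degree-for-degree.
    """
    return temp_c + f_to_c_delta(reference_ambient_f - ambient_f)

# Hypothetical reading: 45C measured at 74F ambient, restated for 70F ambient.
adjusted = normalize_to_ambient(45.0, 74.0, 70.0)
print(f"{adjusted:.2f}C at the 70F reference ambient")
```

Note that only the difference between ambients is converted, so the usual 32-degree offset between the Fahrenheit and Celsius scales drops out.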

CHOICE OF TESTS

ORTHOS offers several types of tests. "Small FFTs" stresses mostly the CPU; "Large FFTs" stresses the RAM; "Blend Test" stresses both CPU and RAM.

I submit as an issue for discussion that a test of a CPU cooler might be more relevant if it stresses only that component. Stressing an (Intel) memory controller and RAM introduces factors for a CPU-cooler test which must be mitigated by enhancements separate from a CPU-cooler. On the other hand, an AMD processor which has a memory controller built-in might be tested using a different test-choice. Even so, it might be useful to run more than one type of test to show different results. My example follows below.

STATISTICAL REPORTING

Once a CORETEMP time-series is loaded into an Excel (or other) spreadsheet, you can use some statistical operators toward creating a frequency distribution chart of load temperature performance:

MIN(cell-range) produces the minimum value in a column or row of numbers
MAX(cell-range) produces the maximum value
AVERAGE(cell-range) provides the statistical average of a column or row of numbers
STDEV(cell-range) provides the sample standard deviation around the mean or average

The square of the standard deviation is called the "variance," and pretty much measures the same thing: the "spread" or variation of observations around the mean. So statisticians describe a sample of observations in increments of "1-sigma," "2-sigma," "3-sigma," etc. When the observations are normally distributed, one standard deviation on either side of the mean contains about 68% of the observations; a range between "Mean - 2sigma and Mean + 2sigma" contains about 95% of the observations; a range between "Mean - 3sigma and Mean + 3sigma" contains about 99.7% of the observations. So this allows one to do statistical testing to see, for instance, whether an average taken over one sample of observations shows a "statistically significant difference" from an average taken over another set of observations -- for instance, to compare two different cooling methods or cooling on two different computers.
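To check how a given set of readings actually falls within those sigma ranges, a short Python sketch like the following could be used. The readings are hypothetical, and for data that are not normally distributed the fractions can differ noticeably from the textbook percentages.

```python
import statistics

def fraction_within(readings, k):
    """Fraction of readings within k sample standard deviations of the mean."""
    mean = statistics.mean(readings)
    sd = statistics.stdev(readings)
    inside = [r for r in readings if abs(r - mean) <= k * sd]
    return len(inside) / len(readings)

# Hypothetical readings (degrees C), invented for illustration.
readings = [34, 35, 34, 42, 41, 42, 40, 38, 39, 41, 40, 37]
for k in (1, 2, 3):
    print(f"within {k} sigma: {fraction_within(readings, k):.0%}")
```

`statistics.stdev` is the sample standard deviation, the same quantity Excel's STDEV returns.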

FREQUENCY(data cell-range, bin cell-range)

This is an "array" function that allows you to enumerate the frequency of things -- in our case, temperature readings -- over the entire sample. For instance, suppose you run ORTHOS for one minute, sampling every 8 seconds (or 8,000 milliseconds). You might have seven observations in Celsius degrees: 34, 35, 34, 42, 41, 42, 40. So the temperature interval containing 34 has a frequency of 2; the interval containing 35 has a frequency of 1; the interval containing 42 has a frequency of 2, etc.

Over an hour's time, sampling every 8 seconds, you will get approximately 450 observed temperature readings. So this Excel function allows you to instantaneously distribute the various counts of readings at discrete temperatures.

The "bin cell-range" is the sequential set of temperatures for which you wish this frequency-counting to take place, and you would find it by taking the MIN and MAX of each column of data -- the sampled temperatures -- and then create another column with those temperatures in ascending order.

To use the FREQUENCY function, you would simply select a blank column that has the same number of cells as the bin-range column, and edit into the formula text-box:

=FREQUENCY(data-range, bin-range)

For example, I have 450 sampled temperature readings over an hour at the rate of one every eight seconds, and I've imported this data into a spreadsheet so that it is in column "B," or the range B5:B454. My range of temperatures might be 31 to 39, so I enter the values 31, 32, 33, . . . 39 into column "C" starting at any row, but say I start at C5, which gives the bin range C5:C13. I use E5:E13 as my "result" column, and I select or highlight that range, entering this formula into the formula text-box:

=FREQUENCY(B5:B454,C5:C13)

and I then type CTRL-SHIFT-ENTER.

I can now select this result column again, and use it to create a bar-chart of the frequency distribution.
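For those without Excel, the same count-per-temperature table can be sketched in Python with the standard library. This is not a reimplementation of FREQUENCY itself, only an equivalent count for whole-degree readings; the sample data are the seven observations from the one-minute example above, and the function name is made up.

```python
from collections import Counter

def frequency_table(readings):
    """Count readings at each whole-degree temperature from min to max."""
    counts = Counter(round(r) for r in readings)
    low, high = min(counts), max(counts)
    # Include temperatures with zero observations so the chart has no gaps.
    return [(t, counts.get(t, 0)) for t in range(low, high + 1)]

readings = [34, 35, 34, 42, 41, 42, 40]
for temp, n in frequency_table(readings):
    print(f"{temp}C {n:3d} {'#' * n}")
```

The row of `#` marks is a crude text version of the bar-chart; the two numeric columns are the (Temperature, Number-of-Observations) pairs.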

RETURN TO THE DIAMOND-SLURRY QUESTION

In another thread, comparing air-cooling, water-cooling and phase-change cooling, I began posting some preliminary results as I test the addition of abrasive-diamond-powder to one or more mediums -- thermal pastes or greases. There were questions I raised in which I suggested that the difficulty of applying the paste might lead to a degradation in its performance, cancelling out or masking any improvements in the use of the diamond-powder.

I may re-run those tests. But the original diamond-slurry is still "in operation" on my test-chassis Conroe system:

Processor: Intel E6600; multiplier = 9; FSB external frequency = 364
Memory: Crucial Ballistix PC2-8000; timings 3,4,4,8; DDR2-728
Graphics: BFG nVidia 8800 GTS 640MB
Motherboard: ASUS Striker Extreme; [BIOS revision = ________]
CPU cooler: ThermalRight Ultra-120-Extreme, custom-lapped
VGA cooler: stock BFG

Here is a frequency-bar-chart for the ORTHOS Blend-Test, run for 1 hour with room ambient 74F:

E6600 @ 3.275, Ultra-120-Extreme, 74F Ambient, ORTHOS Blend-Test Results

And here is the frequency-bar-chart for the ORTHOS Small-FFT (stress CPU) test, run for one hour at 70F room ambient:

E6600 @ 3.275, Ultra-120-Extreme, 70F Ambient, ORTHOS Small-FFT (stress CPU) Test

Note that these temperature distributions are not "normally" distributed in the well-known "bell-curve" pattern. So they are probably not randomly distributed, and we can speculate the "cause" as being the discrete differences between the batteries of tests run under ORTHOS. This would lend itself to the argument that peak values are important, but again I would wager that the distribution of temperatures reveals more and presents more information.

Here are the "central tendency" and "dispersion" measures for the Blend Test data:

Mean or Average = 41.69C degrees
Standard Deviation = 3.21C degrees

And here are the Mean and Standard Deviation for the "Small FFT (stress CPU)" test:

Mean or Average = 43.72C
Standard Deviation = 0.552C

Here, even the peak value of 44C and the mean value are almost the same. And the standard deviation is a squeaky little sliver of just more than a half-degree Celsius. I suggest at this point -- tentatively -- that if one were testing a heatsink on an Intel CPU (as opposed to the AMD processor with integrated memory controller) -- the small FFT test is more consistent and indicative of factors exclusive to CPU cooling. But additional contributors may wish to dispute that and show supporting reasons.

In my struggle to have a meaningful and more productive life, I will attempt to test the "Silver Eagle" heatsink-mod tonight, and provide some more charts.

I would post my data. The fabrication of data is difficult and too costly but for hoaxes with extremely high expected payoffs, and if one were to post some 450 phony observations, one would have to deliberately choose all 450 to provide the desired mean and standard deviation. But I don't think it is necessary to do this here, although I could put them out for download. But publishing them as other than a web-link in this (or any) thread is like asking you to read a ton of junk-mail.

I did scroll through the two Core#1 and Core#2 columns, and discovered that they are identical numbers -- without exception -- without any exception at all. However, it would be easy to include them in the data ranges for the frequency distribution, and create a bar chart in two colors: blue for core#1, and red for core#2.

BonzaiDuck

Lifer
FOR THOSE INTERESTED IN THE SILVER_EAGLE HEATSINK MOD

Don't do it.

I'm not sure what others were thinking when I proposed to try this, and some people thought they'd be interested in the results.

It might only have worked if the diamond-slurry had shown better than marginal improvement over JetArt Ck4800.

Without consulting texts and going to some trouble to actually calculate what it would take to make it work, it is obvious that it adds a second thermal interface for a heatsink built on the assumption that there would be only a single thermal interface.

So the coin DOES draw heat from the processor, but there is a new equilibrium temperature (my BIOS reading showed 48C and I just exited the BIOS and shut down the system). This -- it would seem -- is because the ThermalRight cooler cannot draw heat off the coin fast enough under those circumstances.

And the only way it would work would be to concoct a safe process of welding the silver to the copper base, or if TR were to opt for a silver base instead of a copper one.

With a BIOS CPU temperature of 48C, I wouldn't have bothered running another test with that configuration, and my better judgment and already-depleted pocket-book says "NO."

So -- we'll close this page on hobbyist experiments, watch the market and further developments in air-cooled devices, and maybe start looking for some water-cooling parts when we opt for a quad-core processor.

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
LOL

Crazy duck,

I'll say this one more time because you've grown on me.

The op cannot change his tests on the heat sinks. The reason is because of accuracy and his scale.

If he changes any part of his methodology, it would throw the scale off balance and ruin it as a perspective.

The results in his reviews are used and intended as a scale, not as a representation of what you will get.

BonzaiDuck

Lifer
The "op?" Are you referring to a reviewer, such as the person who's written the recent reviews on the Ultra-120's?

Not sure what you mean here.

Of course, if some reviewer were comparing different heatsinks on the same bench configuration, you COULD use peak temperatures -- which they all probably do. But they could just as easily use mean values obtained with the same test software.

But to get an idea -- as you say -- of "what you will get," I'd say it's better to use the CORETEMP (or other measurement software's) log of sampled temperatures.

Incidentally, it was just a couple years ago when people were telling me that Serj's "S&M" testing program was "the bomb." From my own experience, ORTHOS doesn't limit run-times, but the S&M Program had some promise. I e-mailed the guy -- he understands enough English -- and he said the last version -- I think it was 1.9.0 -- was "the last revision." He wasn't going to work with the program anymore.

BonzaiDuck

Lifer
ONCE AGAIN: I should post my configuration in a "sig," but it will be changing as I add new components. The settings are the same for previous posts on this issue of "TIM and diamond-powder:" E6600, VCORE=1.4375V; Memory Voltage = 2.100V; SB=1.55V; NB=1.35V; 1.2V_HT=1.3V; Ext. FSB frequency=364MHz; FSB[QDR]=1,456MHz; Memory [DDR2]=728MHz -- No change in clock settings.

I've run another "Blend Test" for an hour this morning. The household thermostat-thermometer reads 70F -- what I report initially in the Chart linked below. However, I checked again, because this particular room is warmer given that two computers are running, and compared a thermometer to the thermostat reading. The actual temperature at the case-intake-fan is 71F. I repeat: the intake air-temperature is 71F -- not the 70F ambient reported in the Chart.

SOMETHING HAPPENED BETWEEN SILVER CITY AND KING SOLOMON'S MINES . . .

METHOD: a half-rice-grain's-worth of JetArt CK4800 goes on the processor cap, spread even and thin. A half-rice-grain's-worth of CK4800 goes on the square-area of the heatsink-base where it will mate up with the processor cap. A very light sprinkling of extra diamond-particles is then applied to the heatsink base -- leaving spots no more than 1mm in diameter, and scattered with approximately 1 to 2mm space between the spots. You can use a paddle to spread the paste and powder together, but try to deposit the residue from the paddle back on the center of the area covered. With this approach, the mix does not turn into a dried out lump that sticks more to the paddle than to the heatsink base.

1-hour ORTHOS blend test @ 71F room ambient

I'm going to run some more tests at different ambients, beginning with another "Small FFT" test, and will post the results IF anyone is interested. The peak values reported earlier for the CK4800 control test are still valid in comparing the peak values here, while adjusting for room ambient.

Under the approach I'm using now such that "Less is More" with sparing application of the stuff, I have enough to last several years. At a minimum $100 purchase, this may not be worth it to many. But personally, I think I won't be using Arctic Silver 5 much anymore, except in maintaining slower machines among the extended family-members in which older Northwood processors have been deployed.

Whether these incremental improvements are of any value for quad-core systems is questionable. It would depend on the intended over-clocking objective and what load temperatures one chooses to live with. Air-cooling may not be feasible at all for quad-cores unless you intend to run them at stock settings. I cannot tell if Intel's quad-core models are analogous to the introduction of the Smithfield and Presler dual-cores more than a year or so earlier -- in comparison to the TDPs of the Conroe chips. Maybe Intel will again leap forward to reduce the TDPs of these quad processors in their next incarnations of them.

BonzaiDuck

Lifer
Just to recap --

I'm testing the use of Penn-Scientific Company's "#1 grit" diamond-powder to enhance cooling of "nano-diamond JetArt CK4800" thermal grease, on an E6600 OC'd approximately 36% above stock settings, with a ThermalRight Ultra-120 Extreme cooler which was custom-lapped by SVC personnel (SVC: Silicon Valley Compucycle).
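The "approximately 36%" figure can be checked with a couple of lines of Python. The multiplier of 9 and the 364MHz external frequency are the settings posted earlier in this thread; the 2400MHz stock clock for the E6600 is my assumption from Intel's published specification, and the function name is made up for this sketch.

```python
def overclock_percent(multiplier, fsb_mhz, stock_mhz):
    """Return (core clock in MHz, percent increase over the stock clock)."""
    core_mhz = multiplier * fsb_mhz
    return core_mhz, 100.0 * (core_mhz - stock_mhz) / stock_mhz

# 9 x 364MHz from the posted settings; 2400MHz stock clock is an assumption.
core_mhz, pct = overclock_percent(9, 364, 2400)
print(f"{core_mhz} MHz, {pct:.1f}% over stock")
```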

It has warmed up since the last posted "Blend Test" results. The following "Small FFT Stress CPU" ORTHOS test was run with a room ambient of 72.5F measured at the test-system's intake fan:

Small FFT ORTHOS @ 72.5F Room Ambient

So we've shown two things: (1) The Ultra-120-Extreme provides around a 5C reduction and improvement in (regular) Ultra-120 performance, and (2), additional micronized diamond powder applied sparingly will give a 2C to 4C improvement (and temperature reduction) in JetArt CK4800 thermal paste with this cooler and this level of over-clocked load thermal power.

Also, it appears that $100-worth of the Penn-Scientific product will last you at least a half decade if you build one or two computers annually.

Well, back to the case-mod work so I can install this baby, add some ducting, and see what sort of improvement the ducting provides in temperatures. I anticipate a 2C to 5C improvement. Under the high end of that expected range, this E6600 @ 3.275 GHz will show a full 100% load temperature less than 40C at a room-ambient of just less than 75F.

But -- I still have to prove it. Stay tuned.

tylerdustin2008

Diamond Member
Wow, that's a lot to read, I'll pass....

BonzaiDuck

Lifer
Issues discussed on this thread got a start on a "watercooling versus air" thread -- also fairly recent. People were interested in whether or not you could improve on nano-diamond-based thermal compounds, and asked me for tests. Here, I wanted to lay some groundwork for better temperature reporting using statistics and Excel chart functions.

Noubourne

Senior member
College statistics-course pontificating aside, your posts offer no suggestions for temperature measurement that stray very far from common sense.

BonzaiDuck

Lifer
Well, I didn't mean to pontificate, but I'm sure there's some portion of the general population -- computer-geek or otherwise -- who are "statistically naive."

For instance, the recent furor about "evolution" and "random events." [This might be better in the "general discussion" or "politics" forums, but I need to make my point here.] The Creationists are incensed about a scientific basis for evolution that emphasizes "random" events, because to them, "random" suggests that there isn't a Prime Mover behind the outcomes.

Yet, randomness in statistical probability distributions IS a form of natural order, and some would argue that a "Natural Order of things" demonstrates that there is a "Prime Mover." Probability distributions explain a lot. Here, I don't want to debate the issues of the day, but merely show that there are misunderstandings about "randomness" -- which is a central concept in statistics and probability.

On my writing style, your second remark best exemplifies my own convoluted way of stating things. You say I "offer no suggestions for temperature measurement that stray very far from common sense," when you could have said that my suggestions "are common sense" or "display common sense." Somehow, it's as though your acknowledgement there -- that it makes sense -- is painful to make.

But I'm only interpreting your expression there. If the tone of my earlier posts seems condescending, I can apologize for that, but I wanted to explain in as much detail as possible, and when I'm writing, I can take more words to say something than I need to.

Personal observations aside: statistical reasoning and probabilistic logic ARE common sense. And there is a reason why these simple exercises -- calculating a mean-value or standard deviation, and showing frequency distributions -- are called "descriptive statistics."

Let me go just a little further. CORETEMP allows you to adjust the sampling-rate, and it will save a text-file log of the sampled temperatures. It easily loads into EXCEL as a three-column table.

So why not use those features? We all agree that we should take at least an hour to test temperatures with ORTHOS, just like we agree that ORTHOS stability tests should take up to 24 hours. So it's not so much trouble to take five minutes or so (longer the first couple times) to import the log-file, run the statistical functions, and at least show two columns of numbers (Temperature, Number-of-Observations) -- even if we don't want to trouble ourselves creating the bar-charts.

These tools were one of the means by which Japan beat Detroit. I worked for almost two years with the man who was really behind that Japanese industrial effort. But that's not about me, and not so much about Deming, but about the usefulness of the tools.

If people choose to report only peak values, I'm not going to pounce on them. I'm just suggesting that the statistical approach is a better idea, while understanding that people may not want to take the time and trouble.