- Jun 30, 2004
A Proposal for a Standard Approach in the Comparison and Measurement of Core Temperatures
We typically scan reviews of heatsinks and water-cooling products, looking for real, objective bench-tests, and for measures or indexes that allow us to accurately gauge performance against other products -- independent of other important considerations, like "ease of installation."
Enthusiasts often quote their peak temperatures as the sole, final measure of cooling performance. We measure "Idle" and "Load" temperatures using programs such as Prime95, S&M, ORTHOS and Super-Pi. Everest Ultimate Edition has its own stress tests, and several other programs or program components provide them. Here, most temperature comparisons are done on the basis of ORTHOS and Super-Pi.
A statistician doesn't want only a single number or observation-value as a reliable measurement. In the measurement of temperatures, it would be better to report a distribution of sampled values -- a "frequency distribution" -- over a timed period, such as one hour.
Statisticians often refer to the concept of an "outlier" in a statistical distribution. An outlier falls outside the three-sigma or four-sigma limits of a distribution, and it often has an "assignable cause": it arises as a spurious anomaly due to measurement error, or from miscellaneous causes that have little to do with the conditions governing the remaining observations.
Let me construct an example. Suppose one forum member -- call him or her "A" -- measures a peak load temperature of 51C, and suppose another member, "B," also measures a peak of 51C. But suppose A's entire distribution of observed load values averages 46C, while B's average is 41C. Suppose also that A's standard deviation around the mean is 3C while B's is 6C. Then B's temperatures vary more between low and high values than A's -- which is a significant description of cooling performance. Alternatively, if B's standard deviation is smaller -- say it is only 3C, the same as A's -- then B most likely has significantly lower temperatures than A, and temperatures that are just as stable. Either way, this is a more complete empirical description of what is going on in both tests than simply comparing one peak of 51C with another peak of 51C.
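The A-versus-B comparison can be sketched in a few lines of Python; the two temperature series below are invented to match the example (identical 51C peaks, means of 46C and 41C), not real test data.

```python
# Two made-up temperature series matching the example above: identical 51C
# peaks, but different means and spreads.
import statistics

a_temps = [42, 46, 45, 48, 46, 51, 46, 44]   # hypothetical member A
b_temps = [35, 38, 44, 41, 51, 33, 44, 42]   # hypothetical member B

for name, temps in (("A", a_temps), ("B", b_temps)):
    print(name, "peak:", max(temps),
          "mean:", statistics.mean(temps),
          "stdev:", round(statistics.stdev(temps), 1))
# Both peaks are 51C, yet the means and standard deviations tell very
# different stories about cooling performance.
```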
In other words, reporting peak values doesn't give us any indication of these things.
Some may argue that a certain peak temperature value determines the difference between stability and instability, but one might counter that this depends on whether the reading came from the sensor itself or from some factor unrelated to the actual temperature of the component being measured. Either way, it makes more sense to report frequency distributions of temperatures than only the upper boundary of the statistical range -- the peak "load" temperature.
With ORTHOS, the stress on the processor and other components such as RAM varies over any period of time. Given the organization of the program and the calculations it performs, this stress is not random; it is determined by the battery of tests currently under way. But whether the cause of the highest reading is random or determined by the program at some point in time, it would be more descriptive and more accurate to report more than just the "peak load temperature."
Other sources of error or variation may enter into the sampling process, whether from inaccuracies in the thermal sensors or from something momentary that contributes to outliers. Even so, while comparing absolute or single values between two different computers doesn't reveal much, measures such as the standard deviation still provide a reliable comparison even if a computer's temperature monitors are biased one way or the other for whatever reason.
CORETEMP, and possibly other programs, provide a logging capability. CORETEMP will create a text-file log, sampling and logging Core#1 and Core#2 temperatures at user-chosen intervals.
This text file can be imported into a software tool like Excel, where it appears as three columns. The first column is unneeded if a forum user or web-site reviewer reports the "test-bench" configuration in the post, because it merely records the processor speed at the time of sampling -- a column of numbers which are all the same. The remaining two columns report the sampled temperature values for Core#1 and Core#2, respectively, on a dual-core system. In cases where every single reading over an hour's time is identical between the two cores, only one of those columns is needed, provided the observer adds an explanatory note that the values are identical for the two cores.
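For anyone who would rather script the import than use a spreadsheet, here is a minimal Python sketch of reading such a log. The column layout assumed here (speed, Core#1, Core#2, whitespace-separated) is an assumption; check your own CORETEMP log and adjust the parsing to match.

```python
# Sketch: import a CORETEMP-style log into per-core lists.
# Assumed layout per line: <speed> <core1 temp> <core2 temp>
def read_coretemp_log(path):
    core1, core2 = [], []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            # fields[0] is the CPU speed at sample time; it is constant
            # over the run, so only the two temperature columns are kept.
            core1.append(float(fields[1]))
            core2.append(float(fields[2]))
    return core1, core2
```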
Certainly, people making these comparisons should routinely report other relevant information:
(1) Computer configuration, components and their manufacturer-of-origin, including [of course] the cooling device being tested
(2) Over-clock speed and voltage settings
(3) Memory timings
(4) Room Ambient temperature(s)
(5) [If possible] -- the actual thermal wattage or a reliable estimate of peak thermal wattage for the current over-clock setting
(6) The stock peak thermal wattage (TDP) of the component of interest in the test, such as the CPU
This is an open-ended proposal; there may be items missing in this list; and anyone else might suggest additions.
ROOM AMBIENT TEMPERATURE
Room ambient will vary over time. It may be controlled to an acceptable degree if there is a temperature-controlled air-conditioner running which makes adjustments with some accuracy and immediacy. Serious testing, meaning a laboratory-controlled benchmark, might employ a closed, temperature-controlled container for the computer, but most of us don't have the luxury or the time to build or use one.
Therefore, it might be useful to take Room Ambient readings more than once during a test run with a program like ORTHOS over a standard period -- for example, one elapsed hour, with Room Ambient readings taken every 15 minutes. A household thermostat-thermometer is a very approximate, inaccurate device; a digital thermometer placed close to an intake fan gives more accurate readings.
As a rule, changes in Room Ambient should result in degree-for-degree increases or decreases in computer component temperatures, even though there may be some consistent lag between the Room Ambient change and the component temperature change. Unless TEC or phase-change cooling is used, room temperature will affect both air- and water-cooling, although there might be a greater lag before a room-ambient change reaches the water inside a large reservoir.
At least for air-cooling, this makes it feasible to adjust component readings taken at different room ambients, within the effective range of the cooler; most heat-pipe coolers, for example, will exhibit this one-for-one variation with room ambient.
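Given that strict degree-for-degree tracking, readings taken at different ambients can be shifted to a common reference for comparison. A minimal sketch, assuming the one-for-one rule holds (the 25C reference and sample values are arbitrary):

```python
# Sketch: normalize readings taken at different room ambients to a common
# reference ambient, assuming strict degree-for-degree tracking -- an
# assumption that holds only within the cooler's effective range.
def normalize_to_reference(temps_c, ambient_c, reference_ambient_c=25.0):
    """Shift each component reading by the difference between the actual
    room ambient and the chosen reference ambient."""
    offset = reference_ambient_c - ambient_c
    return [t + offset for t in temps_c]

# Example: readings taken at a 23C ambient, adjusted to a 25C reference.
readings = [41.0, 42.0, 44.0]
print(normalize_to_reference(readings, ambient_c=23.0))  # [43.0, 44.0, 46.0]
```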
CHOICE OF TESTS
ORTHOS offers several types of tests. "Small FFTs" stresses mostly the CPU; "Large FFTs" stresses the RAM; "Blend Test" stresses both CPU and RAM.
I submit as an issue for discussion that a test of a CPU cooler might be more relevant if it stresses only that component. Stressing an (Intel) memory controller and RAM introduces factors into a CPU-cooler test which must be mitigated by enhancements separate from the CPU cooler. On the other hand, an AMD processor, with its memory controller built in, might be tested using a different test choice. Even so, it might be useful to run more than one type of test to show different results. My example follows below.
STATISTICAL REPORTING
Once a CORETEMP time-series is loaded into an Excel (or other) spreadsheet, you can use some statistical operators toward creating a frequency distribution chart of load temperature performance:
MIN(cell-range) produces the minimum value in a column or row of numbers
MAX(cell-range) produces the maximum value
AVERAGE(cell-range) provides the statistical average of a column or row of numbers
STDEV(cell-range) provides the sample standard deviation around the mean or average
The square of the standard deviation is called the "variance," and it measures much the same thing: the "spread" or variation of observations around the mean. Statisticians describe a sample of observations in increments of "1-sigma," "2-sigma," "3-sigma," etc. For a roughly normal distribution, one standard deviation on either side of the mean contains about 68% of the observations; the range from "Mean - 2 sigma" to "Mean + 2 sigma" contains about 95%; and the range from "Mean - 3 sigma" to "Mean + 3 sigma" contains about 99.7%. This allows one to do statistical testing to see, for instance, whether an average taken over one sample of observations shows a "statistically significant difference" from an average taken over another sample -- for instance, to compare two different cooling methods, or cooling on two different computers.
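These measures can be sketched in a few lines of Python. The readings below are invented, and with only a dozen whole-degree samples the coverage only roughly approaches the textbook sigma percentages:

```python
# Mean, sample standard deviation (like Excel's STDEV), and how many of the
# invented readings fall within 1-, 2-, and 3-sigma of the mean.
import statistics

temps = [40, 41, 41, 42, 42, 42, 43, 43, 43, 43, 44, 45]
mean = statistics.mean(temps)
sd = statistics.stdev(temps)

print("mean:", round(mean, 2), "stdev:", round(sd, 2))
for k in (1, 2, 3):
    within = sum(1 for t in temps if abs(t - mean) <= k * sd)
    print(f"within {k}-sigma: {within} of {len(temps)}")
```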
FREQUENCY(data cell-range, bin cell-range)
This is an "array" function that allows you to enumerate the frequency of things -- in our case, temperature readings -- over the entire sample. For instance, suppose you run ORTHOS for one minute, sampling every 8 seconds (or 8,000 milliseconds). You might have seven observations in Celsius degrees: 34, 35, 34, 42, 41, 42, 40. So the temperature interval containing 34 has a frequency of 2; the interval containing 35 has a frequency of 1; the interval containing 42 has a frequency of 2, etc.
Over an hour's time, sampling every 8 seconds, you will get about 450 observed temperature readings. This Excel function lets you instantly tally the counts of readings at each discrete temperature.
The "bin cell-range" is the sequential set of temperatures over which you want this frequency-counting to take place; you find it by taking the MIN and MAX of each column of data -- the sampled temperatures -- and then creating another column with those temperatures in ascending order.
To use the FREQUENCY function, you would simply select a blank column that has the same number of cells as the bin-range column, and edit into the formula text-box:
=FREQUENCY(data-range, bin-range)
For example, I have 450 sampled temperature readings taken over an hour at the rate of one every eight seconds, imported into a spreadsheet so that the data occupies column "B" in the range B5:B454. My range of temperatures might be 31 to 39, so I enter the values 31, 32, 33, . . . 39 into column "C" starting at any row -- say C5, filling C5:C13. I use E5:E13 as my "result" column, select or highlight that column, and enter this formula into the formula text-box:
=FREQUENCY(B5:B454,C5:C13)
and I then type CTRL-SHIFT-ENTER.
I can now select this result column again, and use it to create a bar-chart of the frequency distribution.
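The same binning can be scripted in Python, with `collections.Counter` playing the role of FREQUENCY; the readings below reuse the seven invented observations from the earlier one-minute example, with a crude text bar chart in place of Excel's.

```python
# Tally whole-degree readings and draw a crude text bar chart of the
# frequency distribution, mirroring the Excel FREQUENCY workflow.
from collections import Counter

readings = [34, 35, 34, 42, 41, 42, 40]   # invented sample values
counts = Counter(readings)

for temp in range(min(readings), max(readings) + 1):
    print(f"{temp}C {'#' * counts[temp]}")
```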
RETURN TO THE DIAMOND-SLURRY QUESTION
In another thread, comparing air-cooling, water-cooling and phase-change cooling, I began posting some preliminary results as I test the addition of abrasive diamond powder to one or more media -- thermal pastes or greases. I raised questions there, suggesting that the difficulty of applying the paste might degrade its performance, cancelling out or masking any improvement from the diamond powder.
I may re-run those tests. But the original diamond-slurry is still "in operation" on my test-chassis Conroe system:
Processor: Intel E6600; multiplier = 9; FSB external frequency = 364
Memory: Crucial Ballistix PC2-8000; timings 3,4,4,8; DDR2-728
Graphics: BFG nVidia 8800 GTS 640MB
Motherboard: ASUS Striker Extreme; [BIOS revision = ________]
CPU cooler: ThermalRight Ultra-120-Extreme, custom-lapped
VGA cooler: stock BFG
Here is a frequency-bar-chart for the ORTHOS Blend-Test, run for 1 hour with room ambient 74F:
E6600 @ 3.275, Ultra-120-Extreme, 74F Ambient, ORTHOS Blend-Test Results
And here is the frequency-bar-chart for the ORTHOS Small-FFT (stress CPU) test, run for one hour at 70F room ambient:
E6600 @ 3.275, Ultra-120-Extreme, 70F Ambient, ORTHOS Small-FFT (stress CPU) Test
Note that these temperature distributions are not "normally" distributed in the well-known "bell-curve" pattern. They are probably not randomly distributed, then, and we can speculate that the "cause" is the discrete differences between the batteries of tests run under ORTHOS. This might lend support to the argument that peak values are important, but again I would wager that the distribution of temperatures reveals and presents more information.
Here are the "central tendency" and "dispersion" measures for the Blend Test data:
Mean or Average = 41.69C
Standard Deviation = 3.21C
And here are the Mean and Standard Deviation for the "Small FFT (stress CPU)" test:
Mean or Average = 43.72C
Standard Deviation = 0.552C
Here, even the peak value of 44C and the mean value are almost the same. And the standard deviation is a squeaky little sliver of just over half a degree Celsius. I suggest at this point -- tentatively -- that if one were testing a heatsink on an Intel CPU (as opposed to an AMD processor with its integrated memory controller), the Small FFT test is more consistent and more indicative of factors exclusive to CPU cooling. But additional contributors may wish to dispute that and show supporting reasons.
In my struggle to have a meaningful and more productive life, I will attempt to test the "Silver Eagle" heatsink-mod tonight, and provide some more charts.
I could post my data. Fabricating data is difficult and too costly except for hoaxes with extremely high expected payoffs: to post some 450 phony observations, one would have to deliberately choose all 450 to produce the desired mean and standard deviation. But I don't think that is necessary here, although I could put the data out for download. Publishing it in this (or any) thread as anything other than a web-link would be like asking you to read a ton of junk-mail.
I did scroll through the Core#1 and Core#2 columns, and discovered that the numbers are identical -- without a single exception. Even so, it would be easy to include both columns in the data ranges for the frequency distribution, and create a bar chart in two colors: blue for Core#1 and red for Core#2.