Quick statistical question - number of data points relation to error

dullard · Mar 4, 2004

Suppose I have 10,000 data points and this results in a total error of +- 15 (units are not necessary for this question, but if you really want to know it is microvolts). Now suppose instead I use 100 data points. What is the error I should expect?

It has been ~ 6 years since my statistics class and I think the formula is that error is proportional to n^(-0.5). If that is true then the error with 100 data points would be 10 times greater than with 10,000 data points, giving +- 150.

Can anyone confirm this? Is my memory correct or am I way off?

Yes I can google. I'm busy and don't have time to read through a bunch of websites.

HomeBrewerDude · Mar 4, 2004

hmmm... are you using the term 'total error' yourself, or is that from whatever you are looking at?

I recall two types of statistical error.

Standard Deviation is the amount of variance within a sample.

In both cases, increasing the sample size will decrease the measure of variance.

On a more practical level, why do you care? What exactly do you want to do?

dullard · Mar 4, 2004

Ok now to my poor vocab skills - scratch the words 'total error' if that has a different meaning to you and replace it with the word 'accuracy'. I am reading a thermocouple with a PCI computer card. The PCI card has a claimed error of 0.015% +- 15 uV, based on a sample size of 10,000 samples. For practical reasons I cannot read the thermocouple voltage 10,000 times. Instead I want to read it ~100 to 200 times. How confident can I be in the average of 100 readings?

Some thermocouples are about 60 uV/°C - meaning an error of 15 uV is 0.25°C. Other thermocouples have a reading of about 7.5 uV/°C - meaning an error of 15 uV is 2°C. I need to know exactly what the expected temperature error is, as 2°C is an unacceptable error.

If my formula is correct, then moving from 10,000 samples to 100 samples will increase the voltage reading inaccuracy by a factor of 10 - moving past the 2°C error even with the most sensitive thermocouples.
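To put numbers on that, here is a minimal Python sketch of the scaling. It assumes the whole +- 15 uV figure is random noise that shrinks as 1/sqrt(n); any fixed offset in the card's spec would not average down this way. The function names are made up for illustration, and the sensitivities are the two approximate values quoted above.

```python
import math

def scaled_error(error_at_ref, n_ref, n_new):
    """Scale an error from n_ref samples to n_new samples.

    Assumption: the error is purely random and proportional to n**-0.5.
    """
    return error_at_ref * math.sqrt(n_ref / n_new)

def temperature_error(voltage_error_uv, sensitivity_uv_per_c):
    """Convert a voltage error (uV) to a temperature error (deg C)."""
    return voltage_error_uv / sensitivity_uv_per_c

ERROR_AT_10K_UV = 15.0  # claimed error at 10,000 samples

for n in (10_000, 200, 100):
    err_uv = scaled_error(ERROR_AT_10K_UV, 10_000, n)
    for sensitivity in (60.0, 7.5):  # uV per deg C
        err_c = temperature_error(err_uv, sensitivity)
        print(f"n={n:>6}  {err_uv:7.1f} uV  "
              f"{sensitivity:4.1f} uV/C  ->  +-{err_c:.2f} C")
```

Under that assumption, 100 readings give +- 150 uV, which is 2.5°C at 60 uV/°C.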

Bekker · Mar 4, 2004

If I recall my stat correctly, random sampling error is measured by Z * S / sqrt(n), where Z is the value associated with the confidence level you desire and S is the standard deviation of the variable estimated. If so, leaving the other terms constant in the equation, a sample of 10,000 would result in S / sqrt(10,000) while a sample of 100 would result in S / sqrt(100). This might or might not have a substantial influence on your error, because if S is very small, the result will be very small in either case. Usually we decide the error we are willing to tolerate and then compute the desired sample size. Have no idea if this helped at all. Good luck.
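A rough Python sketch of that formula, plus the reverse calculation (pick a tolerated error, solve for n). The helper names are hypothetical, the 95% default is just a common choice, and it assumes the readings are roughly normal with a known standard deviation S.

```python
import math
from statistics import NormalDist

def z_value(confidence):
    """Two-sided Z for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)

def margin_of_error(s, n, confidence=0.95):
    """Random sampling error: Z * S / sqrt(n)."""
    return z_value(confidence) * s / math.sqrt(n)

def required_sample_size(s, tolerated_error, confidence=0.95):
    """Smallest n with Z * S / sqrt(n) <= tolerated_error."""
    return math.ceil((z_value(confidence) * s / tolerated_error) ** 2)

# S = 50 uV is a made-up value purely for illustration.
S_UV = 50.0
print(margin_of_error(S_UV, 100))       # error with 100 readings
print(margin_of_error(S_UV, 10_000))    # error with 10,000 readings
print(required_sample_size(S_UV, 5.0))  # readings needed for +- 5 uV
```

Whatever S turns out to be, the first two results differ by a factor of 10, because only the sqrt(n) term changes.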

Bekker

Reel · Mar 4, 2004

I think in pure statistical methods, the distribution matters. A rough estimate that tends to work is just take the square root of the number of samples. I think it is only accurate for a Poisson distribution but close for others. I don't know if I am fully accurate because my statistical skills have always been a little shaky.

Lynx516 · Mar 4, 2004

Random noise is always normally distributed. I could dig up the info you need from my DSP book but I cannot find it at the mo

dullard · Mar 4, 2004

Originally posted by: Lynx516
Random noise is always normally distributed. I could dig up the info you need from my DSP book but I cannot find it at the mo

So is my formula correct for a normal distribution?

Bekker · Mar 5, 2004

If you look at the equation I gave you above, it would seem that your comment that the error would be 10 times as great is correct.
RSe = Z * S / sqrt(n). Therefore, for any given Z level with S constant, the error scales as 1/sqrt(n): sqrt(10,000) is 100 and sqrt(100) is 10, so the error with 100 points is 100/10 = 10 times larger.

TuxDave · Mar 5, 2004

When you're doing your measurements, are you taking the rms of your errors? Or are you taking the upper and lower bounds of your error?

StatisticianOG · Apr 3, 2004

Resurrected thread.

By 'total error' I'll assume you mean 'margin of error'. (?)

So it must be known, or we have good reason to assume, that the microvolt readings are approximately normally distributed? I have no idea about microvolts.

The general expression for upper and lower bounds on an estimate of a 'true' average, of something that is normally distributed, is: sample ave +- z*SE, where SE is the standard error, and z is from a standard normal distribution and is a function of how 'confident' you want to be. In this case, SE = s/sqrt(n), where s is the standard deviation and n is the sample size.

Assuming z and s are fixed (so z*s = k, just some constant), which might not be realistic, you first have:

ME = zs/sqrt(10000) = k/sqrt(10000) = 15, so k = 1500

and then taking another sample, you get that

ME = k/sqrt(100) = 1500/10 = 150.

So it does look like the margin of error is inversely proportional to the sqrt(n) as some have mentioned.
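If you would rather check the 1/sqrt(n) behaviour empirically than trust the algebra, a small simulation is one way to do it. This is only a sketch: it assumes independent, normally distributed noise, and the 1500 uV standard deviation is simply k from above with z taken as 1, not a measured value.

```python
import random
import statistics

def spread_of_sample_means(n, trials, sigma, rng):
    """Standard deviation of the mean of n noisy readings,
    estimated by repeating the n-reading average `trials` times."""
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(trials)
    ]
    return statistics.stdev(means)

rng = random.Random(12345)  # fixed seed so the run is repeatable
SIGMA_UV = 1500.0           # assumed noise per reading (k with z = 1)
TRIALS = 200

spread_100 = spread_of_sample_means(100, TRIALS, SIGMA_UV, rng)
spread_10k = spread_of_sample_means(10_000, TRIALS, SIGMA_UV, rng)

print(f"n=100:    {spread_100:.1f} uV")
print(f"n=10000:  {spread_10k:.1f} uV")
print(f"ratio:    {spread_100 / spread_10k:.1f}")
```

With only 200 trials the ratio will not land exactly on 10, but it should come out close to it.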
