Sample variance vs population variance

Paperdoc

Aug 17, 2006
This is one instance of a general concept in statistics. A population consists of every person or thing of interest. However, in most practical situations we cannot get around to measuring every member of the population. For example, how many words are there, on average, per book in the public library downtown? Fortunately this is a finite population, and it would actually be possible to count every word in every book, but who would do that? Instead we would select a small subset of all the books and then find a way to "count" "all" of the words in each of the subset's books. Even then we might short-cut the task with some reasonable approximation technique.
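The library example can be sketched in a few lines of Python. This is only an illustration: the per-book word counts below are made up (a hypothetical library of 5,000 books), and the helper name `estimate_mean_words` is invented for this sketch. It draws a simple random sample of books, "counts" only those, and uses their average as the estimate for the whole library.

```python
import random
import statistics

def estimate_mean_words(word_counts, sample_size, seed=0):
    """Estimate average words per book from a simple random sample.

    word_counts: hypothetical per-book word counts for the whole library
    (the population). Only `sample_size` of these books are "counted".
    """
    rng = random.Random(seed)
    sample = rng.sample(word_counts, sample_size)  # sampling without replacement
    return statistics.mean(sample), sample

# Hypothetical population: 5,000 books with made-up word counts.
rng = random.Random(42)
library = [rng.randint(20_000, 150_000) for _ in range(5_000)]

estimate, sample = estimate_mean_words(library, sample_size=50)
print(f"sample of {len(sample)} books -> estimated mean {estimate:,.0f} words")
# In real life this value is unknown; here we can peek because the data is synthetic.
print(f"true population mean: {statistics.mean(library):,.0f} words")
```

Only 50 of the 5,000 books are ever looked at; the second print is possible here only because the population is simulated.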

So what we do in most cases is ESTIMATE the characteristics of the whole population by selecting a small subset of it, called a SAMPLE, and doing our measurements on that. We get individual data points for each member of the Sample, then calculate various statistics such as the Mean and the Variance. Then, based on some argument that the SAMPLE is truly an unbiased representation of the whole POPULATION, we claim that the value actually measured for the SAMPLE is an accurate ESTIMATE of the value for the POPULATION. Because we never measure every member of the population, we cannot know the population's true value (for example, the mean of the entire population) for certain, but we can estimate it from our work with the sample. The same applies to Variance: we use the value computed from the sample as our estimate of the true variance of the whole population.
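Since the thread is about sample vs population variance, here is a small sketch of the two variance formulas in Python's standard `statistics` module. The post does not spell out the formulas, so this assumes the common textbook convention: `pvariance` divides by n (the variance of exactly these numbers, treated as a complete population), while `variance` divides by n - 1 and is the usual estimate of the population variance when the numbers are only a sample. The six word counts are hypothetical.

```python
import statistics

# Hypothetical sample: word counts for 6 books drawn from the library.
sample = [52_000, 61_000, 47_000, 75_000, 58_000, 66_000]
n = len(sample)

sample_mean = statistics.mean(sample)        # used as the estimate of the population mean
var_n = statistics.pvariance(sample)         # divides by n
var_n_minus_1 = statistics.variance(sample)  # divides by n - 1: estimate of population variance

print(f"sample mean:          {sample_mean:,.1f}")
print(f"variance (divide n):  {var_n:,.1f}")
print(f"variance (divide n-1): {var_n_minus_1:,.1f}")
# The two variance figures differ only by the factor n / (n - 1).
print(var_n_minus_1 / var_n, n / (n - 1))
```

For large samples the factor n / (n - 1) is close to 1, so the distinction matters most when the sample is small.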

Statistics aims to make such work as mathematically precise as possible, so besides giving us ways to estimate these population parameters from finite, limited samples, its tools also include ways to quantify how precise those estimates are. So we can construct Confidence Intervals associated with each statistic. We can tell you the Confidence Interval around our estimate of the Population Mean, and another around our estimate of the Population Variance. Don't forget, there is no Confidence Interval for the Sample Mean: we actually measured the parameter of interest for every one of the sample members, so there is NO doubt about the value of the Sample Mean. (Well, that ignores what we know about the precision of the measurement tools themselves.) The possibility of error arises only when we take our known sample value and say it is an Estimate of the truth for the population as a whole.
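As a rough illustration of a Confidence Interval around the estimate of the Population Mean, here is a sketch using only the Python standard library. It assumes a normal (z) approximation, which is reasonable only for fairly large samples; small samples would normally use a Student t critical value instead, and an interval for the Population Variance would need a chi-square distribution, which is not shown here. The helper name `mean_confidence_interval` and the simulated data are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the POPULATION mean.

    Uses a normal (z) critical value; for small samples a Student t
    critical value would be more appropriate.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    m = statistics.mean(sample)                     # sample mean: known exactly
    se = statistics.stdev(sample) / math.sqrt(n)    # standard error of the mean (n - 1 stdev)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

# Hypothetical sample of 40 word counts, simulated for illustration.
rng = random.Random(1)
sample = [rng.gauss(60_000, 12_000) for _ in range(40)]

low, high = mean_confidence_interval(sample)
print(f"sample mean {statistics.mean(sample):,.0f}; "
      f"95% CI for population mean: ({low:,.0f}, {high:,.0f})")
```

The sample mean itself sits exactly at the center of the interval; the width reflects only the uncertainty of extending it to the whole population.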