Excel/Math wiz, need your help!

  • Thread starter Deleted member 4644
  • Start date
Status
Not open for further replies.
D

Deleted member 4644

http://ca.calsb.org/calbar/pdf...centile-Table-0802.pdf

For the above table, the max score is 2000, although the 95th percentile was 1617.

I am trying to make a chart that shows what the distribution of scores looks like.

The passing score is cut off at 1440. I am trying to determine visually where that cut off point is with respect to the possible max score of 2000, the 95th percentile of 1617 and the data overall.

(I think I am trying to visualize the shape of the curve, that is, how far skewed is the data away from a bell from 0 to 2000... because obviously? the scale is not a normal bell-shape curve from 0 to 2000? ??)

I am sorry I can't describe it better.
 

Billb2

Diamond Member
Mar 25, 2005
3,035
70
86
I don't think there is enough infirmation in the table to do that.
 

eLiu

Diamond Member
Jun 4, 2001
6,407
1
0
You can sketch the shape of the curve... just assume that 100 people took the test. 95th precentile is at 1617, so that means 5 people scored 1617 or above. 90th at 1566, so 5 people scored between 1556 and 1617. Load it into excel or matlab and run a histogram. You can get a smoother curve by doing linear interpolation in each bucket.
 

yuchai

Senior member
Aug 24, 2004
980
2
76
You can do a plot with the scores on the X axis and the percentile ont he Y axis. Fit a line through the curve to approximate your cumulative distribution curve.
Then, in theory, if you differentiate that, you would have your bell curve. But, I'm not sure to actually execute it in Excel.

Anyway, from the numbers and a passing score of 1440, you can tell that one needs a better score than 60-65% of the people who take the test in order to pass. If I were to take this test and assuming these numbers are quite typical for this test, I'd probably shoot to consistent get above 1500 to have a pretty good chance of passing.
 

Paperdoc

Platinum Member
Aug 17, 2006
2,438
344
126
Technically you can't use the Normal Distribution Function as a model for these data for two reasons. One is that the Normal function only is used for data that extend to infinity in both X directions, and these data have X limited to the range 0 through 2000. The other is that you have been given NO information that says these data are normally distributed. Well, almost none - you are told these are scores on an exam used on a large number of subjects, so you could very reasonably presume they can be approximated with a Normal Distribution. So although "technically" you can't , go ahead and use that statistical model as a first step.

Now, a Normal Distribution has several useful characteristics. Its Mean, Median and Mode are all at the same X co-ordinate. So the Mean is where the 50th percentile falls, at a score of 1389. If you look up any standard reference table of the Normal Distribution, for any value of X (normalized, that is) it will tell you the cumulative probability from x = - infinity to X, and this is just the entry in the Percentile column. For example, the Cumulative Probability of 90 % (which is 0.90 in decimal form) occurs at a score of 1566. Find a Normal Distribution table and look in it for a Cumulative value of 0.9000. (NOTE some tables have values for the right-hand half of the symmetrical curve - that is, from z=0 to your normalized value, so you look in that table for 0.4000 because the left-hand half (from - infinity to z=0) totals 0.5000. That will give you the z value that corresponds to Cumulative Probability 0.9000. If you can't lay your hands on such a table, Excel has a function called NORMSDIST() that gives you the Cumulative Probability (from z = -infinity to your z).

So, what is z? That is the Normalized value along the X axis, from the formula z = (X-u)/SD, where X is your real X value (the test score), u is the Mean which we know to be 1389 (above), and SD is the Standard Deviation for your data. Thus, for every table value of Score (X) and Cumulative Probability (Percentile) you can calculate an estimate of the Standard Deviation of your data. Then you could take the average of those estimates and use that as the "real" Standard Deviation. With the Mean and Standard Deviation thus obtained you can use Excel's NORMDIST() to generate the Probability curve or the Cumulative Probability curve that is supposed to fit your data, and visually assess whether they really do fit.

If this all convinces you the the Normal Distribution Curve was the right model to use, you could then calculate exactly what the Cumulative Probability value is corresponding to 1439, which is one less than the passing cut-off of 1440. That would be the percentage of exam writers who FAILED, since only those above it (at 1440 and higher) passed.
 
D

Deleted member 4644

If I were to ask for the data needed to shape the curve, what would it be.
 

yuchai

Senior member
Aug 24, 2004
980
2
76
I would imagine that you want all individual test scores to do the fit if possible.
 

Paperdoc

Platinum Member
Aug 17, 2006
2,438
344
126
The data you have in the table already defines the curve. It is in the form of Y (Cumulative Probability, but in terms of % instead of fraction) and X (Score).

The typical "Bell Curve" appearance of the Normal Distribution is in the form of Probability on the Y axis versus z. Z is just a way to scale the actual data into a unitless range from z = - infinity to z = + infinity. Z is defined by the formula z = {(actual "X" value) - (mean of "X")} / (Standard Deviation of the X's). The cumulative form of the curve is obtained mathematically by integrating the curve from z = - infinity up to the z value for your "X". In practice, you could approximate it by breaking up the X axis into pieces and then, moving from left to right, for each piece adding its average "Y" value to the previous total.

In your data table, you have the Cumulative Probability already in the Percentile column, so the graph of Percentile (on the Y axis) versus Score (on the X axis) looks like an S-shaped curve rising from left to right, slowly on the left, quickly rising in the middle, and leveling off on the right. It's pretty hard to get the Bell Curve out of these data because the table has Scores given for evenly-spaced value of Cumulative Probability, whereas the usual Bell Curve has Probability given for evenly-spaced intervals of Score. However, you can still do calculations on these data using the formulas for the Normal Distribution. (So far we are ASSUMING that these are the right formulas because the Normal Distribution is the right model for the data - and we plan to look closely at the results to decide whether or not that is reasonable.)

Now, if we had the values of X, Mean of X and Standard Deviation of the X's, we could calculate the z vales for each X given, look up in a table the Cumulative Probability for that z, and predict those values. In the current case we have to work backwards. We have the Cumulative Probabilities (as Percentile values) so we can use the table to see what value of z corresponds to each Percentile. For each z we now have three of the four quantities in the formula for z (z, X, Mean of X) so we can calculate the Standard Deviation of X. We can do that 19 times and they all should be pretty close to each other IF we have used the correct model for the data. The average of those 19 estimates can then be used as the "real" Standard Deviation of the X's. Finally, that puts us in a position to calculate the predicted values for Cumulative Probability (Percentile) for each Score value. Comparing these to the values of Percentile in the data table provided gives us a feel for whether the data really do fit the Normal Distribution Curve. If our answer is "close enough", then we can proceed to use all the other tools of that distribution to predict (calculate) other values, like what percentage of people actually met or exceeded the minimum score for passing, 1440. In Excel, one function, NORMDIST(), can produce both Probability and Cumulative Probability values for you to examine. As inputs it needs the values For X, Mean of X, Standard Deviation of X, and a logical switch. You just calculated all those values; for the switch function, check the Excel Help file for this statistical function.

On the other hand, if the judgment is "No, the fit is poor and the real data don't fit a classic symmetrical "Bell Curve", then we need to decide what the proper statistical model is and try its tools.

As a complete aside, IF the Normal Distribution is the correct statistical model, these data say that about 62% of people writing the California Bar Exam in 2008 failed. I bet that's by design, so that the plain "average" law student won't be good enough (yet) to get a license to practice.
 
Status
Not open for further replies.