- Sep 10, 2001
Update: After reading up on R^2 and realizing that it's not really useful for this application, I've been thinking about this a little more. I think the correct approach might be to use a norm (e.g. the L2 norm) to compute the error. I can then compute the relative error in the norm to give a pretty solid comparison of the data to the model. I updated the thread title accordingly.
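To make the norm idea concrete, here's a minimal sketch of what I mean in Python/NumPy (not my actual MATLAB code — the `data` and `model` arrays here are made up just for illustration):

```python
import numpy as np

# Hypothetical measured data and model predictions (placeholder values)
data = np.array([1.0, 2.1, 2.9, 4.2, 5.1])
model = np.array([1.1, 2.0, 3.0, 4.0, 5.0])

# Relative error in the L2 norm: ||data - model|| / ||data||
rel_err = np.linalg.norm(data - model) / np.linalg.norm(data)
```

A small `rel_err` (say, a few percent) means the model tracks the data closely in both shape and magnitude, which is the comparison I'm actually after.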
=========================================
I have a relatively straightforward program that solves some model equations y=f(x). I want to see how well this nonlinear model correlates with data z=g(x), eventually arriving at an R^2 value. I can find two ways to do this in MATLAB that give amazingly disparate results.
The first is using MATLAB's built-in corrcoef function, which computes the normalized correlation coefficient R:
R = corrcoef(model, data);
Rsq = R.^2;
The second is calculating R^2 using the formula that I am familiar with (i.e. R^2=1-SS_err/SS_tot):
SS_err=sum((Data-Model).^2);
SS_tot=sum((Data-mean(Data)).^2);
Rsq=1-SS_err./SS_tot;
I have four sets of data and three sets of models, so Rsq ends up being a 3x4 matrix. Method #1 gives
0.9408 0.9846 0.9370 0.9408
0.9381 0.9840 0.9383 0.9381
0.9431 0.9908 0.8999 0.9431
Method #2 gives
0.5972 -0.1735 0.8220 0.4431
0.1777 0.2551 0.7937 0.8672
-2.9059 0.9313 -3.5218 0.7585
Obviously, method #1 makes my model look amazing, but I don't think it's *that* good, and I'm not familiar with computing R^2 this way, so I'm not even sure it's appropriate here. Is anyone familiar enough with this to tell me which method is more appropriate, or whether a completely different approach would be better?
edit: Looking at the results a little closer, it seems that the Method #1 results are higher when the shape of the model curve is more similar to the data, but they don't account for offsets or differences in the amplitude of the changes. Method #2 appears to be very sensitive to those offsets and amplitude differences. It's almost like #1 gives a qualitative comparison of the trends, while #2 compares the quantitative results, though maybe I'm reading too much into it.
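As a quick sanity check on that intuition, here's a sketch in Python/NumPy (again, made-up curves, not my actual data): a model with exactly the right shape but the wrong offset and amplitude gets a perfect score from the correlation method, while the SS-based formula correctly flags it as a bad quantitative fit.

```python
import numpy as np

data = np.linspace(0.0, 1.0, 50) ** 2   # some "measured" curve
model = 2.0 * data + 5.0                # same shape, wrong offset and amplitude

# Method #1: squared correlation coefficient (sensitive only to shape)
r = np.corrcoef(model, data)[0, 1]
rsq_corr = r ** 2                       # exactly 1: model is a linear function of data

# Method #2: R^2 = 1 - SS_err/SS_tot (penalizes offset and amplitude errors)
ss_err = np.sum((data - model) ** 2)
ss_tot = np.sum((data - np.mean(data)) ** 2)
rsq_ss = 1.0 - ss_err / ss_tot          # strongly negative: worse than the mean
```

So correlation really is invariant to shifting and rescaling the model, which would explain why #1 flatters curves that merely trend the right way.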