
Statistics people, or people in the know: I need some help -- NOW with Excel file.

Oct 9, 1999
15,216
3
81
So apparently I got in way over my head, and now I find my statistics project seems easier than it should be.

I am trying to see whether air quality decreases as temperature rises.

My data is for the year 2005, collected by the state of California and the NWS. I am collecting air quality readings (particulate matter (PM) and ozone), but I am using whichever is higher. So basically I am recording the overall air quality marker and not just one particular pollutant, because whichever reading is higher sets the air quality index.
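A rough sketch of the "whichever is higher" rule described here, using made-up readings. The names `pm_index` and `ozone_index` are hypothetical and simply stand in for the two daily columns.

```python
# Hypothetical daily index values for two pollutants (made-up numbers).
pm_index = [42, 88, 61, 35]
ozone_index = [50, 71, 93, 30]

# The overall marker for each day is whichever pollutant index is higher.
overall_index = [max(pm, oz) for pm, oz in zip(pm_index, ozone_index)]
print(overall_index)
```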

So anyway, after 365 pieces of data of each type, I ran a correlation, which gave a value of 0.264.

Doing an XY scatter plot, one can see it looks like a ^ in the graph, which does show some correlation, as there is more particulate matter in the summer than in the winter.

But the regression shows a 'cloud', so I am kind of lost on what else I should do.

I guess I could do mean, median, mode, and SD? Or should I not?

Any help would be much appreciated.

I can upload my data somewhere. Also, how does one interpret regression correctly? I don't want to fsck it up.

Here is the file:

Excel File
 
Jun 27, 2005
19,216
1
61
I'm not a statistics person but I am a person in the know. My solution: I suggest you stay at a Holiday Inn.
 
Oct 9, 1999
15,216
3
81
Originally posted by: UncleWai
Correlation coefficient?

Yup, that came to 0.264, which looks like a cloud in the regression.

What else can I say about my study? It's a pain-in-the-butt class and the prof really didn't teach regression, but I have to do it because there is no other way to present the data, and I don't know how to interpret it. I have no idea what the R Square value is. I am trying to read up on it on Wikipedia and other sites.
 

novasatori

Diamond Member
Feb 27, 2003
3,851
1
0
IIRC, R squared measures how well the regression line fits the data (the fraction of the variation in y that the line accounts for),
0 being no fit and 1 being a perfect fit.
 

hypn0tik

Diamond Member
Jul 5, 2005
5,866
2
0
Originally posted by: novasatori
iirc the r square is relative to how accurate the regression line fits the data

Yes. The closer the r^2 value is to unity, the better the fit.

Regression is an approximation of your data by a line, a quadratic, etc., depending on the regression model you use. The most common type is linear regression, where you try to fit a line to your points. If the fit is good, it can be useful for estimating future values from current data.

There are a bunch of formulas that you have to go through to do a regression manually, but that's a waste of time since Excel will do it for you automatically. Right-click your data series on the chart and choose 'Add Trendline'.
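For anyone curious what the manual formulas amount to, here is a minimal sketch of a least-squares line and its r^2 in plain Python. The temperature and air quality numbers are made up for illustration, and `linear_fit` is a hypothetical helper, not anything Excel exposes.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / syy
    return slope, intercept, r_squared

# Made-up temperature (x) and air quality index (y) readings.
temps = [55, 60, 65, 70, 75, 80, 85, 90]
aqi = [40, 52, 47, 61, 58, 70, 66, 79]

slope, intercept, r_squared = linear_fit(temps, aqi)
print(f"aqi ~ {intercept:.2f} + {slope:.3f} * temp, r^2 = {r_squared:.3f}")
```

The trendline Excel draws for a linear fit should correspond to the same slope and intercept, assuming it is fitting the same x and y columns.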
 

UncleWai

Diamond Member
Oct 23, 2001
5,701
68
91
Oh, I thought that was beta.
If you can see an arch shape in your data set, I don't know if linear regression is even a good model?
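A small sketch of why an arch is a problem for a straight-line model: with made-up data that rise and then fall symmetrically, the correlation coefficient comes out at essentially zero even though y is completely determined by x. `pearson_r` is a hypothetical helper written out by hand here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: one symmetric arch and one straight line over the same x.
xs = list(range(-5, 6))
arch = [25 - x ** 2 for x in xs]
line = [3 * x + 1 for x in xs]

r_arch = pearson_r(xs, arch)
r_line = pearson_r(xs, line)
print(f"arch: r = {r_arch:.3f}, line: r = {r_line:.3f}")
```

So a low r on its own says the relationship is not linear, not that there is no relationship.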
 
Oct 9, 1999
15,216
3
81
Originally posted by: novasatori
iirc the r square is relative to how accurate the regression line fits the data
0 being least 1 being perfect fit

okay then

Multiple R 0.264504731
R Square 0.069962753
Adjusted R Square 0.067400667
Standard Error 14.59171819
Observations 365

So it's not a perfect fit. (Multiple R = correlation coefficient.)
R Square = doesn't fit well, a loose fit.
I am going to put that info into my presentation. How do I talk about SE in this case?
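A sketch of how the numbers in that summary relate to each other, using only the figures posted above. It assumes a single predictor (`k = 1`) and that the Standard Error line is the usual standard error of the regression, i.e. the typical size of a residual in the units of whatever is on the y axis.

```python
import math

# Figures copied from the regression summary above.
multiple_r = 0.264504731
r_square = 0.069962753
adjusted_r_square = 0.067400667
standard_error = 14.59171819
n = 365  # observations
k = 1    # number of predictors (assumed: one x variable)

# With one predictor, R Square is the correlation coefficient squared.
r_square_check = multiple_r ** 2

# Adjusted R Square applies a penalty for the number of predictors.
adjusted_check = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

# Standard Error = sqrt(SS_residual / (n - k - 1)), so we can back out
# the residual and total sums of squares, and from those the plain
# standard deviation of y for comparison.
ss_residual = standard_error ** 2 * (n - k - 1)
ss_total = ss_residual / (1 - r_square)
sd_y = math.sqrt(ss_total / (n - 1))

print(f"R Square from Multiple R:   {r_square_check:.9f}")
print(f"Adjusted R Square (check):  {adjusted_check:.9f}")
print(f"SD of y: {sd_y:.2f} vs Standard Error: {standard_error:.2f}")
```

One way to talk about SE in a presentation is to compare it with the plain standard deviation of y: if the two are close, the line is hardly narrowing down the prediction at all.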
 

hypn0tik

Diamond Member
Jul 5, 2005
5,866
2
0
Those are terrible r^2 values: the line accounts for only about 7% of the variation, so it is pretty much meaningless. Try a higher-order regression to get a better fit.
 

hypn0tik

Diamond Member
Jul 5, 2005
5,866
2
0
Originally posted by: TheGoodGuy

Here is the file that i just saved on my comp.. have a look and let me know if i did things right?

Well, the problem is that you used a linear regression to approximate something that's clearly non-linear.

Also, I don't see any formulas in your cells. Where are you getting those numbers from?

Edit: Ah, maybe you imported the data from somewhere else?
 

novasatori

Diamond Member
Feb 27, 2003
3,851
1
0
You can do this right on the chart: click the data points, right-click and choose 'Add Trendline', change it to polynomial, and select the order you want to use.

Order 6 will give you the highest r^2 value, which is:

y = -1E-12x^6 + 2E-09x^5 - 9E-07x^4 + 0.0002x^3 - 0.0246x^2 + 1.3301x + 37.082
R^2 = 0.7968

xls with trendline

You need to play with it, I guess, and figure out which order is most appropriate for future/past trends or whatnot. I have no clue as to the forecast of your data.

just click the trend line and click configure or whatever.
 
Oct 9, 1999
15,216
3
81
I am lost, not sure how you did that one. But that's just showing the XY scatter plot with a trendline; it has nothing to do with the actual regression stuff.

My regression stuff is on one of the worksheets, with the graph at the bottom of the regression page.
 

novasatori

Diamond Member
Feb 27, 2003
3,851
1
0
Oh I see, I did the wrong chart, did I?

That first chart seems to be random stuff with no real regression.
 

novasatori

Diamond Member
Feb 27, 2003
3,851
1
0
I wish I could help more, but I'm no statistics expert.

From the graph you got, I would say that no regression will give you a reasonable fit and that the two variables are essentially unrelated.

I'm not sure that you have it graphed right, though...
Though your linear regression matches the Excel one on your graph perfectly, the graph itself is all willy-nilly, and I think that is wrong. It's like you're graphing two dependent variables.

Maybe you want day on the x axis??

I have to get some sleep, though, so hopefully someone who is better at this can help you, although I would have loved to get more info from you and possibly help you out.

This is all I did:

trendaqt2
 

AtlantaBob

Golden Member
Jun 16, 2004
1,034
0
0
I'm far from a stats pro, but it's hard to justify deciding on a higher-order fit without some kind of theory behind it. As you get higher and higher order polynomials, they will have more local maxima and minima (i.e. more flexibility to cover all of your data), which will result in a high R^2 value. However, they don't necessarily mean anything, and would likely not be a decent model of other related data (say, the same data in the state of Arizona).
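To illustrate the point, here is a rough sketch on synthetic data (a seasonal hump plus random noise, nothing to do with the actual spreadsheet). It fits polynomials of order 1 to 6 to a small sample and then scores each fit on a second, independently generated "year". `polyfit`, `r_squared` and `make_year` are hypothetical helpers written for this example, and x is rescaled to [-1, 1] to keep the arithmetic stable.

```python
import math
import random

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest power first)."""
    m = degree + 1
    # Augmented normal equations [A^T A | A^T y].
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)]
            + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, m):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, m + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (rows[i][m] - tail) / rows[i][i]
    return coeffs

def r_squared(coeffs, xs, ys):
    """1 - SS_residual / SS_total for the given polynomial on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - sum(c * x ** k for k, c in enumerate(coeffs))) ** 2
                 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot

def make_year(rng, step):
    """Synthetic readings every `step` days: seasonal hump plus noise."""
    xs = [2 * d / 364 - 1 for d in range(0, 365, step)]
    ys = [50 + 20 * math.cos(math.pi * x / 2) + rng.gauss(0, 10) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_year(rng, step=15)  # small sample used for fitting
test_x, test_y = make_year(rng, step=1)     # fresh data, never fitted

train_scores, test_scores = [], []
for degree in range(1, 7):
    coeffs = polyfit(train_x, train_y, degree)
    train_scores.append(r_squared(coeffs, train_x, train_y))
    test_scores.append(r_squared(coeffs, test_x, test_y))
    print(degree, round(train_scores[-1], 3), round(test_scores[-1], 3))
```

The R^2 on the fitted sample can only go up (or stay level) as the order rises, because each higher-order polynomial contains the lower-order ones as special cases. The R^2 on the fresh data carries no such guarantee, which is the reason for being wary of picking the order purely by R^2.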
 
Oct 9, 1999
15,216
3
81
I just realised something... I have air quality on x and temperature on y.

But isn't temperature my independent variable? Well, technically they are both independent in real life.
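A quick sketch of what swapping the axes does and does not change, on made-up numbers. `slope_and_r2` is a hypothetical helper. The correlation and R^2 are the same whichever variable goes on x, but the fitted slope is not, so the choice matters for the trendline equation.

```python
def slope_and_r2(xs, ys):
    """Least-squares slope of ys regressed on xs, plus R^2 of that fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Made-up temperature and air quality index readings.
temps = [55, 60, 65, 70, 75, 80, 85, 90]
aqi = [40, 52, 47, 61, 58, 70, 66, 79]

slope_aqi_on_temp, r2_a = slope_and_r2(temps, aqi)  # temperature on x
slope_temp_on_aqi, r2_b = slope_and_r2(aqi, temps)  # air quality on x

print(f"temperature on x: slope = {slope_aqi_on_temp:.3f}, R^2 = {r2_a:.3f}")
print(f"air quality on x: slope = {slope_temp_on_aqi:.3f}, R^2 = {r2_b:.3f}")
```

If the question is whether air quality changes as temperature rises, putting temperature on x gives a slope that reads directly as "change in air quality index per degree".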