regression based on binary attributes

walla

Senior member
Jun 2, 2001
987
0
0
Say I have some test data. 18 attributes are binary variables. I want to use those binary variables to predict a continuous value.

Is there a type of regression that specializes in this type of analysis?

Thanks.
 

CycloWizard

Lifer
Sep 10, 2001
12,348
1
81
Just to be clear, you have eighteen variables with two settings and you want to be able to predict the value of the system? You'd need quite a bit of data to get any good regression, but analysis of variance should be able to give you what you want, and it's pretty simple.
 

walla

Yeah - 18 variables with two settings. So there are 2^18 = 262,144 unique states. I am testing from a sample of 600 of them, picked at random.
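For scale, the arithmetic behind those numbers can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and the parameter count assumes a model with one coefficient per attribute plus an intercept and no interaction terms.

```python
# Count the distinct settings of 18 binary attributes and the share covered by 600 samples.
n_attributes = 18
n_samples = 600

n_states = 2 ** n_attributes        # every attribute doubles the number of states
coverage = n_samples / n_states     # fraction of all possible states that were sampled

# Assumption: main effects only, i.e. one coefficient per attribute plus an intercept.
n_main_effect_params = n_attributes + 1

print(f"unique states: {n_states}")
print(f"sample coverage: {coverage:.4%}")
print(f"main-effects parameters: {n_main_effect_params}")
```

So the sample covers well under 1% of the possible states, but it is still large compared with the handful of parameters a main-effects model has to estimate.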

I am not familiar with what you mean by "analysis of variance". Could you explain this or point me in the right direction (as in name of specific techniques)? I'm relatively inexperienced in statistical analysis. Thanks :)
 

CycloWizard

Analysis of variance is commonly referred to as ANOVA. I believe Excel has tools that will do it for you, though I typically use statistics packages. It can also be done by hand, but I wouldn't recommend it for the number of variables and data points you're looking at. :p Let me see what I can figure out real quick and I'll post my findings.

edit: It looks like Excel can only consider two factors (two variables) for its ANOVA. I did, however, find this handy-dandy web site that will probably work for your application. It looks like you can just cut and paste your data into it after you tell it how many groups you have. Let me know if you need guidance interpreting the results or figuring out how to use the web site.

http://www.physics.csbsju.edu/stats/anova.html
 

walla

Thanks for this reply.

I actually have access to some nice stat software (SPSS) that will do ANOVA. However, there is much terminology I don't understand. I think I can ask some experts in the area to help me with the software.

But...the idea behind ANOVA...what am I learning? And is it anything a layman could interpret or will I need to read some textbooks? Will this give me an indication of the model I need to use for numeric prediction?

Thanks again.
 

CycloWizard

Originally posted by: walla
But...the idea behind ANOVA...what am I learning? And is it anything a layman could interpret or will I need to read some textbooks? Will this give me an indication of the model I need to use for numeric prediction?

Thanks again.
ANOVA determines which factors (variables) are actually affecting the measured output of the system. It then fits a relationship between the system output and each variable. For a single variable in a linear ANOVA, this is like y=m*x+b, where y is the system output, x is the variable value, m is the magnitude of the effect of that variable on the system output (the slope of the fitted line), and b is the mean system output. For multiple variables, you'll have y=A*a+B*b+C*c+...+(mean output), where the capital letters represent the magnitude of the effect of each variable and the lower case letters represent the variable values.
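To make the y=A*a+B*b+C*c+...+(mean output) form concrete, here is a minimal sketch in plain Python that fits such a model by ordinary least squares. Everything in it is an assumption for illustration: the data are synthetic, the "true" effects are invented, `fit_linear` is a hypothetical helper rather than anything from SPSS or Excel, and the attributes are left in 0/1 coding, so the constant term is the output with every attribute at 0.

```python
import random

def fit_linear(rows, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, coef_1, ..., coef_k] for 0/1 (or any numeric) predictors.
    """
    design = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    p = len(design[0])
    # Augmented system [X'X | X'y].
    aug = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           + [sum(d[i] * t for d, t in zip(design, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot_row = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]

# Synthetic stand-in for the real data: 600 random rows of 18 binary attributes.
rng = random.Random(0)
n_samples, n_attributes = 600, 18
true_intercept = 10.0                                        # invented for the demo
true_effects = [0.5 * (i + 1) for i in range(n_attributes)]  # invented for the demo

rows = [[rng.randint(0, 1) for _ in range(n_attributes)] for _ in range(n_samples)]
y = [true_intercept
     + sum(c * v for c, v in zip(true_effects, r))
     + rng.gauss(0, 0.1)                                     # small measurement noise
     for r in rows]

fitted = fit_linear(rows, y)
print("intercept:", round(fitted[0], 3))
for i, coef in enumerate(fitted[1:], start=1):
    print(f"attribute {i}: effect {coef:.3f}")
```

With real data you would replace `rows` and `y` with your own measurements; a statistics package would additionally report which effects are significant, which this sketch does not attempt.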

Basically, it performs multiple linear regressions by isolating the effects of each variable on system performance. The 'mean value' is the value of the system when all variables are set to zero (where zero is not the same as your binary zero... in ANOVA, you would recode 0 as -1 and 1 as +1 to represent the low and high settings of each variable). In other words, the 'mean value' is the system output when all process variables are set to their mean values. Hopefully that's not too confusing. :p
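The -1/+1 coding can be illustrated with a small sketch. It assumes a made-up three-variable system and a balanced design in which every combination is run once; with a random sample of states the balance is only approximate, so the shortcut below would not hold exactly. The `response` function and its coefficients are hypothetical.

```python
from itertools import product

# Hypothetical system written in 0/1 coding: y = 10 + 4*a + 2*b - 6*c
def response(a, b, c):
    return 10 + 4 * a + 2 * b - 6 * c

runs = list(product([0, 1], repeat=3))               # full factorial: all 8 combinations
y = [response(*run) for run in runs]
coded = [[2 * v - 1 for v in run] for run in runs]   # recode 0 -> -1, 1 -> +1

n = len(runs)
grand_mean = sum(y) / n                              # the 'mean value' term of the model

# In a balanced design the -1/+1 columns are orthogonal, so each coefficient
# is simply the average of (coded value * output) over all runs.
effects = [sum(row[i] * t for row, t in zip(coded, y)) / n for i in range(3)]

print("mean output:", grand_mean)
print("effects in -1/+1 coding:", effects)
```

In this coding the constant term is the average output over all runs, and each coefficient is half of the corresponding 0/1 coefficient, because going from -1 to +1 spans two units.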
 

Jonitus

Member
Feb 14, 2002
109
0
0
Binary independent variables and a continuous dependent variable? Sounds like the ANOVA will work, but you could also use a modification of a binary logistic regression, called a multinomial logistic regression. SPSS has that capability. Keep in mind that logistic regression models a categorical dependent variable, so you would first have to bin your continuous value into categories. I'm not sure that using the multinomial logistic regression would result in a particularly parsimonious model, but you never know.

I'd try the ANOVA first...sounds like the ticket.