comp.soft-sys.sas - The SAS statistics package.
Hi,

Sorry to bring up a very simple question. I have a dependent variable y and independent variables x1, x2, ..., xn. I need to apply a multiple linear regression model

y = a1*x1 + a2*x2 + ... + an*xn

to forecast future values of y. Can anybody tell me which SAS procedure I should use?

Thanks a lot.
Fred
Youssef Maousi asked

> I have a binary dependent variable Y (0,1) and three independent
> variables X (1,2), Z (1,2,3) and T (1,2,3,4).
> I know that it is suitable to consider a logistic model for Y and X, Z, T.
>
> But I would like to use a linear regression model for some purposes.
> Could you please give me any suggestions on how to do this in SAS: how can
> I include dummy variables, how can I choose my reference levels for
> X, Z and T, and how can I include interaction terms?

He got some very good advice from David Cassell (David always gives good advice, in my experience)

<<<
No matter how you recode your dummy variables, you cannot get around the fact that your dependent variable Y is a 0/1 variable. You can NEVER satisfy the crucial underlying assumptions for simple linear regression. And without those assumptions, you cannot get valid results! So don't do this. Why can't you use logistic regression? Write back to SAS-L (not to me personally) and tell us why you want simple linear regression (well, with just dummy variables it's an analysis of variance model) and why you can't use PROC LOGISTIC to get what you want.
>>>

Then Youssef replied

<<<
If I would like to use a linear probability model, using as the dependent variable the proportion of Y=1 within each distinct combination of categories of my independent variables (for example, the number of people with Y=1 and X=1, Z=1, T=1 over the total number of people with X=1, Z=1, T=1, and so on), will this do a better job than a simple linear model with Y (0,1)?
>>>

First, you have not answered David's questions. Why do you want to use linear regression? Why do you not want to use logistic regression? It's sort of rude to ask for advice and then ignore it.

Second, what is it you are trying to do? What are these variables? What's your question?

Third, no, using proportions as a DV in linear regression is not appropriate.
Whether it's 'better' than using 0,1 responses is not really the point. It's like asking whether it's better to use a hammer or a crowbar to screw things in. What you want is a screwdriver. The screwdriver here is logistic regression - at least, that's the screwdriver based on what you've said.

There are at least two reasons you ought not to use proportions as a DV in linear regression:

1) Proportions are bounded by 0 and 1 - linear regression assumes that the DV goes from negative infinity to positive infinity. SOMETIMES this doesn't need to be literally true, but this isn't one of those times.

2) Linear regression assumes homoscedasticity. Proportions as DVs don't have this, because the variance of a proportion, p(1-p)/n, changes with p.

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)
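
As a rough illustration of the two objections in the reply above, here is a minimal sketch in plain Python (not SAS, and not anything posted in the thread). The data are made up for the example, and `ols_fit` and `proportion_variance` are hypothetical helper names. It fits an ordinary least-squares line to a 0/1 outcome and checks whether any fitted "probabilities" fall outside [0, 1], then compares the binomial variance p(1-p)/n of a proportion at a few values of p.

```python
# Hypothetical toy data (not from the thread): one numeric predictor x and
# a 0/1 outcome y that shifts from mostly 0 to mostly 1 as x grows.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def proportion_variance(p, n):
    """Binomial variance of a sample proportion based on n observations."""
    return p * (1 - p) / n


# Objection 1: a straight line is not bounded by 0 and 1.
intercept, slope = ols_fit(xs, ys)
fitted = [intercept + slope * xi for xi in xs]
out_of_range = [f for f in fitted if f < 0 or f > 1]
print("fitted values outside [0, 1]:", [round(f, 3) for f in out_of_range])

# Objection 2: the variance of a proportion depends on p, so the
# constant-variance (homoscedasticity) assumption cannot hold.
for p in (0.1, 0.5, 0.9):
    print("p =", p, "variance with n = 50:", proportion_variance(p, 50))
```

With these particular numbers the fitted line dips below 0 at the low end of x and rises above 1 at the high end, and the variance of a proportion is largest near p = 0.5 and shrinks toward 0 and 1. Logistic regression addresses both points by modelling the log-odds rather than the proportion itself.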