Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by tHatDudeUK » Sun, 17 Apr 2005 22:51:27 GMT

Hi,

My sample size is 149. I have one dependent variable and 10
independent (or predictor) variables which I'm analysing using
multiple linear regression (with the enter method).

There are some assumptions of multiple regression I'm not sure how to
test for, including normality, homoscedasticity and linearity. I have
already considered collinearity (that's the easy one). Not sure if
I've missed any out. I've been told there's an easy graphical way to
check these, but I'm not sure how. I think it might be preferable to
use a statistical test where possible, though.

If you can provide any pointers to where I should be looking I'd be
very grateful.

Thanks in advance,

Regards

Alan.


Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Reef Fish » Mon, 18 Apr 2005 00:39:42 GMT

> There are some assumptions of multiple regression I'm not sure how to
> test for, including normality,

Do a normal probability plot of the residuals.

> homoscedasticity

Do a scatter plot of the residuals vs the FITTED dependent variable.

These two are assumptions about the ERRORS, being iid N(0, sigma-sq.),
and the third component of the assumption is INDEPENDENCE of the errors.

Here you need to do some sequence plots of the residuals, and plots vs
the fitted values (possibly other variables as well).
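
Something like the following SPSS v12 syntax should produce all three
checks in one pass. A sketch only: "y" and "x1 TO x10" are stand-in
names for your dependent variable and your ten predictors, and the TO
shorthand assumes the predictors sit next to each other in the file.

* NORMPROB gives the normal probability plot of the residuals;
* SCATTERPLOT plots standardized residuals against fitted values.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 TO x10
  /RESIDUALS NORMPROB(ZRESID)
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /SAVE RESID(res_1).

* Sequence plot: the saved residual against case order.
COMPUTE caseno = $CASENUM.
EXECUTE.
GRAPH /SCATTERPLOT(BIVAR)=caseno WITH res_1.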


> and linearity

This is the assumption about the deterministic part of the model.
The linearity is about the regression COEFFICIENTS in the linear
model, or whether the data fit the postulated hyperplane.

A recent thread here discussed various (graphical) methods to examine
this model assumption.
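
One graphical option built into SPSS itself is the partial regression
(added-variable) plot, one per predictor. Again only a sketch, with
the same stand-in variable names:

* PARTIALPLOT ALL draws a partial regression plot for each X,
* a common eyeball check of linearity given the other predictors.
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 TO x10
  /PARTIALPLOT ALL.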


> I have already considered collinearity (that's the easy one).

How did you do it? It's not necessarily an easy one unless it's so
severe that the program bombs by trying to invert a near-singular
matrix. :-)


> I think it might be preferable to use a statistical test where
> possible, though.

An effective graphical method is ALWAYS preferable to a statistical
test!


> If you can provide any pointers to where I should be looking I'd be
> very grateful.

Any GOOD textbook on APPLIED Regression Analysis should treat ALL of
the above (and more) GRAPHICAL methods, each aimed at detecting a
specific departure of your data from the functional (linear) and
probability (iid N(0, sigma-sq)) assumptions.

> Thanks in advance,

Nada.

-- Bob.




Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by tHatDudeUK » Mon, 18 Apr 2005 01:34:16 GMT

On 17 Apr 2005 09:39:42 -0700, "Reef Fish" wrote:

> How did you do it? It's not necessarily an easy one unless it's so
> severe that the program bombs by trying to invert a near-singular
> matrix. :-)

Hmmm, I read a book which suggested multicollinearity occurs when
collinearity tolerance values are less than or equal to 0.1. All mine
were above 0.4. This is probably too simplistic a method, I assume?

> Any GOOD textbook on APPLIED Regression Analysis should treat ALL of
> the above (and more) GRAPHICAL methods,

Went to the library but Tabachnick and Fidell was missing :-( and it's
expensive on Amazon :-(


Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Reef Fish » Mon, 18 Apr 2005 02:27:38 GMT

> Hmmm, I read a book which suggested multicollinearity occurs when
> collinearity tolerance values are less than or equal to 0.1. All mine
> were above 0.4. This is probably too simplistic a method, I assume?

You guessed right.

I assumed your "collinearity tolerance values" had something to do
with the statistical significance of the INDIVIDUAL coefficients.

In a multivariable multicollinearity situation, you may have
X1 = a*X2 + b*X3 + c*X4 + e, where e is 10^(-10) or however small
the in-exact linear relation between the independent variables is,
and yet NONE of the pairwise correlations between (Xi, Xj) may be
high enough to be detected by the packaged software to warn the user.
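
A made-up illustration in syntax, if you want to see it happen: x4
below is an almost-exact sum of three other predictors, so its
tolerance is essentially zero, yet none of its pairwise correlations
rises above roughly .58.

* Entirely simulated data; nothing here comes from the OP's study.
SET SEED = 20050417.
INPUT PROGRAM.
LOOP #i = 1 TO 200.
  COMPUTE x1 = NORMAL(1).
  COMPUTE x2 = NORMAL(1).
  COMPUTE x3 = NORMAL(1).
  END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
* x4 is x1 + x2 + x3 plus tiny noise: near-exact collinearity.
COMPUTE x4 = x1 + x2 + x3 + NORMAL(0.001).
COMPUTE y = x1 + x4 + NORMAL(1).
EXECUTE.
* Pairwise correlations of x4 with each x stay near .58 only.
CORRELATIONS /VARIABLES=x1 x2 x3 x4.
* But the tolerance of x4 comes out essentially zero.
REGRESSION
  /STATISTICS COEFF R ANOVA TOL
  /DEPENDENT y
  /METHOD=ENTER x1 x2 x3 x4.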


> Went to the library but Tabachnick and Fidell was missing :-( and it's
> expensive on Amazon :-(

Did they take a vacation to Cuba? :-) Surely there are better books
on APPLIED regression analysis than Tabachnick and Fidell. Besides, I
don't think you're up to the task of reading a book (even a good one)
on regression and multicollinearity to understand all the theory and
nuances behind the PRACTICE of how to do a multiple linear regression
well. JMHO. You need the watchful eye of someone who is experienced
in the problem for guidance and help.

-- Bob.



Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Richard Ulrich » Mon, 18 Apr 2005 03:57:14 GMT

On Sun, 17 Apr 2005 15:51:27 +0100, tHatDudeUK wrote:

> My sample size is 149. I have one dependent variable and 10
> independent (or predictor) variables which I'm analysing using
> multiple linear regression (with the enter method).

Stepwise regression is seldom a good idea. None of the tests
are good. For a typical problem, the wrong variables are chosen.
Where is it good? -- When you know there will be a prediction
from any set of the variables, and you want a short equation.

You can see my stat-FAQ for posted comments on the subject,
which I collected a few years ago. You can google groups
in < sci.stat.* > for additional comments on Stepwise, and
for references.


> There are some assumptions of multiple regression I'm not sure how to
> test for, including normality, homoscedasticity and linearity.

Going in, the important thing to consider is that there are
no extreme outliers -- or gaps in the distributions.
That matters for the predictors and for the outcome.
Are there any "basement" or "ceiling" effects on scales?
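
In SPSS, EXAMINE will screen every distribution in one pass; a sketch
again, with stand-in variable names:

* Boxplots flag outliers, histograms show gaps and floor or
* ceiling pile-ups, NPPLOT adds normal probability plots.
EXAMINE VARIABLES=y x1 TO x10
  /PLOT BOXPLOT HISTOGRAM NPPLOT.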

The content of the variables and how they arose, etc., is
usually a pretty good guide to the main aspects of linearity.

Those assumptions matter for the purpose of a *test* rather
than the purpose of *doing* a linear regression; you can do
a regression with anything you have (barring extreme multi-
collinearity).

...

--
Rich Ulrich, XXXX@XXXXX.COM
http://www.pitt.edu/~wpilib/index.html


Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Reef Fish » Mon, 18 Apr 2005 04:54:41 GMT




> Stepwise regression is seldom a good idea. None of the tests
> are good. For a typical problem, the wrong variables are chosen.

And stepping backwards is almost always better than stepping forward,
for the obvious reason that X1 and X2 may fit Y perfectly as a pair,
but neither will do as well by itself as X3.


> Going in, the important thing to consider is that there are
> no extreme outliers -- or gaps in the distributions.

This comment needs to be further clarified.

An extreme outlier can be in the (X, y) space or in the space of the
residuals (observed errors).

In the older books, papers, and older software packages, undue
emphasis had often been placed on the observed-error space only.
But some of those "outliers" near the center of the X-space may
have as little as ZERO influence on the fitted model (if the point
coincides with X-bar).

Conversely, there may be points far from the center of X that exert
tremendous influence on the fitted model but would not exhibit
themselves as "outliers" in the residual space.
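
In SPSS these case-level diagnostics can be saved as new variables
for inspection. A sketch with stand-in names: lev_1 picks up the
leverages and cook_1 the Cook's distances.

* Leverage (distance from the center of X-space) and Cook's D
* (influence on the fit) are separate quantities, and neither
* is guaranteed to show up as a large residual (sdr_1).
REGRESSION
  /DEPENDENT y
  /METHOD=ENTER x1 TO x10
  /SAVE LEVER(lev_1) COOK(cook_1) SDRESID(sdr_1).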


Jerry Dallal gave a simple expository explanation of these concepts
of "leverage points" and "influential points" in the link:

http://www.tufts.edu/~gdallal/diagnose.htm


The newer subject area has come to be known as "regression
diagnostics", which covers much more than just the analysis of
"outliers", e.g.
http://www.google.com/search?hl=en&q=textbook+on+regression+diagnostics

Cook and Weisberg (1982) is another standard reference text:
http://www.math.montana.edu/Rweb/Rhelp/influence.measures.html

Chatterjee, Hadi, and Price (1984?) is still another:
http://www.ats.ucla.edu/stat/examples/chp/default.htm

See also Chatterjee, S., and A. S. Hadi, "Influential Observations,
High Leverage Points, and Outliers in Linear Regression," Statistical
Science, 1:379-416, 1986.

and many other references in post-1980 literature.

-- Bob.



Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Bruce Weaver » Mon, 18 Apr 2005 22:53:16 GMT





> Stepwise regression is seldom a good idea.

I don't think the OP was doing stepwise, were they? The "enter" method
(in SPSS) means that the OP chose which variables to include, and forced
them into the model.
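
In syntax the difference is plain to see (stand-in variable names);
only in the second line does the program choose the variables:

REGRESSION /DEPENDENT y /METHOD=ENTER x1 TO x10.
REGRESSION /DEPENDENT y /METHOD=STEPWISE x1 TO x10.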

--
Bruce Weaver
XXXX@XXXXX.COM
www.angelfire.com/wv/bwhomedir


Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Thom » Tue, 19 Apr 2005 20:28:04 GMT

> I assumed your "collinearity tolerance values" had something to do
> with the statistical significance of the INDIVIDUAL coefficients.

No, tolerance just refers to the degree to which predictors are
independent of the other predictors. If tolerance is high (close to
1) there is little evidence of collinearity; tolerance close to zero
is to be avoided. Values in between reflect degrees of collinearity.
A sharp cut-off for tolerance should be avoided, as collinearity is
not an all-or-nothing thing.

Thom



Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Reef Fish » Tue, 19 Apr 2005 21:05:27 GMT

> No, tolerance just refers to the degree to which predictors are
> independent of the other predictors.

Thanks for the clarification.

Can you elaborate on HOW this tolerance is defined and/or calculated,
in a multiple regression of Y on 10 independent variables Xs, say?

-- Bob.



Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Ray Koopman » Wed, 20 Apr 2005 16:09:11 GMT



> Can you elaborate on HOW this tolerance is defined and/or calculated,
> in a multiple regression of Y on 10 independent variables Xs, say?

In a multiple regression context, the "tolerance" of a predictor =
1 - (its squared multiple correlation with the other predictors).

I think this usage originated in the 60s, in a program (BMD?) that
asked the user to specify how small a pivot value could get before
being considered to be zero.
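
Which means you can also get it by hand: regress each predictor on
the remaining ones and subtract that R-squared from 1. A sketch for
one predictor, with stand-in names:

* tolerance(x1) = 1 - R-squared from this regression of x1
* on the other nine predictors.
REGRESSION
  /DEPENDENT x1
  /METHOD=ENTER x2 TO x10.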



Testing for homoscedasticity, linearity and normality for multiple linear regression using SPSS v12

by Bruce Weaver » Wed, 20 Apr 2005 19:18:44 GMT

> In a multiple regression context, the "tolerance" of a predictor =
> 1 - (its squared multiple correlation with the other predictors).

And variance inflation factor (VIF) is 1/tolerance. I tried to post
this link yesterday, but my news server was not cooperating.

http://www2.chass.ncsu.edu/garson/pa765/regress.htm#toleranc
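
For a concrete number: the tolerances of .4 and up reported earlier
in the thread translate to VIFs of at most 1/.4 = 2.5, and the
tolerance <= .1 cut-off the OP's book quoted is just the common
VIF >= 10 rule of thumb in disguise (with Thom's caveat about sharp
cut-offs still applying).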

--
Bruce Weaver
XXXX@XXXXX.COM
www.angelfire.com/wv/bwhomedir