spss >> Studentized Deleted Residual

by Katsche.Schwarzenbeck » Sat, 23 Jun 2007 03:44:36 GMT

Hi,

I am trying to identify outliers on one variable (e.g. self-reported
money spent on gas). What I want to do is calculate the "studentized
deleted residual", i.e. a z-score that does *not* take the i-th value
into account when calculating the standard deviation and mean.

Technically, this is r_i = (Y_i - Y-hat) / s-hat,
where Y-hat is the mean of all data points *excluding* i, and s-hat is
the standard deviation of all data points excluding i.
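
For readers who want to see the definition above spelled out, here is a minimal sketch in Python (not SPSS). The function name `leave_one_out_z` and the gas figures are made up for illustration; the code simply standardizes each case against the mean and standard deviation of the remaining cases.

```python
from statistics import mean, stdev

def leave_one_out_z(values):
    """Standardize each case using the mean and SD of all *other* cases."""
    scores = []
    for i, x in enumerate(values):
        others = values[:i] + values[i + 1:]   # every case except case i
        scores.append((x - mean(others)) / stdev(others))
    return scores

# Hypothetical gas spending figures with one suspicious value at the end
gas = [40.0, 35.0, 50.0, 45.0, 38.0, 42.0, 120.0]
for x, r in zip(gas, leave_one_out_z(gas)):
    print(f"{x:7.1f}  {r:8.2f}")
```

This needs at least three cases, since the standard deviation of the remaining cases is undefined otherwise.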

There are ways to get the studentized deleted residuals for a
regression analysis, but I don't have a dependent or independent
variable; I am just looking at a frequency count of one variable.

Does anybody know if there is an easy way to do this in SPSS? It would
be fairly straightforward in MS Excel, but I would like to avoid going
back and forth between the programs.

Thanks

Katsche




by JKPeck » Mon, 25 Jun 2007 20:44:18 GMT






Just regress on a constant and save the residuals you want.

HTH,
Jon Peck





by Bruce Weaver » Mon, 25 Jun 2007 21:41:22 GMT





Jon, can you provide an example? When I try, the model doesn't run, and
I get messages such as:

Warnings

For models with dependent variable Age of Respondent, the following
variables are constants or have missing correlations: c1. They will
be deleted from the analysis.

For models with dependent variable Age of Respondent, fewer than 2
variables remain. Statistics cannot be computed.


Thanks,
Bruce

--
Bruce Weaver
XXXX@XXXXX.COM
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."



by Ray Koopman » Tue, 26 Jun 2007 13:40:54 GMT




If m is the mean of all n cases, x is the value of case i,
and m' is the mean of the n-1 cases other than i, then
m-m' = (x-m')/n = (x-m)/(n-1).

If SS is the sum of squared deviations of all n cases
from their mean m, and SS' is the sum of squared deviations
of the n-1 cases other than i from their mean m', then
SS-SS' = (n/(n-1))*(x-m)^2.

The studentized deleted residual for case i is

t = (x-m')/sqrt[(1 + 1/(n-1))*SS'/(n-2)]

= z * sqrt[(n-2)/((n-1)^2/n - z^2)],

where z = (x-m)/sqrt[SS/(n-1)]

is the ordinary standard score for case i.

If all n cases are i.i.d. samples from a normal population
then t should be distributed as Student's t with df = n-2.
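
The closed form above can be checked numerically. The following is a rough Python sketch (not SPSS) that computes t once from the ordinary z-score and once by actually deleting each case; the function names and the data are invented for the illustration.

```python
from math import sqrt
from statistics import mean, stdev

def sdresid_closed_form(values):
    """t = z * sqrt[(n-2) / ((n-1)^2/n - z^2)], with z the ordinary standard score."""
    n = len(values)
    m, s = mean(values), stdev(values)
    out = []
    for x in values:
        z = (x - m) / s
        out.append(z * sqrt((n - 2) / ((n - 1) ** 2 / n - z ** 2)))
    return out

def sdresid_brute_force(values):
    """t = (x - m') / sqrt[(1 + 1/(n-1)) * SS'/(n-2)], deleting case i each time."""
    n = len(values)
    out = []
    for i, x in enumerate(values):
        others = values[:i] + values[i + 1:]
        m1 = mean(others)                          # m': mean without case i
        ss1 = sum((v - m1) ** 2 for v in others)   # SS': sum of squares without case i
        out.append((x - m1) / sqrt((1 + 1 / (n - 1)) * ss1 / (n - 2)))
    return out

# Hypothetical data, one extreme case at the end
data = [40.0, 35.0, 50.0, 45.0, 38.0, 42.0, 120.0]
for a, b in zip(sdresid_closed_form(data), sdresid_brute_force(data)):
    print(f"{a:10.4f}  {b:10.4f}")
```

The two columns should agree up to rounding. Note that t is smaller than the plain leave-one-out z-score (x-m')/s' by the factor sqrt((n-1)/n), because it also allows for the uncertainty in m'.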




by Katsche.Schwarzenbeck » Wed, 27 Jun 2007 21:57:31 GMT






Yes, that's the problem: you would have to regress on a dummy that is
0 for one case and a constant for all others, and you would need to do
that n times (you cannot regress on a constant, since the variance of
the constant is zero).

I guess technically I could write a macro that creates n new dummy
variables, then runs n regressions and takes the residual from each of
the regressions, but I was wondering if there is a more straightforward
way.




by JKPeck » Thu, 28 Jun 2007 01:13:47 GMT








Here is an example with no tricks.

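* TEMPORARY makes the constant c exist only for the next procedure.
TEMPORARY.
COMPUTE c = 1.
* Regress through the origin on c and save studentized deleted residuals.
REGRESSION
  /ORIGIN
  /DEPENDENT accel
  /METHOD=ENTER c
  /SAVE SDRESID.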

HTH,
Jon Peck
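
To see what a regression on a constant through the origin produces, here is a small Python sketch (not SPSS) using the textbook formula for an externally studentized residual. It assumes that SDRESID follows that textbook definition, which the thread does not confirm; the function name and data are hypothetical.

```python
from math import sqrt

def sdresid_constant_only(y):
    """Studentized deleted residuals for y regressed on c = 1 with no intercept."""
    n = len(y)
    p = 1                          # one fitted coefficient (on the constant c)
    b = sum(y) / n                 # least-squares coefficient on c = 1, i.e. the mean
    h = 1 / n                      # leverage of each case when c is the only predictor
    resid = [v - b for v in y]
    sse = sum(e * e for e in resid)
    out = []
    for e in resid:
        # residual variance re-estimated with the current case deleted
        s2_deleted = (sse - e * e / (1 - h)) / (n - p - 1)
        out.append(e / sqrt(s2_deleted * (1 - h)))
    return out

# Hypothetical values standing in for the dependent variable
y = [40.0, 35.0, 50.0, 45.0, 38.0, 42.0, 120.0]
for v, t in zip(y, sdresid_constant_only(y)):
    print(f"{v:7.1f}  {t:8.3f}")
```

Under that assumption, the fitted coefficient is just the mean, so each saved residual is the case's deviation from the mean, rescaled by a standard error that excludes the case itself.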




by Katsche.Schwarzenbeck » Thu, 28 Jun 2007 03:35:02 GMT

On Jun 27, 1:13 pm, JKPeck <XXXX@XXXXX.COM> wrote:

Indeed, it helps. Thanks a bunch

Katsche




by Bruce Weaver » Fri, 29 Jun 2007 22:17:33 GMT

JKPeck wrote:

Thanks Jon. I did not save my earlier syntax, and don't remember what I
was doing differently.




by Richard Ulrich » Sat, 30 Jun 2007 07:21:13 GMT

On Fri, 29 Jun 2007 10:17:33 -0400, Bruce Weaver wrote:



[snip.]

Probably "/origin", which saves that degree of freedom.

--
Rich Ulrich, XXXX@XXXXX.COM
http://www.pitt.edu/~wpilib/index.html



by Bruce Weaver » Tue, 03 Jul 2007 22:32:47 GMT







Well spotted, Rich. If I remove "/origin", I get my error messages
back. I'm not sure what you mean about saving the degrees of freedom
though. With "/origin", you get regression through the origin (i.e.,
constant = 0). Where do degrees of freedom come into it?

Cheers,
Bruce





by Richard Ulrich » Fri, 06 Jul 2007 11:08:02 GMT

On Tue, 03 Jul 2007 10:32:47 -0400, Bruce Weaver wrote:






Okay, there isn't any big generality about degrees of freedom.
Here is the brief point which I had in mind.

That particular degree of freedom is the "variation around the mean".
Jon's OLS problem has a sum of squares that is not zero, which is
probably what matters.
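
One way to read this point, sketched in Python (not SPSS) with made-up helper names: a constant has no variation around its own mean, so a model with an intercept has nothing left to fit it with, while its raw sum of squares, which is what a regression through the origin works with, is not zero.

```python
def centered_ss(values):
    """Sum of squared deviations from the mean (what a model with an intercept sees)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def raw_ss(values):
    """Uncentered sum of squares (what a regression through the origin sees)."""
    return sum(v * v for v in values)

c = [1.0] * 10             # the constant predictor, here for 10 hypothetical cases
print(centered_ss(c))      # 0.0
print(raw_ss(c))           # 10.0
```

This is only an interpretation of the remark above; it is consistent with the warning that c1 is a constant and gets deleted when "/origin" is left out.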
