sas >> BOXPLOT - OUTLIERS

by zaoz1 » Wed, 26 May 2004 23:42:03 GMT

Hi everybody!

Does anyone know how to exclude the outliers from the graph produced by PROC BOXPLOT?
Thanks!

by stringplayer_2 » Thu, 27 May 2004 00:49:58 GMT


There are a whole host of issues here, but I think we can boil
things down to a fundamental question: "Why do you want to
exclude the outliers?" Anticipating that you have a long-tailed
distribution for which there are a few "outliers" way out in
space, I would also ask if you have considered a variable
transformation (e.g., log(x)). A suitable variable transformation
can often bring what appear to be very high outliers inside the
whisker fences, which lie at the 25th percentile - 1.5*IQR (lower
fence) and the 75th percentile + 1.5*IQR (upper fence), where the
IQR is the interquartile range: the value at the 75th percentile
minus the value at the 25th percentile.
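
To make the fence arithmetic concrete, here is a minimal sketch of how one might compute the fences on the log scale and draw the transformed boxplots. The data set name have, the response y, and the grouping variable group are hypothetical placeholders, not names taken from this thread, and the sketch assumes y is strictly positive wherever the log is taken.

```sas
/* Hypothetical names: data set HAVE, response Y, grouping variable GROUP */
data have_log;
   set have;
   if y > 0 then log_y = log(y);   /* log is undefined for y <= 0, so those stay missing */
run;

/* Quartiles of the transformed response within each group */
proc means data=have_log noprint nway;
   class group;
   var log_y;
   output out=fences q1=q1 q3=q3;
run;

/* Fences as defined above: Q1 - 1.5*IQR and Q3 + 1.5*IQR */
data fences;
   set fences;
   iqr         = q3 - q1;
   lower_fence = q1 - 1.5*iqr;
   upper_fence = q3 + 1.5*iqr;
run;

/* PROC BOXPLOT expects the input sorted by the group variable */
proc sort data=have_log;
   by group;
run;

proc boxplot data=have_log;
   plot log_y*group / boxstyle=schematic;
run;
```

Observations falling outside lower_fence and upper_fence are the ones a schematic boxplot draws as individual outlier points.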

If a variable transformation does not bring in the outliers and
there really is good reason to exclude the outliers, then please
tell us what code you are using to construct the boxplots. Are
you using PROC BOXPLOT? But remember to tell us why you believe
that you need to exclude the outliers from your boxplots.

Dale





=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: XXXX@XXXXX.COM
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------





by zaoz1 » Thu, 27 May 2004 16:40:55 GMT


Hi Dale!
Thanks for your help!
Yes, I need to exclude the outliers from the plots because I am working
with roughly 1,200,000 observations, so nearly 50 outlier points are
drawn for each boxplot. They clutter the graphs and make them unreadable.

proc boxplot data=toto gout=sortie;
   plot &&var&j*eurofil
        / caxis=black
          cframe=cxffffff
          ctext=black
          cboxes=cx153e7e
          boxstyle=SCHEMATIC
          idcolor=blue
          idsymbol=circle
          boxconnect
          cconnect=red;
run;

by stringplayer_2 » Sat, 29 May 2004 03:09:25 GMT

Dam,

It appears that the BOXPLOT procedure has no capability for
restricting the response variable axis without actually throwing
away response variable values by employing a where clause. That,
of course, is no good because throwing away response values
changes all statistics. However, all is not lost! Just switch
over to Michael Friendly's BOXPLOT macro. With the BOXPLOT macro,
you can specify the graphical range of the response variable
without restricting the values of the response variable to the
graphical range. The mean, median, quartiles, and whiskers are
all properly presented as long as your graphical range does not
cut off any of these values. You can get Michael Friendly's
BOXPLOT macro at http://www.math.yorku.ca/SCS/sssg/boxplot.html.

The code below demonstrates how one can limit the graphical range
and still get an appropriate box and whisker presentation.


/* construct data which have "outliers" */
data test;
   do class=1 to 5;
      do i=1 to 75;
         x = rannor(1234579);
         y = exp(x);
         output;
      end;
   end;
run;

title "Unrestricted range boxplots with outliers";
title2 "BOXPLOT procedure";
proc boxplot data=test;
   plot y*class / boxstyle=schematic;
run;

title2 "BOXPLOT macro";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 20 by 5)

title "Restricted graphical range boxplots: remove outliers above 10";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 10)

title "Restricted graphical range boxplots: remove outliers above 6";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 6)

title "Restricted value range boxplots: remove values above 6";
title2 "BOXPLOT procedure";
proc boxplot data=test(where=(y<=6));
   plot y*class / boxstyle=schematic;
run;

data test2;
   set test(where=(y<=6));
run;

title2 "BOXPLOT macro";
%boxplot(data=test2,
         class=class,
         var=y,
         yorder=0 to 6)


You will observe that as we restrict the graphical range for the
boxplots using the BOXPLOT macro, we do not change the presentation
of any of the statistics. We only eliminate outliers from the
graphical presentation. However, when we restrict the values of
the response variable, we do change the presentation of all
statistics.
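
For anyone who wants to see that effect in numbers rather than in the plots, a quick check along the following lines should do it. This is only a sketch that reuses the test data set built above; the choice of statistics and the titles are illustrative.

```sas
/* Quartiles and mean per class, computed from all values */
title "Statistics from all values";
proc means data=test q1 median q3 mean maxdec=3;
   class class;
   var y;
run;

/* The same statistics after discarding values above 6 */
title "Statistics after dropping y > 6";
proc means data=test(where=(y<=6)) q1 median q3 mean maxdec=3;
   class class;
   var y;
run;

title;   /* clear the titles */
```

Comparing the two tables class by class shows how far the where clause moves the quartiles and mean that the boxes are drawn from.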

Some features of your boxplots (connected mean values) are not
available with the BOXPLOT macro. You can request that the
median values be connected. I am sure that it would not be
difficult to reconstruct the code so as to connect the means.
If you need to do that, then you'll have to work on that yourself.

HTH,

Dale


