Hi everybody!

Does anyone know how to exclude the outliers from the graph produced by PROC BOXPLOT?

Thanks!


There are a whole host of issues here, but I think we can boil things
down to a fundamental question: why do you want to exclude the
outliers? Anticipating that you have a long-tailed distribution with
a few "outliers" way out in space, I would also ask whether you have
considered a variable transformation (e.g., log(x)). A suitable
transformation can often bring what appear to be very high outliers
inside the fences that bound the whiskers. The lower fence is the
25th percentile - 1.5*IQR and the upper fence is the 75th percentile
+ 1.5*IQR, where the IQR (interquartile range) is the value at the
75th percentile minus the value at the 25th percentile.
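
As a minimal sketch of the fence arithmetic described above, the
following steps compute the quartiles, the IQR, and the 1.5*IQR fences
for a variable before and after a log transformation, then count how
many values fall outside the fences on each scale. The data set names
(skewed, quart, fences, flagged), the variable names, and the simulated
lognormal data are assumptions made for illustration; they are not the
original poster's data.

```sas
/* Simulated long-tailed data (hypothetical, for illustration only) */
data skewed;
   call streaminit(12345);
   do i = 1 to 500;
      y    = exp(rand('normal'));   /* lognormal: long right tail */
      logy = log(y);                /* candidate transformation   */
      output;
   end;
run;

/* 25th and 75th percentiles of the raw and transformed variable */
proc means data=skewed noprint;
   var y logy;
   output out=quart q1=y_q1 logy_q1 q3=y_q3 logy_q3;
run;

/* Fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR, where IQR = Q3 - Q1 */
data fences;
   set quart;
   y_lower    = y_q1    - 1.5*(y_q3    - y_q1);
   y_upper    = y_q3    + 1.5*(y_q3    - y_q1);
   logy_lower = logy_q1 - 1.5*(logy_q3 - logy_q1);
   logy_upper = logy_q3 + 1.5*(logy_q3 - logy_q1);
run;

/* Flag values that fall outside the fences on each scale */
data flagged;
   if _n_ = 1 then set fences;
   set skewed;
   y_out    = (y    < y_lower    or y    > y_upper);
   logy_out = (logy < logy_lower or logy > logy_upper);
run;

/* The SUM of each flag is the number of points plotted as outliers */
proc means data=flagged n sum;
   var y_out logy_out;
run;
```

For data like these, one would expect noticeably fewer flagged points
on the log scale than on the raw scale, which is the effect the
transformation is meant to achieve.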

If a variable transformation does not bring in the outliers and
there really is good reason to exclude them, then please tell us
what code you are using to construct the boxplots. Are you using
PROC BOXPLOT? And remember to tell us why you believe that you need
to exclude the outliers from your boxplots.

Dale

=====

---------------------------------------

Dale McLerran

Fred Hutchinson Cancer Research Center

mailto: XXXX@XXXXX.COM

Ph: (206) 667-2926

Fax: (206) 667-5977

---------------------------------------


Hi Dale!

Thanks for your help!

Yes, I need to exclude the outliers from my analysis because I am
working with roughly 1,200,000 observations, so each boxplot has
nearly 50 outlying points, which clutter the graphs and make them
unreadable. Here is my code:

proc boxplot data=toto gout=sortie;
   plot &&var&j*eurofil /
        caxis=black
        cframe=cxffffff
        ctext=black
        cboxes=cx153e7e
        boxstyle=SCHEMATIC
        idcolor=blue
        idsymbol=circle
        boxconnect
        cconnect=red;
run;

Dam,

It appears that the BOXPLOT procedure has no capability for
restricting the response variable axis without actually throwing
away response variable values by employing a WHERE clause. That,
of course, is no good, because throwing away response values
changes all of the statistics. However, all is not lost! Just switch
over to Michael Friendly's BOXPLOT macro. With the BOXPLOT macro,
you can specify the graphical range of the response variable
without restricting the values of the response variable to that
range. The mean, median, quartiles, and whiskers are all properly
presented as long as your graphical range does not cut off any of
these values. You can get Michael Friendly's BOXPLOT macro at
http://www.math.yorku.ca/SCS/sssg/boxplot.html.

The code below demonstrates how one can limit the graphical range
and still get an appropriate box and whisker presentation.

/* construct data which have "outliers" */
data test;
   do class=1 to 5;
      do i=1 to 75;
         x = rannor(1234579);
         y = exp(x);
         output;
      end;
   end;
run;

title "Unrestricted range boxplots with outliers";
title2 "BOXPLOT procedure";
proc boxplot data=test;
   plot y*class / boxstyle=schematic;
run;

title2 "BOXPLOT macro";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 20 by 5)

title "Restricted graphical range boxplots: remove outliers above 10";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 10)

title "Restricted graphical range boxplots: remove outliers above 6";
%boxplot(data=test,
         class=class,
         var=y,
         yorder=0 to 6)

title "Restricted value range boxplots: remove values above 6";
title2 "BOXPLOT procedure";
proc boxplot data=test(where=(y<=6));
   plot y*class / boxstyle=schematic;
run;

data test2;
   set test(where=(y<=6));
run;

title2 "BOXPLOT macro";
%boxplot(data=test2,
         class=class,
         var=y,
         yorder=0 to 6)

You will observe that as we restrict the graphical range of the
boxplots using the BOXPLOT macro, we do not change the presentation
of any of the statistics. We only eliminate outliers from the
graphical presentation. However, when we restrict the values of
the response variable itself, we change all of the statistics that
are presented.

One feature of your boxplots (connected mean values) is not
available with the BOXPLOT macro. You can request that the
median values be connected instead. I am sure that it would not be
difficult to modify the macro code so that it connects the means,
but if you need that, you'll have to work on it yourself.
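
One way to check that last point numerically is to compare the
summary statistics for all values with those for the values kept by
the WHERE clause. The sketch below repeats the simulated data from the
code above so that it runs on its own; the choice of PROC MEANS and of
the statistics requested is an assumption made for illustration, not
part of the BOXPLOT macro.

```sas
/* Same simulated data as in the post, repeated so this block runs alone */
data test;
   do class=1 to 5;
      do i=1 to 75;
         x = rannor(1234579);
         y = exp(x);
         output;
      end;
   end;
run;

/* Statistics from all values: these are what a display-only   */
/* restriction of the axis leaves unchanged                    */
title "All values";
proc means data=test n mean q1 median q3 max maxdec=3;
   class class;
   var y;
run;

/* Statistics after values above 6 are discarded by a WHERE clause */
title "Values above 6 discarded";
proc means data=test(where=(y<=6)) n mean q1 median q3 max maxdec=3;
   class class;
   var y;
run;
title;
```

In any class that loses observations to the WHERE clause, N drops and
the mean and the maximum shift, and the quartiles may move as well.
Those shifted values are what the boxes in the "restricted value
range" plots are drawn from.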

HTH,

Dale


