Hi SAS folk!

Statistics is a nice thing, but I have problems with it when I have

large datasets. What counts as large can be discussed, but that is another

story.

I have noticed that more or less everything is usually significant with large

datasets. However, I cannot be sure whether significant differences are also

important in real life.

Having dealt with this issue for quite some time, I have noticed that residuals from

ordinary linear models are distributed more or less normally, i.e. the fit

of observed and expected is OK, but the EDF tests say that this is not the case.

I found another test, by Jarque and Bera, which performs better. I have provided

the code for a macro below.
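
For reference, the statistic that the macro computes (chi_data in the code) can be written as follows. Here n is the number of observations, S the sample skewness and K the excess kurtosis, i.e. kurtosis minus 3, which is what PROC UNIVARIATE reports as kurtosis. Under the null hypothesis of normality the statistic is asymptotically chi-square with 2 degrees of freedom, which is where the p-value comes from. Note, as an aside, that with its default VARDEF=DF setting PROC UNIVARIATE reports bias-adjusted skewness and kurtosis; for large n the difference from the plain moment-based versions should be negligible.

```latex
JB = n \left( \frac{S^{2}}{6} + \frac{K^{2}}{24} \right),
\qquad
JB \sim \chi^{2}_{2} \quad \text{(asymptotically, under } H_{0}\text{)}
```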

Does anyone have any comments on my "problems"? I am wondering whether there are any

tests for other distributions, e.g. lognormal, gamma, Weibull, ... I also

wonder whether there is any general method/test for models that helps us

determine whether something is significant only because of the large amount of data or is really significant.

With regards, Gregor

* %normal_Jarque-Bera.sas

*---------------------------------------------------------------------------

* $Id: %normal_Jarque-Bera.sas,v 1.4 2004/03/23 08:02:47 gregor Exp $

*---------------------------------------------------------------------------

* What: Macro for the Jarque-Bera (S-K) test for normality. This test accounts

* for a large number of observations and in such situations performs better

* than the Kolmogorov-Smirnov test. The latter tends to reject the null

* hypothesis when N is large.

*--------------------------------------------------------------------------;

%put . * Jarque_Bera(data, var);

%macro Jarque_Bera(data, var);

%* --- Compute moments and other basic stat. ---;

proc univariate data=&data noprint;

var &var;

output out=tmp_jarque_bera nobs=nobs mean=mean var=var std=std

stdmean=stdmean min=min max=max range=range skewness=skewness

kurtosis=kurtosis mode=mode median=median;

run;

%* --- Test ---;

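data tmp_jarque_bera;

set tmp_jarque_bera;

%* --- kurtosis from PROC UNIVARIATE is already excess kurtosis, so 3 is not subtracted ---;

chi_data=(((skewness*skewness)/6)+((kurtosis*kurtosis)/24))*nobs;

%* --- upper-tail p-value from a chi-square distribution with 2 df ---;

p=1-probchi(chi_data,2);

run;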

%* --- Printout and cleanup ---;

proc sql;

title " --- Jarque-Bera test for normality for variable &var (&data) ---";

select nobs, mean, std, min, max, skewness, kurtosis, chi_data, p

from tmp_jarque_bera;

drop table tmp_jarque_bera;

quit;

%* --- Reset the title so it does not leak into later output ---;

title;

%mend Jarque_Bera;

*--------------------------------------------------------------------------;

* %normal_Jarque-Bera.sas ends here ;
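
A minimal usage sketch, assuming the macro above has already been compiled in the current session. The dataset name work.jb_demo, the variable x, the seed and the sample size are all made up for illustration, and rand('normal') assumes SAS 9 or later. The second step runs PROC UNIVARIATE with the NORMAL option on the same data, so that the EDF tests can be compared with the Jarque-Bera result.

```sas
/* Hypothetical example data: 100000 draws from a standard normal. */
data work.jb_demo;
  call streaminit(20040323);   /* arbitrary seed, for reproducibility */
  do i = 1 to 100000;
    x = rand('normal');
    output;
  end;
  drop i;
run;

/* Jarque-Bera test via the macro defined above. */
%Jarque_Bera(work.jb_demo, x);

/* EDF tests (Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling)
   on the same variable, for comparison. */
ods select TestsForNormality;
proc univariate data=work.jb_demo normal;
  var x;
run;
```

For a real analysis, the residuals dataset from the model (for example one written by an OUTPUT statement) would take the place of work.jb_demo.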