### sas >> how to test difference in difference?

Hi all,

I have the following dataset, where Y is the dependent variable, X1 is a continuous independent variable, and X2 is a dummy for group A (=1) versus group B (=0):

Y X1 X2

I would like to conduct the following test with standard error reported:

{(average y of group A if X1 increases by 0.1 - average y of group A if X1 increases by 0.5) - (average y of group B if X1 increases by 0.1 - average y of group B if X1 increases by 0.5)}

How should I do this?

Chunling


### sas >> how to test difference in difference?

Chunling,

Assuming that you are modeling the response with both main effects
X1 and X2 and the interaction X1*X2 included in the model, then
you could run

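proc glm data=mydata;
  class x2;                      /* levels sort as 0 (group B), 1 (group A) */
  model y = x1|x2 / solution;    /* expands to x1 x2 x1*x2 */
  /* Estimates (1)-(4) are predicted means, so they carry the     */
  /* intercept; without it, a row that includes x2 is not         */
  /* estimable in GLM's overparameterized model.                  */
  estimate "A increase by 0.1 (1)"
    intercept 1  x1 0.1  x2 0 1  x1*x2 0 0.1;
  estimate "A increase by 0.5 (2)"
    intercept 1  x1 0.5  x2 0 1  x1*x2 0 0.5;
  /* (1) minus (2): intercept and x2 cancel */
  estimate "Diff in A (D1 = 1-2)"
    x1 -0.4  x1*x2 0 -0.4;
  estimate "B increase by 0.1 (3)"
    intercept 1  x1 0.1  x2 1 0  x1*x2 0.1 0;
  estimate "B increase by 0.5 (4)"
    intercept 1  x1 0.5  x2 1 0  x1*x2 0.5 0;
  /* (3) minus (4) */
  estimate "Diff in B (D2 = 3-4)"
    x1 -0.4  x1*x2 -0.4 0;
  /* D1 minus D2: the x1 main effect cancels */
  estimate "Diff of diff (D1 - D2)"
    x1*x2 0.4 -0.4;
run;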

Note how I have built up to the final difference of differences.
First, I have constructed the estimate for the response mean
in group A for a change in X1 of 0.1 as well as a change in X1 of
0.5. Next, the estimate of the response mean in group A given
a change of 0.1 minus the estimate of the response mean in group
A given a change of 0.5 is obtained employing an estimate
statement in which the terms of the second estimate statement
are subtracted from the terms of the first estimate statement.
Similarly, one can construct the estimate of the response mean
in group B for a change in X1 of 0.1 as well as a change in
X1 of 0.5 and subsequently compute the difference in group B
for a change of 0.1 vs a change of 0.5. To get the final
solution, we simply construct an estimate statement in which
the coefficients of the difference in B are subtracted from
the coefficients of the difference in A.

The estimate statement will automatically compute both point
estimate and standard error.
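
As a check on the coefficients in the final estimate statement, the algebra can be written out. This is only a sketch: the symbols below are introduced here for illustration, with $\gamma_B$ and $\gamma_A$ standing for the two interaction slopes in the order PROC GLM lists them (x2 = 0, then x2 = 1).

```latex
\begin{aligned}
E[y \mid X_1 = x,\ \text{group } g] &= \mu + \beta x + \alpha_g + \gamma_g x,
  \qquad g \in \{B, A\} \\
D_1 &= (0.1 - 0.5)(\beta + \gamma_A) = -0.4\,\beta - 0.4\,\gamma_A \\
D_2 &= (0.1 - 0.5)(\beta + \gamma_B) = -0.4\,\beta - 0.4\,\gamma_B \\
D_1 - D_2 &= 0.4\,\gamma_B - 0.4\,\gamma_A
\end{aligned}
```

The main effect of X1 cancels, which is why the last estimate statement involves only the interaction, with coefficients 0.4 and -0.4.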

HTH,

Dale

=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: XXXX@XXXXX.COM
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------


```I'm looking for someone who can help me review the implementation of
the " difference in difference", a commonly used economics approach.
Essentially, you code two variables, time and income, together and
compare them in a linear regression, e.g. the change in the estimate
between time 1 and time 2 for the poor income group compared to the
rich group.

I've developed a scheme and would like to have it reviewed.

Any assistance will be appreciated.
Mah-J

```
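
For the two-period, two-group case described in this post, the usual regression formulation can be sketched as follows. The symbols are assumptions made here for illustration and do not come from the poster's own scheme: $T = 1$ marks time 2 and $P = 1$ marks the poor income group.

```latex
\begin{aligned}
E[Y \mid T, P] &= \beta_0 + \beta_1 T + \beta_2 P + \beta_3\, T P \\
\text{change for poor} &= E[Y \mid 1, 1] - E[Y \mid 0, 1] = \beta_1 + \beta_3 \\
\text{change for rich} &= E[Y \mid 1, 0] - E[Y \mid 0, 0] = \beta_1 \\
DD &= (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{aligned}
```

Under this coding the difference in difference is the coefficient on the time-by-income interaction, so its estimate and standard error can be read directly from the regression output.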

```Dear SAS L'ers,

I have been asked to implement the "difference-in-difference"
approach, following up on an old post last September. I received some very
helpful suggestions, but still don't have a concrete way to implement this.
I developed a way to directly code the coefficients in a data step, but it's
still very time consuming since I have multiple comparisons and multiple
outcomes.

To recap: In its simplest form the DD method provides an assessment of the
difference between the changes in the means for a treatment and comparison
group over a fixed time period.  If the mean value M of some outcome measure
for eligible children is indicated by subscript i and the value for children
not income eligible is indicated by subscript j, then we have
DD = (M_post,i - M_pre,i) - (M_post,j - M_pre,j)
where "post" and "pre" refer to after and before implementation.

Since I have survey data, I would need to do this with the SURVEYREG
procedure (this one's for you, David :) some of us pay attention at least
some of the time). I also need a regression procedure since I need to do
this controlling for other covariates.

The only procedure I've come across that can directly give me at least
the terms in parentheses is LSMEANS / PDIFF in PROC MIXED.
If I have at least the results of the terms in parentheses, I can export them
to Excel and do the final difference, using the square root of the sum of
the squared standard errors to generate the standard error for the DD.
But I can't figure out how to do this in SURVEYREG.

The key program parts in mixed:
CLASS    nchscat sudyr ;
MODEL shealth = nchscat sudyr nchscat*sudyr;
LSMEANS nchscat*sudyr /PDIFF;

Differences of Least Squares Means

Effect         nchscat   sudyr  _nchscat  _sudyr  Estimate  Standard Error  DF    t Value  Pr > |t|
nchscat*sudyr  1<0-<200  2      1<0-<200  3       -0.00106  0.002630        79E3  -0.40    0.6877
nchscat*sudyr  300+      2      300+      3       0.003644  0.002177        79E3  1.67     0.0941

I need to carry out this approach in SURVEYREG when I have 4 time periods
and 4 income groups and need to control for other factors.

Outcome: Health
Time: T1, T2, T3, T4
Income: INC1, INC2, INC3, INC4

I want to be able to make multiple comparisons over time and across income
groups- so:
INC1 compared to INC2
INC1 compared to INC3
INC1 compared to INC4

for T1 compared to T2, T3 compared to T2, T4 compared to T3, and T4 compared
to T2

So the DD for just one of these comparisons, comparing INC1 and
INC4 for T1 and T2, would be
DD=(INC1_T2 -INC1_T1)-(INC4_T2 -INC4_T1)

Your time and expertise is very much appreciated.
Mah-J
```
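
One way the comparison above might be set up directly in SURVEYREG is with ESTIMATE statements on the income-by-time interaction. This is a minimal sketch under stated assumptions, not a tested solution: the data set name (survey_data), the design variables (stratum, psu, wt), the covariate (age), and the variable names health, inc, and time are all hypothetical, and inc and time are each assumed to have four levels that sort as 1 to 4.

```sas
proc surveyreg data=survey_data;
  strata stratum;              /* hypothetical design variables */
  cluster psu;
  weight wt;
  class inc time;              /* interaction cells are ordered inc 1 (time 1-4), inc 2 (time 1-4), ... */
  model health = inc time inc*time age / solution;

  /* DD = (INC1_T2 - INC1_T1) - (INC4_T2 - INC4_T1).            */
  /* Main effects and covariates cancel, so only interaction    */
  /* cells appear: -1 on (1,1), +1 on (1,2), +1 on (4,1),       */
  /* -1 on (4,2).                                                */
  estimate 'DD: INC1 vs INC4, T2 vs T1'
    inc*time -1  1  0  0
              0  0  0  0
              0  0  0  0
              1 -1  0  0;

  /* Same pattern shifted one period:                            */
  /* (INC1_T3 - INC1_T2) - (INC4_T3 - INC4_T2)                   */
  estimate 'DD: INC1 vs INC4, T3 vs T2'
    inc*time  0 -1  1  0
              0  0  0  0
              0  0  0  0
              0  1 -1  0;
run;
```

Because each ESTIMATE statement works from the full covariance matrix of the parameter estimates, the reported standard error accounts for any covariance between the two differences, which a square root of summed squared standard errors computed by hand would not. The coefficient rows depend on the order of variables in the CLASS statement and on how the levels sort, so they should be checked against the parameter listing from the SOLUTION option.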

```Hello! I conduct a survey in which I have a very large (approximately
800,000 records) set of numerical data (the data is financial data
collected from various businesses). All of the data is run through a
current-year/prior year data check (i.e., if the current year data is
out of tolerance from what was reported in the prior year then the
data is said to have "failed" the CY/PY edit check). I also keep a
record of which data amounts are changed from what was originally
reported by the businesses. I wanted to conduct a statistical test to
confirm that the proportion of data changed in amounts that "passed"
the CY/PY edit check is significantly different from the proportion of
data changed in amounts that "failed" the CY/PY edit check. What kind
of a statistical test should I use... would it just be your very basic
two-sample proportion test or should I use something else?

What (I think) complicates things is that the proportions here are
extremely low (i.e., less than 1 in 500 amounts end up getting
changed, both within amounts that "pass" and amounts that "fail" the
CY/PY edit check). Would the low proportion value affect the accuracy
of the test? Also, I have doubts this data comes from a normal
distribution.... what test would be best if the data is normal and
what test would be best if the data is not normal? Is it possible to
do any of these proportions tests in SAS or would I have to do them
manually?

I'm really just looking for some names of tests/proc procedures that
would be appropriate to use given the above and I can take it from
there, thanks!

Julie

```
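
A comparison of two proportions like the one described can be set up as a 2x2 table in PROC FREQ. The sketch below assumes one row per data amount in a hypothetical data set edit_results, with hypothetical variables cypy_status (pass or fail) and changed (0 or 1); none of these names come from the original post.

```sas
proc freq data=edit_results;
  /* 2x2 table: edit-check result by whether the amount was changed. */
  /* CHISQ requests the Pearson chi-square test; for a 2x2 table it  */
  /* also reports Fisher's exact test.                               */
  /* RISKDIFF reports the difference in proportions with             */
  /* confidence limits.                                              */
  tables cypy_status*changed / chisq riskdiff;
run;
```

The outcome here is binary (changed or not), so the distribution of the underlying dollar amounts does not enter the test. How well the chi-square approximation holds depends on the expected cell counts rather than on the proportion itself, and Fisher's exact test is available in the same output if any expected counts turn out to be small.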