### sas >> LSMEANS/ Difference in Difference Approach

Dear SAS L'ers,

I have been asked to implement the "difference-in-difference" (DD)
approach, following up on an old post from last September. I received some
very helpful suggestions, but I still don't have a concrete way to implement
this. I developed a way to code the coefficients directly in a data step, but
it's still very time consuming since I have multiple comparisons and multiple
outcomes.

To recap: in its simplest form, the DD method assesses the difference between
the changes in the means for a treatment group and a comparison group over a
fixed time period. If the mean value M of some outcome measure for
income-eligible children is indicated by subscript i, and the value for
children who are not income eligible by subscript j, then we have

DD = (M_post,i - M_pre,i) - (M_post,j - M_pre,j)

where "post" and "pre" refer to post- and pre-implementation.

Since I have survey data, I would need to do this with the SURVEYREG
procedure (this one's for you, David :) some of us pay attention at least
some of the time). I also need a regression procedure because I have to
control for other covariates.

The only procedure I've come across that can directly give me at least the
terms in parentheses is PROC MIXED, with LSMEANS / PDIFF. If I have at least
the results of the terms in parentheses, I can export them to Excel and do
the final difference, using the square root of the sum of the squared
standard errors to generate the standard error for the DD. But I can't
figure out how to do this in SURVEYREG.

The key program parts in mixed:
CLASS nchscat sudyr ;
MODEL shealth = nchscat sudyr nchscat*sudyr;
LSMEANS nchscat*sudyr /PDIFF;

Differences of Least Squares Means

                                                            Standard
Effect         nchscat   sudyr  _nchscat  _sudyr  Estimate     Error    DF  t Value  Pr > |t|

nchscat*sudyr  1<0-<200  2      1<0-<200  3       -0.00106  0.002630  79E3    -0.40    0.6877
nchscat*sudyr  300+      2      300+      3       0.003644  0.002177  79E3     1.67    0.0941

I need a way to apply this approach in SURVEYREG when I have 4 time periods
and 4 income groups, and need to control for other factors.

Outcome: Health
Time: T1, T2, T3, T4
Income: INC1, INC2, INC3, INC4

I want to be able to make multiple comparisons over time and across income
groups, so:
INC1 compared to INC2
INC1 compared to INC3
INC1 compared to INC4

for T1 compared to T2, T3 compared to T2, T4 compared to T3, and T4 compared
to T2

So the D&D for just one of these comparisons, comparing INC1 and INC4 for
T1 and T2, would be:
DD = (INC1_T2 - INC1_T1) - (INC4_T2 - INC4_T1)

Your time and expertise are very much appreciated.
Mah-J
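
A minimal sketch of how one of these contrasts might be requested directly in PROC SURVEYREG with an ESTIMATE statement, rather than differencing LSMEANS output by hand. Assumptions not in the post: a data set named `nhis`, class variables `inc` and `time` each coded 1 to 4, an outcome `health`, a covariate `age`, and the design variables `stratum`, `psu` and `wtf` (names taken from the SUDAAN statements given later in this thread). With `inc` listed before `time` in CLASS, the 16 interaction coefficients run INC1_T1, INC1_T2, ..., INC4_T4. The main effects cancel out of the DD, so only interaction terms carry non-zero coefficients.

```sas
proc surveyreg data=nhis;            /* data set name is hypothetical */
  strata stratum;                    /* design variables: assumed names */
  cluster psu;
  weight wtf;
  class inc time;                    /* inc first, so time varies fastest */
  model health = inc time inc*time age / solution;
  /* DD = (INC1_T2 - INC1_T1) - (INC4_T2 - INC4_T1)                  */
  /* coefficient order: INC1: T1 T2 T3 T4, INC2: ..., INC4: T1..T4   */
  estimate 'DD: INC1 vs INC4, T2 vs T1'
           inc*time -1 1 0 0   0 0 0 0   0 0 0 0   1 -1 0 0 / e;
run;
```

The E option prints the coefficient vector, so the ordering can be checked against the actual level order before trusting the estimate. One ESTIMATE statement per comparison would give each DD together with a design-based standard error in a single pass.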

### sas >> LSMEANS/ Difference in Difference Approach

Let me dump a bunch of random thoughts on you, and perhaps some of

[1] The "difference-in-difference" approach that has been hurled at your
head might be useful if you had two times, but with so many times, I worry
about the ability of this to handle the covariance structure.

[2] On the same note, PROC SURVEYREG cannot handle longitudinal data
like PROC MIXED can. I would rather use survey sample analysis in a case
like this, but you may want to consider doing this in PROC MIXED to handle
the year-to-year covariance structure. If the year-to-year autocorrelation
is high, then you may do better to model this in PROC MIXED. Otherwise,
you'll end up sticking in 'year' dummy variables or writing YEAR as a class
variable, and having to ignore the inter-annual correlation structure.

[3] But PROC MIXED does let you handle complex covariance structures,
so you may want to try and use PROC MIXED, *but* define the autocorrelation
structure that you would expect to get from your design. This will require
you to know the sampling weights, and also the design effects. Is there
clustering? Stratification? PPS sampling? What?

[4] You can do the diff-in-diffs in ESTIMATE statements. But you have
interaction terms, and you cannot leav

### sas >> LSMEANS/ Difference in Difference Approach

Dear David,

Thank you for responding, or should I say questioning :) Unfortunately, I'm
a consultant, so I have to respond to my client's needs, which means
producing the D&D.

Regarding the data: we are using the National Health Interview Survey, which
is a national survey done on an annual basis, a national random sample of the
US population (with caveats, e.g. it doesn't include the institutionalized
population).

It's a multistage area probability design. Simplistically, we can consider
that a 'random' selection is done from the primary sampling units (PSUs),
then strata within PSUs are 'randomly' selected, and then households within
strata (stratum). I put random in ' ' because it's not truly random, since
e.g. larger PSUs have a higher probability of selection, there is
oversampling of low income strata, etc.

So in SUDAAN, the design and weight statements would be:

NEST stratum psu;
WEIGHT wtf;

So if these parameters can be accounted for in MIXED, I would really prefer
to do it that way, since writing ESTIMATE and CONTRAST statements has eluded
me. I can just about figure my way through multiple interactions, but I need
to sit down and write it out.

I was not able to find out how to include the survey design effects and
weights in PROC MIXED (is the WEIGHT statement in MIXED for analytic
weights?).

Thank you again,
Mah-J

ps: notice how I pulled off " Happy waiting patiently" this time round :)

-----Original Message-----
From: SAS(r) Discussion [mailto: XXXX@XXXXX.COM ]On Behalf Of
David L Cassell
Sent: Thursday, April 27, 2006 7:27 PM
To: XXXX@XXXXX.COM
Subject: Re: LSMEANS/ Difference in Difference Approach

```Hello,

I have baseline and follow-up values for my intervention and control group.
I want to conduct a difference-in-difference analysis. Y is continuous. i.e.

Y = b0 + b1*Time + b2*Intervention + b3*(Intervention*Time) + b4*Xn

where Time is an indicator of whether the value was obtained at follow-up (1, 0),
Intervention is an indicator for being in the intervention group (1, 0),
Intervention*Time is the interaction, and
Xn is a vector of all my other covariates.

I understand the theory pretty well, but have not been able to find the SAS
code to conduct this analysis.  Please advise on 2 things:
1. How to arrange the data, i.e. one line per person or one line per person
per time period?
2. The code, of course. Do you use PROC GENMOD or just PROC REG?

I just heard about the LISTSERV, and the resources available, so I'm new to
this forum.

Thanks in advance for taking the time

-- MG
```
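
One way this model is commonly set up, offered as a sketch rather than a definitive answer: stack the data long (one row per person per time period) and let the interaction coefficient carry the difference-in-difference. The names here are assumptions, none of them from the post: data set `did_long`, person identifier `id`, outcome `y`, numeric 0/1 indicators `time` and `intervention`, and covariates `x1` and `x2`. PROC GENMOD with a REPEATED statement is used so that the two measurements on the same person are not treated as independent; PROC REG would fit the same mean model but ignore that pairing.

```sas
proc genmod data=did_long;           /* one row per person per time period */
  class id;
  /* time and intervention stay numeric 0/1, so the coefficient on   */
  /* time*intervention is b3, the difference-in-difference estimate  */
  model y = time intervention time*intervention x1 x2
        / dist=normal link=identity;
  repeated subject=id / type=exch;   /* within-person correlation */
run;
```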

```Hello everyone,

>i have fitted a piecewise linear model using GEE for repeated measurements
>with binary outcome - following is the code.
>
>Time takes 1, 2, ..., 10,11
>and after time=6 the intervention started, so
>st1=min(time,6) and
>st2=max(0,time).
>
>I am trying to test whether coefficients for st2*interv is significantly
>different from st1*interv (whether rate of change is different by
>intervention over time - the difference of difference).  so i wrote the
>following program but can't get the estimate from SAS Genmod.

proc genmod data=ppnewnw.co_&pr order=data;
class decid time interv;

model co_&pr._n/co_&pr._d= st1|interv  st2|interv     /
dist=binomial

dscale
lrci;
estimate "st1*interv vs. st2*interv"
st1*interv 1 0
st2*interv -1 0/e;
repeated subject=decid / type=ar(1) corrw covb;
title "&pr. - Binomial Piecewise Linear GEE Model for Balanced Sample/&vr";
run;

I then recoded st1 as 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 and st2 as 0, 0, 0,
0, 0, 0, 7, 8, 9, 10, 11.  But both st1*trt and st2*trt are not significant
this time. My concern is that st1 and st2 are highly correlated (0.9), so
that they mask each other's effects.

I wonder if anyone can offer some insight as to how I may be able to do the test.

Thank you very much.

I was kindly informed that by attaching an html file, i may have caused
trouble to many people.  I was not aware of it before so please accept my
apology here.

elena
```
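
A hedged sketch of one way this test might be set up. It assumes the usual linear-spline coding, in which the second segment counts time since the knot (`st2 = max(0, time - 6)`, rather than `max(0, time)` as typed in the post), and it keeps `interv` as a two-level CLASS variable. Under that coding each of `st1*interv` and `st2*interv` has one slope term per intervention level, so the between-group slope difference within a segment is a `1 -1` contrast, and the difference of those two differences involves only the interaction terms. The data set names `have` and `pw` and the `events/trials` variables are placeholders for the poster's macro-driven names.

```sas
data pw;
  set have;                          /* hypothetical input data set */
  st1 = min(time, 6);                /* time up to the knot at 6 */
  st2 = max(0, time - 6);            /* time elapsed after the knot */
run;

proc genmod data=pw;                 /* assumes rows sorted by time within decid */
  class decid interv;
  model events/trials = st1 st2 interv st1*interv st2*interv
        / dist=binomial;
  repeated subject=decid / type=ar(1);
  /* (slope gap between interv levels before) minus (gap after) */
  estimate 'interv slope gap: segment 1 vs segment 2'
           st1*interv 1 -1  st2*interv -1 1 / e;
run;
```

The E option prints the contrast coefficients, which helps confirm the level order of `interv` before the sign of the estimate is interpreted.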

```Hi,
I want to calculate the difference between diag date and each date of
service for each pt. following is the sample of  data of 3 pt :

pt   diag_dt               sev_dt          dif
1    10APR2006      05JUN2006     56
1    10APR2006      06JUN2006     57.425694346
1    10APR2006      07JUN2006     58
2    19AUG2005     19JUN2007      669.7414341
2    19AUG2005     27JUN2007      677.73304367
3    16OCT2006     13NOV2006      28
3    16OCT2006      20NOV2006     35
3    16OCT2006     27NOV2006      42

I calculated the difference between the two dates and got the dif values,
but I am not sure why a few values have decimals rather than whole
numbers.
Following is the code I used:

dif= (diag_dt-sev_dt);

Please advise if I have to use a function to calculate the
difference so that I get values without decimals.

thank you,
Gopi.
```
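
A SAS date value is a count of days, so subtracting two whole-number dates always gives a whole number. Decimals in `dif` therefore suggest that some of the stored values carry a fractional (time-of-day) part that a date format does not display. A minimal sketch under that assumption, which drops the fraction before subtracting. The data set names `have` and `want` are placeholders, and the subtraction is written as service date minus diagnosis date to match the positive values shown in the sample.

```sas
data want;
  set have;                          /* hypothetical data set names */
  /* floor() removes any fractional day before the subtraction */
  dif = floor(sev_dt) - floor(diag_dt);
run;
```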

```Hi,
I have a variable called vismin which has values from 1 to 5
indicating different visible minority categories. I want to carry out
a test of the null hypothesis that the coefficients of my explanatory
variables are the same for different regressions (i.e.
B1=B2=B3=B4=B5). I don't know if I can use my variable vismin or if I
should create 5 different dummy variables and use those?

I thought I could use PROC LOGISTIC, but it seems we can only use
that for one regression, not when comparing the 5 regressions.

Any thoughts?
Thanks.
```
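
One common way to frame this, sketched under assumptions: rather than five separate regressions, fit a single pooled model in which `vismin` is a CLASS variable interacted with each explanatory variable. Equal coefficients across the five groups then corresponds to the interaction terms being zero. The outcome `y`, the covariates `x1` and `x2`, and the data set `have` are hypothetical names.

```sas
proc logistic data=have;             /* hypothetical data set */
  class vismin / param=ref;          /* SAS builds the dummy coding itself */
  /* vismin*x1 and vismin*x2 let each slope differ by group */
  model y(event='1') = vismin x1 x2 vismin*x1 vismin*x2;
run;
```

Because CLASS handles the coding, the five dummies do not need to be created by hand. The Wald test reported for each interaction effect (4 degrees of freedom with five groups) addresses whether that variable's coefficient is the same in every group.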