Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Stata implementation of difference-in-differences with binary outcomes

From: Steve Samuels
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Stata implementation of difference-in-differences with binary outcomes
Date: Sun, 18 Apr 2010 15:34:24 -0400

```
"The corresponding DiD-estimate is t=(b-a) - (d-c), which is also
always bounded between "0" and "1""

The range for a DiD is [-2, +2]. Consider b=.7, a=.1, d=.2, c=.8.
Then t=(.7-.1) - (.2-.8)=1.2. The boundaries are reached for a=0, b=1,
c=1, d=0 (t=+2) and for a=1, b=0, c=0, d=1 (t=-2).

Steve

On Sun, Apr 18, 2010 at 2:51 PM, Nils Braakmann wrote:
> Just to add one point: Using a linear probability model is relatively
> innocuous in a DiD-setting as the model is saturated (and consequently
> non-parametric) in its main part. To elaborate on that: The main issue
> with the linear probability model is that it is linear and unbounded
> while the data generating process is non-linear and the outcome is
> bounded between "0" and "1". The linearity in the LPM may lead to
> predictions that are outside of the [0,1]-range. This basically only
> bites when looking at continuous covariates, where the linearity (or
> any other functional form) assumption matters. With dummies (and their
> interactions) you are always looking at mean differences, hence no
> over- or underprediction.
>
> Now, with DiD you are essentially comparing four means, which are all
> bounded between 0 and 1. To make this more concrete, say, you're
> interested in modeling employment shares. Employment in the treatment
> group before treatment (1st period) is "a" (some number between 0 and
> 1), after treatment it is "b". Similarly, in the control group, the
> 1st period outcome is "c" and the 2nd period outcome is "d". Note that
> all outcomes are bounded somewhere between "0" and "1". The
> corresponding DiD-estimate is t=(b-a) - (d-c), which is also always
> bounded between "0" and "1". Now write this as regression using D as
> the treatment group indicator and T as the second period indicator
> (both dummies):
> y = c + (a-c)*D + (d-c)*T + [(b-a) - (d-c)]*(D*T) + error
>   = g_0 + g_1*D + g_2*T + g_3*(D*T) + error
>
> The expected values for both groups and the two periods are:
>
> E[y|D=0,T=0]=g_0=c
> E[y|D=0,T=1]=g_0 + g_2=c + (d-c)=d
> E[y|D=1,T=0]=g_0 + g_1=c + (a-c)=a
> E[y|D=1,T=1]=g_0 + g_1 + g_2 + g_3=c + (a-c) + (d-c) + [(b-a) - (d-c)]=b
>
> These are all always bounded between "0" and "1". The case becomes
> more complicated if you add continuous covariates, though, as these
> may lead to overpredictions. From my experience, however, this usually
> does not matter much in practice.
>
> Best,
> Nils
>
>
> On Fri, Apr 16, 2010 at 5:14 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> I agree with Maarten.
>>
>> "(possibly with the -vce(robust)- "
>> I'd say "necessarily, " or just use the "robust" option, in order to
>> assure correct standard errors and tests.
>>
>> In my experience, the estimated DiDs and their CIs do not have
>> boundary problems: the possible range is -2 to +2, with the average
>> usually close to the middle.
>>
>> Steve
>>
>> On Fri, Apr 16, 2010 at 10:57 AM, Maarten Buis <maartenbuis@yahoo.co.uk> wrote:
>>> --- On Fri, 16/4/10, C Engelbrecht wrote:
>>>> But what if the outcome variable is binary? How should I
>>>> model the difference of two latent variables, as is the
>>>> case in Probit / Logit? The usual DID is based on
>>>> differencing Y across these groups, but what should we
>>>> do now that we only have a latent Y*?
>>>
>>> Difference in difference is all about getting at a causal
>>> effect, which is usually defined as a difference in
>>> averages. This also exists and is meaningful when the
>>> dependent variable is binary, that is the risk difference.
>>> You can calculate it using a linear probability model,
>>> which is just a fancy name for using -regress- on a binary
>>> variable (possibly with the -vce(robust)- option).
>>>
>>> There is often some uneasiness in specifying "the effect"
>>> as linear in the probability metric, as that can
>>> eventually lead to predictions outside the range [0, 1].
>>> However, if you define the effect in terms of odds ratios
>>> or probit coefficients, you won't get the causal effects
>>> either, see for example: Mood 2010, Allison 1999, or
>>> Neuhaus and Jewell 1993.
>>>
>>> So my guess would be that the linear probability model
>>> is in this case the lesser of two evils.
>>>
>>> Hope this helps,
>>> Maarten
>>>
>>> Allison, Paul D. 1999. "Comparing Logit and Probit
>>> Coefficients Across Groups." Sociological Methods &
>>> Research 28:186–208.
>>>
>>> Mood, Carina. 2010. "Logistic regression: Why we cannot
>>> do what we think we can do, and what we can do about
>>> it." European Sociological Review 26:67–82.
>>>
>>> Neuhaus, John M. and Nicholas P. Jewell. 1993. "A
>>> Geometric Approach to Assess Bias Due to Omitted
>>> Covariates in Generalized Linear Models." Biometrika
>>> 80:807–815.
>>>
>>> --------------------------
>>> Maarten L. Buis
>>> Institut fuer Soziologie
>>> Universitaet Tuebingen
>>> Wilhelmstrasse 36
>>> 72074 Tuebingen
>>> Germany
>>>
>>> http://www.maartenbuis.nl
>>> --------------------------
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>
>>
>>
>> --
>> Steven Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>>
>


```
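
The algebra in this thread can be checked numerically. Below is a minimal sketch in Python (not Stata, purely so that it is self-contained). The function names `did_coefficients` and `fitted` are made up for illustration, and a, b, c, d are the four cell means as Nils defines them. It suggests two things: the saturated coefficients reproduce the four cell means exactly, so the fitted probabilities cannot leave [0, 1]; and the DiD coefficient itself ranges over [-2, +2], as Steve points out.

```python
from itertools import product


def did_coefficients(a, b, c, d):
    """Map four cell means to the saturated-model coefficients.

    a, b: treatment group, before and after treatment.
    c, d: control group, before and after treatment.
    Returns (g0, g1, g2, g3) for y = g0 + g1*D + g2*T + g3*D*T.
    """
    g0 = c                    # control group, first period
    g1 = a - c                # group difference in the first period
    g2 = d - c                # time trend in the control group
    g3 = (b - a) - (d - c)    # difference-in-differences
    return g0, g1, g2, g3


def fitted(g, D, T):
    """Expected outcome implied by the coefficients for dummies D and T."""
    g0, g1, g2, g3 = g
    return g0 + g1 * D + g2 * T + g3 * D * T


# Steve's example: b=.7, a=.1, d=.2, c=.8
cells = {(1, 0): 0.1, (1, 1): 0.7, (0, 0): 0.8, (0, 1): 0.2}
g = did_coefficients(a=cells[(1, 0)], b=cells[(1, 1)],
                     c=cells[(0, 0)], d=cells[(0, 1)])
print("DiD estimate:", round(g[3], 10))

# The fitted values are just the four cell means again.
for (D, T), mean in cells.items():
    print(f"D={D} T={T}: fitted={fitted(g, D, T):.3f} cell mean={mean:.3f}")

# Coarse grid over all combinations of cell means in [0, 1]:
# the DiD coefficient spans [-2, +2], not [0, 1].
grid = [i / 10 for i in range(11)]
values = [did_coefficients(a, b, c, d)[3]
          for a, b, c, d in product(grid, repeat=4)]
print(min(values), max(values))  # -2.0 2.0
```

In Stata, the same four coefficients would come from -regress- on the two dummies and their interaction with robust standard errors, as suggested in the thread; with factor-variable notation that could look something like `regress y i.D##i.T, vce(robust)`, where y, D and T are placeholder variable names.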