
# Re: st: Stata implementation of difference-in-differences with binary outcomes

| From | Nils Braakmann |
| --- | --- |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Stata implementation of difference-in-differences with binary outcomes |
| Date | Sun, 18 Apr 2010 21:41:15 +0200 |

```
You're obviously right (sorry, Sunday evening and econometrics don't
go so well together...). However, the predicted outcomes are still
bounded between "0" and "1", and out-of-range predictions are the main
issue with the LPM.
Nils

On Sun, Apr 18, 2010 at 9:34 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> "The corresponding DiD-estimate is t=(b-a) - (d-c), which is also
> always bounded between "0" and "1""
>
> The range for a DiD is [-2, +2]. Consider, b=.7, a=.1, d=.2, c=.8.
> Then t=1.2. The boundaries are reached for a= 0, b= 1, c=1, d=0, and
> for a=1, b=0, c=0, d =1.
>
> Steve
>
> On Sun, Apr 18, 2010 at 2:51 PM, Nils Braakmann wrote:
>> Just to add one point: Using  a linear probability model is relatively
>> innocuous in a DiD-setting as the model is saturated (and consequently
>> non-parametric) in its main part. To elaborate on that: The main issue
>> with the linear probability model is that it is linear and unbounded
>> while the data generating process is non-linear and the outcome is
>> bounded between "0" and "1". The linearity in the LPM may lead to
>> predictions that are outside of the [0,1]-range. This basically only
>> bites when looking at continuous covariates, where the linearity (or
>> any other functional form) assumption matters. With dummies (and their
>> interactions) you are always looking at mean differences, hence no
>> over- or underprediction.
>>
>> Now, with DiD you are essentially comparing four means, which are all
>> bounded between 0 and 1. To make this more concrete, say, you're
>> interested in modeling employment shares. Employment in the treatment
>> group before treatment (1st period) is "a" (some number between 0 and
>> 1), after treatment it is "b". Similarly, in the control group, the
>> 1st period outcome is "c" and the 2nd period outcome is "d". Note that
>> all outcomes are bounded somewhere between "0" and "1". The
>> corresponding DiD-estimate is t=(b-a) - (d-c), which is also always
>> bounded between "0" and "1". Now write this as regression using D as
>> the treatment group indicator and T as the second period indicator
>> (both dummies):
>> y=c + (a-c)*D + (d-c)*T + [(b-a) - (d-c)]*(D*T) + error=g_0 +
>> g_1*D + g_2*T + g_3*(T*D) + error
>>
>> The expected values for both groups and both periods are:
>>
>> E[y|D=0,T=0]=g_0=c
>> E[y|D=0,T=1]=g_0 + g_2=c + (d-c)=d
>> E[y|D=1,T=0]=g_0 + g_1=c + (a-c)=a
>> E[y|D=1,T=1]=g_0 + g_1 + g_2 + g_3=c + (a-c) + (d-c) + [(b-a) - (d-c)]=b
>>
>> These are all always bounded between "0" and "1". The case becomes
>> more complicated if you add continuous covariates, though, as these
>> may lead to overpredictions. From my experience, however, this usually
>> does not matter much in practice.
>>
>> Best,
>> Nils
>>
>>
>> On Fri, Apr 16, 2010 at 5:14 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> I agree with Maarten.
>>>
>>> "(possibly with the -vce(robust)- "
>>> I'd say "necessarily, " or just use the "robust" option, in order to
>>> assure correct standard errors and tests.
>>>
>>> In my experience, the estimated DiDs and their CIs do not have
>>> boundary problems: the possible range is -2 to +2, with the average
>>> usually close to the middle.
>>>
>>> Steve
>>>
>>> On Fri, Apr 16, 2010 at 10:57 AM, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
>>>> --- On Fri, 16/4/10, C Engelbrecht wrote:
>>>>> But what if the outcome variable is binary? How should I
>>>>> model the difference of two latent variables, as is the
>>>>> case in Probit / Logit? The usual DID is based on
>>>>> differencing Y across these groups, but what should we
>>>>> do now that we only have a latent Y*?
>>>>
>>>> Difference in difference is all about getting at a causal
>>>> effect, which is usually defined as a difference in
>>>> averages. This also exists and is meaningful when the
>>>> dependent variable is binary, that is the risk difference.
>>>> You can calculate it using a linear probability model,
>>>> which is just a fancy name for using -regress- on a binary
>>>> variable (possibly with the -vce(robust)- option).
>>>>
>>>> There is often some uneasiness in specifying "the effect"
>>>> as linear in the probability metric, as that can
>>>> eventually lead to predictions outside the range [0, 1].
>>>> However, if you define the effect in terms of odds ratios
>>>> or probit coefficients, you won't get the causal effects
>>>> either, see for example: Mood 2010, Allison 1999, or
>>>> Neuhaus and Jewell 1993.
>>>>
>>>> So my guess would be that the linear probability model
>>>> is in this case the lesser of two evils.
>>>>
>>>> Hope this helps,
>>>> Maarten
>>>>
>>>> Allison, Paul D. 1999. "Comparing Logit and Probit
>>>> Coefficients Across Groups." Sociological Methods &
>>>> Research 28:186–208.
>>>>
>>>> Mood, Carina. 2010. "Logistic regression: Why we cannot
>>>> do what we think we can do, and what we can do about
>>>> it." European Sociological Review 26:67–82.
>>>>
>>>> Neuhaus, John M. and Nicholas P. Jewell. 1993. "A
>>>> Geometric Approach to Assess Bias Due to Omitted
>>>> Covariates in Generalized Linear Models." Biometrika
>>>> 80:807–815.
>>>>
>>>> --------------------------
>>>> Maarten L. Buis
>>>> Institut fuer Soziologie
>>>> Universitaet Tuebingen
>>>> Wilhelmstrasse 36
>>>> 72074 Tuebingen
>>>> Germany
>>>>
>>>> http://www.maartenbuis.nl
>>>> --------------------------
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
>>>
>>>
>>>
>>> --
>>> Steven Samuels
>>> sjsamuels@gmail.com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax:    206-202-4783
>>>
>>
>
>

```
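
Two claims in the thread above can be checked numerically: Steve's point that a DiD of four proportions ranges over [-2, +2], and Nils's point that the saturated linear probability model simply reproduces the four cell means, so its fitted values stay within [0, 1]. What follows is a minimal sketch in Python rather than Stata, under stated assumptions: the cell counts (10 observations per cell, chosen to match Steve's example a=.1, b=.7, c=.8, d=.2) and the helper names `did` and `ols` are made up for illustration and do not come from the thread. The regression is solved with exact fractions so that the comparison with the cell means is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product


def did(a, b, c, d):
    """DiD of four cell means: treated change (b - a) minus control change (d - c)."""
    return (b - a) - (d - c)


def ols(X, y):
    """Exact OLS: solve (X'X) beta = X'y by Gauss-Jordan elimination on Fractions."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [Fraction(sum(row[i] * row[j] for row in X)) for j in range(k)]
        + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
        for i in range(k)
    ]
    for col in range(k):
        # Find a non-zero pivot, move it into place, and scale its row to 1.
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Range of the DiD when every cell mean lies in [0, 1] (coarse grid incl. corners).
grid = [Fraction(0), Fraction(1, 2), Fraction(1)]
values = [did(a, b, c, d) for a, b, c, d in product(grid, repeat=4)]
print("DiD range on grid:", min(values), "to", max(values))

# Hypothetical binary data: (D, T) -> number of ones out of n observations.
# Chosen so that a=0.1, b=0.7, c=0.8, d=0.2, as in Steve's example.
n = 10
ones_per_cell = {(1, 0): 1, (1, 1): 7, (0, 0): 8, (0, 1): 2}

X, y = [], []
for (D, T), ones in ones_per_cell.items():
    for i in range(n):
        X.append([1, D, T, D * T])  # constant, D, T, interaction
        y.append(1 if i < ones else 0)

g0, g1, g2, g3 = ols(X, y)
print("g_0, g_1, g_2, g_3 =", g0, g1, g2, g3)

# Fitted values of the saturated model equal the four cell means.
fitted = {(D, T): g0 + g1 * D + g2 * T + g3 * D * T
          for D, T in product((0, 1), repeat=2)}
for cell, value in sorted(fitted.items()):
    print("E[y | D=%d, T=%d] =" % cell, value)
```

With these counts the interaction coefficient g_3 is 6/5, which is Steve's t = 1.2, while the four fitted values are the cell means themselves and therefore lie in [0, 1]. The DiD coefficient can leave the unit interval even though no prediction does.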