Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steve Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Stata implementation of difference-in-differences with binary outcomes |
Date | Sun, 18 Apr 2010 15:34:24 -0400 |

"The corresponding DiD-estimate is t=(b-a) - (d-c), which is also always
bounded between "0" and "1""

The range for a DiD is [-2, +2]. Consider b=.7, a=.1, d=.2, c=.8. Then
t=1.2. The boundaries are reached for a=0, b=1, c=1, d=0 (t=+2) and for
a=1, b=0, c=0, d=1 (t=-2).

Steve

On Sun, Apr 18, 2010 at 2:51 PM, Nils Braakmann <nilsbraakmann@googlemail.com> wrote:
> Just to add one point: Using a linear probability model is relatively
> innocuous in a DiD-setting as the model is saturated (and consequently
> non-parametric) in its main part. To elaborate on that: The main issue
> with the linear probability model is that it is linear and unbounded
> while the data generating process is non-linear and the outcome is
> bounded between "0" and "1". The linearity in the LPM may lead to
> predictions that are outside of the [0,1]-range. This basically only
> bites when looking at continuous covariates, where the linearity (or
> any other functional form) assumption matters. With dummies (and their
> interactions) you are always looking at mean differences, hence no
> over- or underprediction.
>
> Now, with DiD you are essentially comparing four means, which are all
> bounded between 0 and 1. To make this more concrete, say, you're
> interested in modeling employment shares. Employment in the treatment
> group before treatment (1st period) is "a" (some number between 0 and
> 1), after treatment it is "b". Similarly, in the control group, the
> 1st period outcome is "c" and the 2nd period outcome is "d". Note that
> all outcomes are bounded somewhere between "0" and "1". The
> corresponding DiD-estimate is t=(b-a) - (d-c), which is also always
> bounded between "0" and "1".
> Now write this as a regression using D as the treatment group
> indicator and T as the second period indicator (both dummies):
>
> y = c + (a-c)*D + (d-c)*T + [(b-a) - (d-c)]*(D*T) + error
>   = g_0 + g_1*D + g_2*T + g_3*(D*T) + error
>
> The expected values for the two groups and the two periods are:
>
> E[y|D=0,T=0] = g_0 = c
> E[y|D=0,T=1] = g_0 + g_2 = c + (d-c) = d
> E[y|D=1,T=0] = g_0 + g_1 = c + (a-c) = a
> E[y|D=1,T=1] = g_0 + g_1 + g_2 + g_3 = c + (a-c) + (d-c) + [(b-a) - (d-c)] = b
>
> These are all always bounded between "0" and "1". The case becomes
> more complicated if you add continuous covariates, though, as these
> may lead to overpredictions. From my experience, however, this usually
> does not matter much in practice.
>
> Best,
> Nils
>
>
> On Fri, Apr 16, 2010 at 5:14 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> I agree with Maarten.
>>
>> "(possibly with the -vce(robust)- "
>> I'd say "necessarily," or just use the "robust" option, in order to
>> ensure correct standard errors and tests.
>>
>> In my experience, the estimated DiDs and their CIs do not have
>> boundary problems: the possible range is -2 to +2, with the average
>> usually close to the middle.
>>
>> Steve
>>
>> On Fri, Apr 16, 2010 at 10:57 AM, Maarten Buis <maartenbuis@yahoo.co.uk> wrote:
>>> --- On Fri, 16/4/10, C Engelbrecht wrote:
>>>> But what if the outcome variable is binary? How should I
>>>> model the difference of two latent variables, as is the
>>>> case in Probit / Logit? The usual DID is based on
>>>> differencing Y across these groups, but what should we
>>>> do now that we only have a latent Y*?
>>>
>>> Difference in difference is all about getting at a causal
>>> effect, which is usually defined as a difference in
>>> averages. This also exists and is meaningful when the
>>> dependent variable is binary, that is the risk difference.
>>> You can calculate it using a linear probability model,
>>> which is just a fancy name for using -regress- on a binary
>>> variable (possibly with the -vce(robust)- option).
>>>
>>> There is often some uneasiness in specifying "the effect"
>>> as linear in the probability metric, as that can
>>> eventually lead to predictions outside the range [0, 1].
>>> However, if you define the effect in terms of odds ratios
>>> or probit coefficients, you won't get the causal effects
>>> either, see for example: Mood 2010, Allison 1999, or
>>> Neuhaus and Jewell 1993.
>>>
>>> So my guess would be that the linear probability model
>>> is in this case the lesser of two evils.
>>>
>>> Hope this helps,
>>> Maarten
>>>
>>> Allison, Paul D. 1999. "Comparing Logit and Probit
>>> Coefficients Across Groups." Sociological Methods &
>>> Research 28:186–208.
>>>
>>> Mood, Carina. 2010. "Logistic regression: Why we cannot
>>> do what we think we can do, and what we can do about
>>> it." European Sociological Review 26:67–82.
>>>
>>> Neuhaus, John M. and Nicholas P. Jewell. 1993. "A
>>> Geometric Approach to Assess Bias Due to Omitted
>>> Covariates in Generalized Linear Models." Biometrika
>>> 80:807–815.
>>>
>>> --------------------------
>>> Maarten L. Buis
>>> Institut fuer Soziologie
>>> Universitaet Tuebingen
>>> Wilhelmstrasse 36
>>> 72074 Tuebingen
>>> Germany
>>>
>>> http://www.maartenbuis.nl
>>> --------------------------

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
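
The arithmetic discussed in this thread can be checked with a small sketch. It is not from any of the posters and is written in Python rather than Stata only so that it runs on its own. The function names saturated_lpm and fitted_means are hypothetical. It maps the four cell means a, b, c, d to the coefficients g_0..g_3 of the saturated model, confirms that the fitted cell means reproduce the inputs, and shows that the DiD itself can fall outside [0, 1] while staying within [-2, +2]. In Stata the corresponding fit would presumably be something like -regress y i.D##i.T, vce(robust)-, with y, D and T as assumed variable names.

```python
from itertools import product


def saturated_lpm(a, b, c, d):
    """Coefficients of y = g0 + g1*D + g2*T + g3*(D*T) implied by four cell means.

    a, b: treatment-group mean before and after treatment.
    c, d: control-group mean in the first and second period.
    """
    g0 = c                    # E[y|D=0,T=0]
    g1 = a - c                # first-period gap between the groups
    g2 = d - c                # change over time in the control group
    g3 = (b - a) - (d - c)    # difference-in-differences
    return g0, g1, g2, g3


def fitted_means(g0, g1, g2, g3):
    """Predicted mean for each (D, T) cell of the saturated model."""
    return {(D, T): g0 + g1 * D + g2 * T + g3 * D * T
            for D, T in product((0, 1), repeat=2)}


# Steve's example from the thread: b=.7, a=.1, d=.2, c=.8.
coefs = saturated_lpm(a=0.1, b=0.7, c=0.8, d=0.2)
print("DiD:", round(coefs[3], 10))
print("cell means:", fitted_means(*coefs))

# Scan a coarse grid of cell means in [0, 1] and record the DiD for each
# combination; the extremes are reached at the corner cases Steve lists.
grid = [i / 4 for i in range(5)]
did_values = [saturated_lpm(a, b, c, d)[3]
              for a, b, c, d in product(grid, repeat=4)]
print("DiD range on grid:", min(did_values), max(did_values))
```

The fitted means come back as c, d, a and b (up to floating-point rounding), which is Nils's point about the saturated model: with only the two dummies and their interaction, the predictions are the observed cell means and cannot leave [0, 1]. The DiD coefficient is a difference of differences of those means, so its range is wider, as Steve notes.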