Heterogeneous DID in Stata 18


Highlights

  • Estimation

    • Heterogeneity over cohort and time

    • Panel data

    • Repeated cross-sectional data

  • Four estimators

    • Regression adjustment (RA)

    • Inverse-probability weighting (IPW)

    • Augmented inverse-probability weighting (AIPW)

    • Two-way fixed-effects regression (TWFE)

  • Plots of treatment-effects heterogeneity

  • Test of parallel pretreatment trends

  • Aggregation of ATETs over:

    • Cohort

    • Time

    • Exposure to treatment

  • Simultaneous confidence intervals

When average treatment effects vary over time and across cohorts, you can now use the new hdidregress and xthdidregress commands to estimate heterogeneous average treatment effects on the treated (ATETs). Use hdidregress with repeated cross-sectional data and xthdidregress with panel data. Choose from four estimators: regression adjustment, inverse-probability weighting, augmented inverse-probability weighting, and two-way fixed-effects regression. Plot the time profile of the ATETs for each cohort with estat atetplot. Aggregate the ATETs by cohort, time, and exposure to treatment with estat aggregation. Explore more postestimation features.

Treatment effects measure the causal effect of a treatment on an outcome. A treatment might be a new drug regimen, a surgical procedure, a training program, or even an ad campaign intended to affect an outcome such as blood pressure, mobility, employment, or sales. A common quantity of interest is the ATET: the average effect of the treatment among those who actually received it.

The standard difference-in-differences (DID) estimator, implemented in the existing commands didregress and xtdidregress, estimates a single ATET that is assumed to be common to all groups and time periods. When groups begin treatment at different times, this assumption may not hold: the effect can differ by cohort (the time at which a group was first treated) and by how long the group has been treated. The new commands account for this heterogeneity and estimate a separate ATET for each cohort in each time period.
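To make the cohort-and-time idea concrete, here is a toy Python sketch of the simplest cohort-time comparison: for cohort g in year t, compare the outcome change since year g - 1 for that cohort with the change for never-treated units. The panel, the effects, and the function names are hypothetical; there is no noise and no covariates, and this is not how hdidregress computes its estimates.

```python
from statistics import mean

# Hypothetical toy panel: first treatment year per unit (0 = never treated).
COHORT = {"a1": 2015, "a2": 2015, "b1": 2017, "b2": 2017, "n1": 0, "n2": 0}
# Hypothetical unit-specific outcome levels (the "unit fixed effects").
LEVEL = {"a1": 20.0, "a2": 22.0, "b1": 21.0, "b2": 19.0, "n1": 23.0, "n2": 18.0}


def true_effect(g, t):
    """Made-up ATET: grows with exposure for cohort 2015, constant for cohort 2017."""
    if g == 0 or t < g:
        return 0.0
    if g == 2015:
        return -1.0 - 0.5 * (t - g)
    return -2.0


def outcome(unit, t):
    """Unit level + common time trend + treatment effect (no noise)."""
    return LEVEL[unit] + 0.3 * (t - 2013) + true_effect(COHORT[unit], t)


def att_gt(g, t):
    """Cohort-time DID: change since g - 1 for cohort g minus the change for never-treated units."""
    base = g - 1

    def mean_change(units):
        return mean(outcome(u, t) - outcome(u, base) for u in units)

    treated = [u for u, c in COHORT.items() if c == g]
    never = [u for u, c in COHORT.items() if c == 0]
    return mean_change(treated) - mean_change(never)


if __name__ == "__main__":
    for g in (2015, 2017):
        for t in range(g, 2022):
            print(f"cohort {g}, year {t}: ATET = {att_gt(g, t):.2f}")
```

Because the common trend cancels in the comparison, each cohort-year estimate recovers the made-up effect exactly. A single pooled number would mix the 2015 cohort's growing effect (-1.0 up to -4.0) with the 2017 cohort's constant -2.0 and hide that difference.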

Let's see it work

We would like to know whether a school-district-level program, Healthy Habits, reduces students' body mass index (BMI). The program adds exercise time and increases students' intake of fruits and vegetables. We have fictional data on repeated samples of students ages 11 to 14 from 40 school districts, collected from 2013 to 2021. For each student, the data record whether the student's school district participates in the program, hhabit, and the student's BMI, bmi.

For the outcome model, we believe that the mother's education, medu, is a good predictor of children's health habits. We also believe that participation in sports, sports, affects bmi. Finally, we control for whether the student is a girl, girl, to account for behavioral and body-type differences between boys and girls at this age.

For the treatment model, we use the number of parks in the district (parksd) to model hhabit. We conjecture that school districts with more parks consider exercise spaces more important in their urban planning than those with fewer parks. These districts are therefore more amenable to the Healthy Habits program.

We use the aipw estimator, which models both the outcome and the treatment. The aipw estimator is doubly robust: its estimates are consistent as long as at least one of the two models, the outcome model or the treatment model, is correctly specified.
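To see what double robustness buys, here is a small Python simulation. It is a sketch under stated assumptions, not the estimator that hdidregress aipw implements: one pre/post outcome change per unit, a single covariate x, a known logistic treatment probability, and hypothetical helper names. It applies a doubly robust ATET formula to outcome changes: the treated units' mean of (change - m0(x)) minus the controls' mean of the same quantity, with controls weighted by p(x)/(1 - p(x)). Here m0 plays the role of the outcome model for untreated units and p the role of the treatment model.

```python
import math
import random


def true_pscore(x):
    """True probability of treatment given x (logistic)."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n, tau, seed=0):
    """Hypothetical data: x drives both treatment and the untreated outcome change."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        d = 1 if rng.random() < true_pscore(x) else 0
        # Outcome change: trend depends on x; constant treatment effect tau.
        dy = 1.0 + 2.0 * x + tau * d + rng.gauss(0.0, 1.0)
        data.append((x, d, dy))
    return data


def fit_line(pairs):
    """OLS of y on (1, x); returns the fitted function."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def dr_atet(data, m0, pscore):
    """Treated mean of (dy - m0(x)) minus control mean weighted by p(x)/(1 - p(x))."""
    treated = [dy - m0(x) for x, d, dy in data if d == 1]
    num = den = 0.0
    for x, d, dy in data:
        if d == 0:
            w = pscore(x) / (1.0 - pscore(x))
            num += w * (dy - m0(x))
            den += w
    return sum(treated) / len(treated) - num / den


if __name__ == "__main__":
    data = simulate(100_000, tau=1.5)
    good_m0 = fit_line([(x, dy) for x, d, dy in data if d == 0])
    bad_m0 = lambda x: 0.0   # ignores x: misspecified outcome model
    bad_p = lambda x: 0.5    # ignores x: misspecified treatment model
    for label, m0, p in [("both correct", good_m0, true_pscore),
                         ("outcome only", good_m0, bad_p),
                         ("treatment only", bad_m0, true_pscore),
                         ("both wrong", bad_m0, bad_p)]:
        print(f"{label:15s} {dr_atet(data, m0, p):.3f}")
```

With either model correct, the estimate should land near the true effect of 1.5. With both wrong, the formula collapses to a naive difference in mean changes, which is biased because treated units have systematically larger x.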

We fit the following model:

. hdidregress aipw (bmi medu i.girl i.sports) (hhabit parksd), group(schools) time(year)
note: variable _did_cohort, containing cohort indicators formed by treatment variable hhabit and
      group variable schools, was added to the dataset.

Computing ATET for each cohort and time:
Cohort 2015 (8): ........ done
Cohort 2017 (8): ........ done
Cohort 2019 (8): ........ done

Treatment and time information

Time variable: year
Time interval: 2013 to 2021
Control:       _did_cohort = 0
Treatment:     _did_cohort > 0
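-----------------------------------
                  |   _did_cohort
------------------+----------------
Number of cohorts |             4
------------------+----------------
Number of obs     |
    Never treated |         11355
             2015 |          1231
             2017 |          2097
             2019 |          2042
-----------------------------------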
Heterogeneous treatment-effects regression              Number of obs = 16,725
Estimator:       Augmented IPW
Treatment level: schools
Control group:   Never treated

                               (Std. err. adjusted for 40 clusters in schools)
------------------------------------------------------------------------------
             |                Robust
      Cohort |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
2015         |
        year |
        2014 |   .6544681   .5946048     1.10   0.271    -.5109359    1.819872
        2015 |  -1.226451    .379168    -3.23   0.001    -1.969607   -.4832957
        2016 |  -2.491842   .4169657    -5.98   0.000     -3.30908   -1.674605
        2017 |   -2.72486   .2363878   -11.53   0.000    -3.188171   -2.261548
        2018 |  -2.786634   .6672867    -4.18   0.000    -4.094492   -1.478776
        2019 |  -3.980456   .2993279   -13.30   0.000    -4.567127   -3.393784
        2020 |   -.604415   .5929199    -1.02   0.308    -1.766517    .5576866
        2021 |  -.6522272   .3640416    -1.79   0.073    -1.365736    .0612812
             |
2017         |
        year |
        2014 |   .6635794   .3089663     2.15   0.032     .0580167    1.269142
        2015 |    -1.3933   .3871204    -3.60   0.000    -2.152042   -.6345582
        2016 |   .5947865   .4065947     1.46   0.144    -.2021245    1.391697
        2017 |   -1.71427   .4565384    -3.75   0.000    -2.609069   -.8194714
        2018 |  -3.170542   .5221368    -6.07   0.000    -4.193912   -2.147173
        2019 |  -2.967701   .4247053    -6.99   0.000    -3.800108   -2.135294
        2020 |   .0360098   .6868764     0.05   0.958    -1.310243    1.382263
        2021 |   -.957117   .3510986    -2.73   0.006    -1.645258   -.2689763
             |
2019         |
        year |
        2014 |  -1.434451   .5163232    -2.78   0.005    -2.446426    -.422476
        2015 |   1.010288   .4808165     2.10   0.036      .067905    1.952671
        2016 |  -.3809733   .4336764    -0.88   0.380    -1.230963    .4690169
        2017 |   .5199519   .4849723     1.07   0.284    -.4305763     1.47048
        2018 |  -.0315794   .5863875    -0.05   0.957    -1.180878    1.117719
        2019 |  -3.602114   .3498692   -10.30   0.000    -4.287845   -2.916383
        2020 |  -1.388906   .6765493    -2.05   0.040    -2.714919   -.0628943
        2021 |  -.6222491   .5510466    -1.13   0.259     -1.70228    .4577824
------------------------------------------------------------------------------
Note: ATET computed using covariates.

We specified the outcome model in the first set of parentheses and the treatment model in the second. Option group(schools) names the variable that identifies the school districts; it defines the level at which treatment is assigned and the clusters used for the standard errors (40 clusters, one per district). Finally, option time(year) specifies the time variable.

The note below the command indicates that hdidregress added the categorical variable _did_cohort, which records each observation's cohort. Districts in the same cohort start treatment in the same year. The treatment and time information table lists four cohort categories: the never-treated group (_did_cohort = 0), with 11,355 observations, and three treated cohorts, 2015, 2017, and 2019. The time variable year ranges from 2013 to 2021.
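A district's cohort is the first year in which it is treated, with 0 for districts that are never treated. The Python sketch below shows that bookkeeping; cohort_indicator and the sample rows are hypothetical, it assumes that treatment is never switched off once it starts, and it illustrates the idea rather than Stata's own code for building _did_cohort.

```python
def cohort_indicator(rows):
    """Return {group: first treated year}, with 0 for never-treated groups.

    rows: iterable of (group, year, treated) tuples with treated in {0, 1}.
    Assumes staggered adoption: once treated, a group stays treated.
    """
    first = {}
    for group, year, treated in rows:
        first.setdefault(group, 0)
        if treated and (first[group] == 0 or year < first[group]):
            first[group] = year
    return first


if __name__ == "__main__":
    # Hypothetical observations: (district, year, hhabit)
    rows = [("d1", 2014, 0), ("d1", 2015, 1), ("d1", 2016, 1),
            ("d2", 2016, 0), ("d2", 2017, 1),
            ("d3", 2014, 0), ("d3", 2021, 0)]
    cohorts = cohort_indicator(rows)
    print(cohorts)
    # One cohort value per observation, like a _did_cohort column
    print([cohorts[g] for g, _, _ in rows])
```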

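The estimation table reports an ATET for each cohort in each year. For example, for the 2015 cohort in 2016, the estimated ATET is -2.49: on average, the Healthy Habits program reduced BMI by about 2.5 among students in districts of the 2015 cohort in 2016, relative to what their BMI would have been had those districts not participated. Estimates for years before a cohort begins treatment, such as the 2019 cohort in 2014 through 2018, are not program effects; they are pretreatment comparisons that help assess the parallel-trends assumption. The other posttreatment estimates are interpreted in the same way as the example.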

Trends are hard to see by scanning all of these estimates. We can use estat atetplot to visualize the time profile of the ATETs for each cohort. Option sci adds simultaneous confidence bands, which are constructed to cover the true ATETs for all cohorts and years jointly with a prespecified probability, rather than one estimate at a time.

. estat atetplot, sci
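Simultaneous bands are wider than pointwise intervals because they must cover all 24 cohort-year ATETs at once (3 cohorts × 8 years). The Python sketch below illustrates this with a Bonferroni correction, a simple and conservative substitute that is not necessarily the method estat atetplot, sci uses; the function names are hypothetical. The pointwise interval it computes for the 2015 cohort in 2016 reproduces the table above.

```python
from statistics import NormalDist


def critical_values(n_estimates, level=0.95):
    """Two-sided normal critical values: pointwise and Bonferroni-adjusted."""
    alpha = 1.0 - level
    z = NormalDist()  # standard normal
    pointwise = z.inv_cdf(1.0 - alpha / 2.0)
    bonferroni = z.inv_cdf(1.0 - alpha / (2.0 * n_estimates))
    return pointwise, bonferroni


def interval(estimate, std_err, crit):
    """Symmetric confidence interval estimate +/- crit * std_err."""
    return estimate - crit * std_err, estimate + crit * std_err


if __name__ == "__main__":
    atet, se = -2.491842, .4169657   # cohort 2015, year 2016, from the output above
    pointwise, bonf = critical_values(24)
    print("pointwise: ", interval(atet, se, pointwise))
    print("Bonferroni:", interval(atet, se, bonf))
```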

After fitting the model, we can use estat aggregation to summarize the ATETs by cohort, by time, or by length of exposure to treatment, each of which highlights a different aspect of the heterogeneity. For example, estat aggregation, cohort averages each cohort's ATETs over the years from its first treatment year onward. We also specify option graph to obtain a graph of the aggregated ATETs in addition to the table.

. estat aggregation, cohort graph

ATET over cohort                                        Number of obs = 16,725

                               (Std. err. adjusted for 40 clusters in schools)
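------------------------------------------------------------------------------
             |                Robust
      Cohort |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        2015 |  -2.065755   .1999412   -10.33   0.000    -2.457633   -1.673877
        2017 |    -1.7781   .4013978    -4.43   0.000    -2.564825   -.9913744
        2019 |  -1.869405   .4650349    -4.02   0.000    -2.780857   -.9579538
------------------------------------------------------------------------------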
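The cohort aggregates can be sanity-checked against the cohort-by-year table. The Python sketch below (hypothetical helper names; values copied from the output above) takes an equal-weight average of each cohort's ATETs from its first treatment year onward. The results land close to the reported aggregates but do not match them exactly, which suggests that estat aggregation weights the cohort-year cells unequally; the equal weights here are an assumption for illustration, not Stata's formula.

```python
from statistics import mean

# Cohort-by-year ATETs copied from the hdidregress output above.
ATET = {
    2015: {2014: .6544681, 2015: -1.226451, 2016: -2.491842, 2017: -2.72486,
           2018: -2.786634, 2019: -3.980456, 2020: -.604415, 2021: -.6522272},
    2017: {2014: .6635794, 2015: -1.3933, 2016: .5947865, 2017: -1.71427,
           2018: -3.170542, 2019: -2.967701, 2020: .0360098, 2021: -.957117},
    2019: {2014: -1.434451, 2015: 1.010288, 2016: -.3809733, 2017: .5199519,
           2018: -.0315794, 2019: -3.602114, 2020: -1.388906, 2021: -.6222491},
}
# Cohort aggregates reported by estat aggregation, cohort.
REPORTED = {2015: -2.065755, 2017: -1.7781, 2019: -1.869405}


def cohort_average(cohort):
    """Equal-weight mean of a cohort's ATETs from its first treatment year onward."""
    return mean(v for year, v in ATET[cohort].items() if year >= cohort)


if __name__ == "__main__":
    for g, reported in REPORTED.items():
        print(f"{g}: equal-weight {cohort_average(g):.4f}, reported {reported:.4f}")
```

Averaging all years instead, including the pretreatment ones, moves cohort 2015 to about -1.73, far from the reported -2.07, which is consistent with the cohort aggregate using posttreatment years only.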

To average the ATETs across the cohorts treated in each year, we specify option time with estat aggregation.

. estat aggregation, time graph

ATET over time                                          Number of obs = 16,725

                               (Std. err. adjusted for 40 clusters in schools)
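------------------------------------------------------------------------------
             |                Robust
        Time |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        2015 |  -1.226451    .379168    -3.23   0.001    -1.969607   -.4832957
        2016 |  -2.491842   .4169657    -5.98   0.000     -3.30908   -1.674605
        2017 |  -2.111619   .3654785    -5.78   0.000    -2.827943   -1.395294
        2018 |  -3.028686   .4278557    -7.08   0.000    -3.867268   -2.190104
        2019 |  -3.449829   .2670184   -12.92   0.000    -3.973176   -2.926483
        2020 |  -.6624494     .44865    -1.48   0.140    -1.541787    .2168884
        2021 |  -.7575068   .2816374    -2.69   0.007    -1.309506   -.2055078
------------------------------------------------------------------------------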
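A time aggregate for year t can only draw on cohorts that have begun treatment by t. The Python sketch below (hypothetical names) lists those cohorts for each year. In 2015 and 2016 only the 2015 cohort is treated, which is why the first two rows of the time table repeat the cohort 2015 estimates (-1.226451 and -2.491842) and why no row appears for 2014. From 2017 on, several cohorts contribute; for example, the 2017 aggregate, -2.111619, lies between the 2015 and 2017 cohorts' estimates for that year (-2.72486 and -1.71427).

```python
COHORTS = (2015, 2017, 2019)   # treated cohorts in the output above
YEARS = range(2014, 2022)      # years with cohort-year estimates


def treated_cohorts(year, cohorts=COHORTS):
    """Cohorts that have started treatment by `year` (first treatment year <= year)."""
    return [g for g in cohorts if g <= year]


if __name__ == "__main__":
    for t in YEARS:
        print(t, treated_cohorts(t))
```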

Finally, if we want to summarize ATETs over different lengths of exposure to treatment, we specify option dynamic.

. estat aggregation, dynamic graph

Duration of exposure ATET                               Number of obs = 16,725

                               (Std. err. adjusted for 40 clusters in schools)
------------------------------------------------------------------------------
             |                Robust
    Exposure |       ATET  std. err.        z   P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
          -5 |  -1.434451   .5163232    -2.78   0.005    -2.446426    -.422476
          -4 |   1.010288   .4808165     2.10   0.036      .067905    1.952671
          -3 |   .1338267   .3091619     0.43   0.665    -.4721195     .739773
          -2 |  -.4256324   .4292553    -0.99   0.321    -1.266957    .4156925
          -1 |   .3727141   .3197563     1.17   0.244    -.2539967     .999425
           0 |  -2.285098   .3827362    -5.97   0.000    -3.035248   -1.534949
           1 |  -2.344265   .3829047    -6.12   0.000    -3.094744   -1.593785
           2 |  -2.045521   .3911543    -5.23   0.000     -2.81217   -1.278873
           3 |  -1.045601   .6840119    -1.53   0.126     -2.38624    .2950372
           4 |  -2.145004   .5952525    -3.60   0.000    -3.311678    -.978331
           5 |   -.604415   .5929199    -1.02   0.308    -1.766517    .5576866
           6 |  -.6522272   .3640416    -1.79   0.073    -1.365736    .0612812
------------------------------------------------------------------------------
Note: Exposure is the number of periods since the first treatment time.
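Exposure is year minus cohort, so each cohort-year cell maps to one exposure value, and negative exposures are pretreatment periods. The Python sketch below (hypothetical names) groups the cells by exposure. Exposures -5, -4, 5, and 6 each come from a single cell, which is why those rows repeat cohort-year estimates from the first table; for example, exposure 6 is the 2015 cohort in 2021 (-.6522272), and exposure -5 is the 2019 cohort in 2014 (-1.434451).

```python
from collections import defaultdict

COHORTS = (2015, 2017, 2019)   # treated cohorts in the output above
YEARS = range(2014, 2022)      # years with cohort-year estimates


def exposure_cells(cohorts=COHORTS, years=YEARS):
    """Group (cohort, year) cells by exposure = year - cohort, sorted by exposure."""
    cells = defaultdict(list)
    for g in cohorts:
        for t in years:
            cells[t - g].append((g, t))
    return dict(sorted(cells.items()))


if __name__ == "__main__":
    for e, members in exposure_cells().items():
        print(f"exposure {e:>2}: {members}")
```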
