Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "David M. Drukker" <ddrukker@stata.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: teffects, caliper, propensity score matching |
Date | Tue, 4 Mar 2014 09:25:50 -0600 (CST) |
Scott Cunningham <scunning@gmail.com> posted several questions regarding -teffects psmatch- on Friday, 28 February. We apologize for the delay; one of us has been traveling.

Here are the short versions of the questions and the answers. We discuss the details below.

Scott's first question was about how to replicate results from -psmatch2- using -teffects-. The answer is to use the -ties- option in -psmatch2-. Without that option, -psmatch2- drops ties, while -teffects- keeps the ties, following the recommendation of Abadie and Imbens (2006).

Scott's second question was about how to replicate the results from -psmatch2- using -teffects- with caliper matching. Caliper matching requires that each observation have a match within the specified caliper distance. -psmatch2- automatically drops observations for which no match within the caliper distance can be found. Dropping these observations changes the population parameter being estimated. -teffects- refuses to proceed, so that you can choose how to identify a feasible parameter.

Scott's third question pertains to which treatment level is the base category for the overlap plot. By default, the first treatment level is the base category, as discussed below.

We now discuss Scott's questions in detail.
1. My first question is regarding the comparability of teffects psmatch and psmatch2. I have been unable to successfully replicate psmatch2 results using teffects. The two seemingly identical commands yield very different treatment effect estimates. -teffects- gives me an estimate of 730.38, but psmatch2 gave me a return of 951. My understanding is that both used logit to estimate the propensity score and both used nearest neighbor(1) to find the nearest neighbor. So I am at a loss to explain why they are different.
We begin by downloading the data and creating the variables that Scott used.

. use http://users.nber.org/~rdehejia/data/nsw_dw.dta
. generate double agesq = age*age
. generate double agecubed = age*age*age
. generate double edusq = educ*educ
. generate byte u74 = (re74==0)
. generate byte u75 = (re75==0)
. generate double edure74 = educ*re74
. save nsw_dw, replace

-teffects psmatch- includes all tied matches. To obtain the same results from -psmatch2-, specify the -ties- option. Here is an example.

******************* Begin Output ********************************************
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, ///
        outcome(re78) logit ties neighbor(1) ate

Logistic regression                               Number of obs   =        445
                                                  LR chi2(14)     =      26.42
                                                  Prob > chi2     =     0.0229
Log likelihood = -288.89043                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.4241147   .4098083    -1.03   0.301    -1.227324    .3790947
       agesq |   .0147783   .0135348     1.09   0.275    -.0117493     .041306
    agecubed |  -.0001607   .0001423    -1.13   0.259    -.0004397    .0001182
       edusq |   .0481057   .0238387     2.02   0.044     .0013828    .0948287
     edure74 |   .0000127   .0000125     1.02   0.308    -.0000117    .0000371
   education |  -.9525424   .4257927    -2.24   0.025    -1.787081    -.118004
     married |   .1694692   .2852625     0.59   0.552    -.3896349    .7285734
    nodegree |  -.4125117   .3923586    -1.05   0.293     -1.18152     .356497
        re74 |  -.0001822   .0001407    -1.30   0.195    -.0004579    .0000935
        re75 |   .0000392   .0000505     0.78   0.438    -.0000598    .0001381
         u74 |   -.216387   .3851458    -0.56   0.574    -.9712588    .5384849
         u75 |  -.3428689   .3238441    -1.06   0.290    -.9775917    .2918538
       black |  -.2541281   .3697488    -0.69   0.492    -.9788225    .4705663
    hispanic |  -.9218369   .5180363    -1.78   0.075    -1.937169    .0934956
       _cons |   9.031523   4.742779     1.90   0.057    -.2641534     18.3272
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
-------------------------------------------------------------------------------------------
        Variable     Sample |       Treated     Controls   Difference         S.E.   T-stat
----------------------------+--------------------------------------------------------------
            re78  Unmatched |     6349.1435   4554.80112   1794.34238   632.853392     2.84
                        ATT |     6349.1435   4291.20612   2057.93739   873.982463     2.35
                        ATU |    4554.80112   6355.40981   1800.60869            .        .
                        ATE |                              1907.58803            .        .
----------------------------+--------------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |       260 |       260
   Treated |       185 |       185
-----------+-----------+----------
     Total |       445 |       445
******************* End Output **********************************************

Note that the estimated ATE is 1907.58803. Now we replicate the ATE estimate using -teffects-.

******************* Begin Output ********************************************
. teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit)

Treatment-effects estimation                    Number of obs      =       445
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         8
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1907.588   879.8298     2.17   0.030     183.1534    3632.023
------------------------------------------------------------------------------
******************* End Output **********************************************

Specifying the -ties- option on -psmatch2- also causes it to produce the same estimate for the average treatment effect on the treated (ATET). (In the command below, we did not specify the -ate- option, so -psmatch2- estimates the ATET, which it labels ATT.)

******************* Begin Output ********************************************
. * psmatch2 results for the Average treatment effect for the treatment
. * group (here, ATT)
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, ///
        outcome(re78) logit ties neighbor(1)
[Output Omitted]
******************* End Output **********************************************

The -psmatch2- command above produces an estimated ATET of 2057.937, which we now replicate using -teffects-.

******************* Begin Output ********************************************
. teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit), ///
        atet vce(iid)

Treatment-effects estimation                    Number of obs      =       445
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         8
------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
       treat |
   (1 vs 0)  |   2057.937   873.4073     2.36   0.018     346.0906    3769.784
------------------------------------------------------------------------------
******************* End Output **********************************************

Here is Scott's second question.
2. My second question is regarding caliper matching. I have been unsuccessful at estimating caliper matching for -teffects- but was able to do so for -psmatch2- for the same given caliper. I only was successful when I increased the caliper to 0.1. The code for that is below.
There are observations for which no match can be found within the specified caliper distance. As mentioned above, -psmatch2- drops these observations and proceeds with the estimation on the remaining subsample. -psmatch2- also generates a variable named _support, which contains 0 if no match is found for an observation within the specified caliper and 1 if a match is found. Here is an example.

******************* Begin Output ********************************************
. * Caliper matching (0.00001) with psmatch2
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, ///
        outcome(re78) caliper(0.00001) logit ties

Logistic regression                               Number of obs   =        445
                                                  LR chi2(14)     =      26.42
                                                  Prob > chi2     =     0.0229
Log likelihood = -288.89043                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.4241147   .4098083    -1.03   0.301    -1.227324    .3790947
       agesq |   .0147783   .0135348     1.09   0.275    -.0117493     .041306
    agecubed |  -.0001607   .0001423    -1.13   0.259    -.0004397    .0001182
       edusq |   .0481057   .0238387     2.02   0.044     .0013828    .0948287
     edure74 |   .0000127   .0000125     1.02   0.308    -.0000117    .0000371
   education |  -.9525424   .4257927    -2.24   0.025    -1.787081    -.118004
     married |   .1694692   .2852625     0.59   0.552    -.3896349    .7285734
    nodegree |  -.4125117   .3923586    -1.05   0.293     -1.18152     .356497
        re74 |  -.0001822   .0001407    -1.30   0.195    -.0004579    .0000935
        re75 |   .0000392   .0000505     0.78   0.438    -.0000598    .0001381
         u74 |   -.216387   .3851458    -0.56   0.574    -.9712588    .5384849
         u75 |  -.3428689   .3238441    -1.06   0.290    -.9775917    .2918538
       black |  -.2541281   .3697488    -0.69   0.492    -.9788225    .4705663
    hispanic |  -.9218369   .5180363    -1.78   0.075    -1.937169    .0934956
       _cons |   9.031523   4.742779     1.90   0.057    -.2641534     18.3272
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
-------------------------------------------------------------------------------------------
        Variable     Sample |       Treated     Controls   Difference         S.E.   T-stat
----------------------------+--------------------------------------------------------------
            re78  Unmatched |     6349.1435   4554.80112   1794.34238   632.853392     2.84
                        ATT |    5257.79482   3951.72019   1306.07463    1187.7825     1.10
----------------------------+--------------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

 psmatch2: |   psmatch2: Common
 Treatment |        support
assignment | Off suppo  On suppor |     Total
-----------+----------------------+----------
 Untreated |         0        260 |       260
   Treated |       130         55 |       185
-----------+----------------------+----------
     Total |       130        315 |       445
******************* End Output **********************************************
Let's look at _support created by -psmatch2-.

******************* Begin Output ********************************************
. label list _support
_support:
           0 Off support
           1 On support

. count if _support
  315
******************* End Output **********************************************

-teffects- will refuse to proceed, because there are observations that violate the specified caliper condition.

******************* Begin Output ********************************************
. capture noisily teffects psmatch (re78) (treat age agesq agecubed edusq ///
        edure74 education married nodegree re74 re75 u74 u75 black ///
        hispanic, logit), atet gen(cstub) caliper(0.00001) vce(iid)
no propensity-score matches for observation 1 within caliper 1e-05; this is
not allowed

. list _support in 1

     +-------------+
     |    _support |
     |-------------|
  1. | Off support |
     +-------------+
******************* End Output **********************************************

The same occurs for the other caliper values. In order for the two commands to produce the same results, both the propensity score model and the matching on the estimated propensity score must be run on the same sample.
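
One possible way to choose a feasible sample with -teffects- is sketched below. It is an illustration, not part of the original exchange, and it assumes that the -osample()- option of -teffects psmatch- behaves as documented, namely that it creates an indicator for observations that violate the overlap (here, caliper) requirement even though the command still exits with an error. The variable name viol is hypothetical. Note that the second call re-estimates the logit on the reduced sample, so new violations can appear, and the estimate need not equal the -psmatch2- result, which keeps the propensity scores from the full sample.

```stata
* Sketch only: viol is a hypothetical variable name.
use nsw_dw, clear

* Step 1: this call is still expected to fail, but osample() should leave
* behind an indicator equal to 1 for observations without a match inside
* the caliper.
capture noisily teffects psmatch (re78)                              ///
        (treat age agesq agecubed edusq edure74 education married    ///
        nodegree re74 re75 u74 u75 black hispanic, logit),           ///
        atet caliper(0.00001) osample(viol)

* Step 2: count the flagged observations by treatment group.
tabulate treat viol

* Step 3: refit on the unflagged observations. The propensity score is
* re-estimated on this subsample, so the call may fail again; capture
* noisily shows the message without stopping a do-file.
capture noisily teffects psmatch (re78)                              ///
        (treat age agesq agecubed edusq edure74 education married    ///
        nodegree re74 re75 u74 u75 black hispanic, logit)            ///
        if viol == 0, atet caliper(0.00001) vce(iid)
```

Whichever sample is used, the estimand is then the ATET for the treated observations that remain, not for the original treated population.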
3. I have a question regarding the interpretation in teffects overlap. Am I correct that the propensity score is being estimated as the probability of being in the control group (as opposed to the treatment group)? The caption in the default overlap graph makes it seem that way. This seems like an innovation -- I have never seen anyone present the propensity score that way and was just curious why -teffects- does it.
-teffects overlap- computes the propensity score for the first level listed in e(tlevels), by default. In Scott's example the first level is treat = 0, so the default plot shows the estimated probability of being in the control group. Use the -ptlevel()- option to change this behavior.

We hope that this discussion helps.

-David Drukker           -Rich Gates
ddrukker@stata.com       rgates@stata.com

References
----------
Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.