Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: teffects, caliper, propensity score matching


From   "David M. Drukker" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: teffects, caliper, propensity score matching
Date   Tue, 4 Mar 2014 09:25:50 -0600 (CST)

Scott Cunningham <[email protected]> posted several questions regarding
-teffects psmatch- on Friday, 28 February.  We apologize for the delay; one
of us has been traveling.

Here are the short versions of the questions and the answers.  We discuss
the details below.

Scott's first question was about how to replicate results from -psmatch2-
using -teffects-.  The answer is to specify the -ties- option in -psmatch2-.
By default, -psmatch2- drops ties, while -teffects- keeps the ties, following
the recommendation of Abadie and Imbens (2006).

Scott's second question was about how to replicate the results from
-psmatch2- using -teffects- with caliper matching.  Caliper matching
requires that each observation have a match within the specified caliper
distance.  -psmatch2- automatically drops observations for which no match
within the caliper distance can be found.  Dropping these observations
changes the population parameter.  -teffects- refuses to proceed so that you
can choose how to identify a feasible parameter.

Scott's third question pertains to which treatment level is the base
category for the overlap plot.  By default, the first treatment level is the
base category, as discussed below.

We now discuss Scott's questions in detail.

1.  My first question is regarding the comparability of teffects psmatch and
psmatch2.  I have been unable to successfully replicate psmatch2 results
using teffects. The two seemingly identical commands yield very different
treatment effect estimates.  -teffects- gives me an estimate of 730.38, but
psmatch2 gave me a return of 951.  My understanding is that both used logit to
estimate propensity score, both used nearest neighbor(1) to find nearest
neighbor.  So I am at a loss to explain why they are different.

We begin by downloading the data and creating variables that Scott used.

. use http://users.nber.org/~rdehejia/data/nsw_dw.dta
. generate double agesq    = age*age
. generate double agecubed = age*age*age
. generate double edusq    = educ*educ
. generate byte u74        = (re74==0)
. generate byte u75        = (re75==0)
. generate double edure74  = educ*re74
. save nsw_dw, replace

-teffects psmatch- includes all tied matches.  To obtain the same results
from -psmatch2-, specify the -ties- option.  Here is an example.

******************* Begin Output********************************************
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic,                ///
        outcome(re78) logit ties neighbor(1) ate

Logistic regression                               Number of obs   =        445
                                                  LR chi2(14)     =      26.42
                                                  Prob > chi2     =     0.0229
Log likelihood = -288.89043                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.4241147   .4098083    -1.03   0.301    -1.227324    .3790947
       agesq |   .0147783   .0135348     1.09   0.275    -.0117493     .041306
    agecubed |  -.0001607   .0001423    -1.13   0.259    -.0004397    .0001182
       edusq |   .0481057   .0238387     2.02   0.044     .0013828    .0948287
     edure74 |   .0000127   .0000125     1.02   0.308    -.0000117    .0000371
   education |  -.9525424   .4257927    -2.24   0.025    -1.787081    -.118004
     married |   .1694692   .2852625     0.59   0.552    -.3896349    .7285734
    nodegree |  -.4125117   .3923586    -1.05   0.293     -1.18152     .356497
        re74 |  -.0001822   .0001407    -1.30   0.195    -.0004579    .0000935
        re75 |   .0000392   .0000505     0.78   0.438    -.0000598    .0001381
         u74 |   -.216387   .3851458    -0.56   0.574    -.9712588    .5384849
         u75 |  -.3428689   .3238441    -1.06   0.290    -.9775917    .2918538
       black |  -.2541281   .3697488    -0.69   0.492    -.9788225    .4705663
    hispanic |  -.9218369   .5180363    -1.78   0.075    -1.937169    .0934956
       _cons |   9.031523   4.742779     1.90   0.057    -.2641534     18.3272
------------------------------------------------------------------------------

There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.

-------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched |  6349.1435   4554.80112   1794.34238   632.853392     2.84
                        ATT |  6349.1435   4291.20612   2057.93739   873.982463     2.35
                        ATU | 4554.80112   6355.40981   1800.60869            .        .
                        ATE |                           1907.58803            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

           | psmatch2:
 psmatch2: |   Common
 Treatment |  support
assignment | On suppor |     Total
-----------+-----------+----------
 Untreated |       260 |       260
   Treated |       185 |       185
-----------+-----------+----------
     Total |       445 |       445

******************* End   Output********************************************

Note that the estimated ATE is 1907.58803.

Now we replicate the ATE estimate using -teffects-.

******************* Begin Output********************************************
. teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit)

Treatment-effects estimation                    Number of obs      =       445
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         8
------------------------------------------------------------------------------
             |              AI Robust
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
       treat |
   (1 vs 0)  |   1907.588   879.8298     2.17   0.030     183.1534    3632.023
------------------------------------------------------------------------------
******************* End   Output********************************************
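
The "max = 8" in the header above indicates that at least one observation
has eight tied nearest neighbors, all of which -teffects- uses.  A minimal
sketch for inspecting the ties follows.  It assumes the nsw_dw dataset saved
earlier; the stub name match is our own choice for illustration, and the
-generate()- option stores the observation numbers of the matches in match1,
match2, and so on.

```stata
* Sketch: store the matched observation numbers to see where ties occur.
* The stub name "match" is hypothetical; any unused stub would do.
use nsw_dw, clear
teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit), generate(match)

* match2 is nonmissing only for observations with two or more tied matches
count if !missing(match2)
```

Because the stored values are observation numbers, they are meaningful only
as long as the sort order of the data does not change.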

Specifying the -ties- option on -psmatch2- also causes it to produce the same
estimate of the average treatment effect on the treated (ATET) as -teffects-.
(In the command below, we did not specify the -ate- option, so -psmatch2-
reports only the ATET, which it labels ATT.)

******************* Begin Output********************************************
. * psmatch2 results for the Average treatment effect for the treatment
. * group (here, ATT)
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, ///
        outcome(re78) logit ties neighbor(1)

[Output Omitted]
******************* End   Output********************************************

This -psmatch2- command produces an estimated ATET of 2057.937, which
-teffects psmatch- reproduces when we specify the -atet- option.

******************* Begin Output********************************************
. teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit), ///
        atet vce(iid)

Treatment-effects estimation                    Number of obs      =       445
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         8
------------------------------------------------------------------------------
        re78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
       treat |
   (1 vs 0)  |   2057.937   873.4073     2.36   0.018     346.0906    3769.784
------------------------------------------------------------------------------
******************* End   Output********************************************

Here is Scott's second question.

2.  My second question is regarding caliper matching.  I have been
unsuccessful at estimating caliper matching for -teffects- but was able to do
so for -psmatch2- for the same given caliper.  I only was successful when I
increased the caliper to 0.1.  The code for that is below.

There are observations for which no match can be found within the specified
caliper distance.  As mentioned above, -psmatch2- drops these observations
and proceeds with the estimation on the remaining subsample.  -psmatch2-
also generates a variable named _support containing a 0 if a match
is not found for an observation within the specified caliper and 1 if a
match is found.

Here is an example.

******************* Begin Output********************************************
. * Caliper matching (0.00001) with psmatch2
. psmatch2 treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, ///
        outcome(re78) caliper(0.00001) logit ties

Logistic regression                               Number of obs   =        445
                                                  LR chi2(14)     =      26.42
                                                  Prob > chi2     =     0.0229
Log likelihood = -288.89043                       Pseudo R2       =     0.0437

------------------------------------------------------------------------------
       treat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.4241147   .4098083    -1.03   0.301    -1.227324    .3790947
       agesq |   .0147783   .0135348     1.09   0.275    -.0117493     .041306
    agecubed |  -.0001607   .0001423    -1.13   0.259    -.0004397    .0001182
       edusq |   .0481057   .0238387     2.02   0.044     .0013828    .0948287
     edure74 |   .0000127   .0000125     1.02   0.308    -.0000117    .0000371
   education |  -.9525424   .4257927    -2.24   0.025    -1.787081    -.118004
     married |   .1694692   .2852625     0.59   0.552    -.3896349    .7285734
    nodegree |  -.4125117   .3923586    -1.05   0.293     -1.18152     .356497
        re74 |  -.0001822   .0001407    -1.30   0.195    -.0004579    .0000935
        re75 |   .0000392   .0000505     0.78   0.438    -.0000598    .0001381
         u74 |   -.216387   .3851458    -0.56   0.574    -.9712588    .5384849
         u75 |  -.3428689   .3238441    -1.06   0.290    -.9775917    .2918538
       black |  -.2541281   .3697488    -0.69   0.492    -.9788225    .4705663
    hispanic |  -.9218369   .5180363    -1.78   0.075    -1.937169    .0934956
       _cons |   9.031523   4.742779     1.90   0.057    -.2641534     18.3272
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched |  6349.1435   4554.80112   1794.34238   632.853392     2.84
                        ATT | 5257.79482   3951.72019   1306.07463    1187.7825     1.10
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

 psmatch2: |   psmatch2: Common
 Treatment |        support
assignment | Off suppo  On suppor |     Total
-----------+----------------------+----------
 Untreated |         0        260 |       260
   Treated |       130         55 |       185
-----------+----------------------+----------
     Total |       130        315 |       445

******************* End   Output********************************************


Let's look at _support created by -psmatch2-.

******************* Begin Output********************************************
. label list _support
_support:
           0 Off support
           1 On support

. count if _support
  315
******************* End   Output********************************************


-teffects- will refuse to proceed, because there are observations that
violate the specified caliper condition.

******************* Begin Output********************************************
. capture noisily teffects psmatch (re78) (treat age agesq agecubed edusq ///
        edure74 education married nodegree re74 re75 u74 u75 black      ///
        hispanic, logit), atet gen(cstub) caliper(0.00001) vce(iid)
no propensity-score matches for observation 1 within caliper 1e-05; this is not allowed

. list _support in 1

     +-------------+
     |    _support |
     |-------------|
  1. | Off support |
     +-------------+
******************* End   Output********************************************

The same occurs for the other caliper values.

In order for the two commands to produce the same results, both the
propensity score model and the matching on the estimated propensity score
must be run on the same sample.
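
One way to choose a feasible sample within -teffects- itself is the
-osample()- option, which, as we understand it, creates an indicator for the
observations that lack a match within the caliper even though the command
still exits with an error.  The sketch below assumes the nsw_dw dataset saved
earlier; the variable name viol is hypothetical, and we have not verified
that the second command succeeds on these data.

```stata
* Sketch: flag observations without a match inside the caliper, then
* re-estimate on the rest.  "viol" is a hypothetical variable name.
use nsw_dw, clear
capture noisily teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit), ///
        atet caliper(0.00001) osample(viol)

* How many observations in each treatment group were flagged?
tabulate treat viol

* Re-estimate without the flagged observations.  The logit model is refit
* on this subsample, so the propensity scores change and new violations
* may appear.
capture noisily teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit) ///
        if viol == 0, atet caliper(0.00001)
```

Restricting the sample this way changes the population parameter being
estimated, so the result should be interpreted for the retained subsample
only.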

3.  I have a question regarding the interpretation in teffects overlap.  Am I
correct that the propensity score is being estimated as the probability of
being in the control group (as opposed to the treatment group)?  The caption
in the default overlap graph makes it seem that way.  This seems like an
innovation -- I have never seen anyone present the propensity score that way
and was just curious why -teffects- does it.

-teffects overlap- computes the propensity score for the first level listed in
e(tlevels), by default.  Use the -ptlevel()- option to change this behavior.
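
As a brief illustration, the sketch below refits the model from the first
question and then asks for the overlap plot in terms of the probability of
being treated.  It assumes the nsw_dw dataset saved earlier and that the
treated group is coded treat = 1.

```stata
* Sketch: plot the estimated densities of Pr(treat = 1) rather than the
* default Pr(treat = 0).
use nsw_dw, clear
teffects psmatch (re78) ///
        (treat age agesq agecubed edusq edure74 education married ///
        nodegree re74 re75 u74 u75 black hispanic, logit)
teffects overlap, ptlevel(1)
```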

We hope that this discussion helps.

-David Drukker             -Rich Gates
[email protected]         [email protected]

References
----------

Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching
Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267.

