
From: Pierre Azoulay <pierre.azoulay@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: finding a nearest neighbor using psmatch2 --- problem achieving covariate balance
Date: Tue, 3 Mar 2009 18:48:43 -0500

Dear Statalisters:

I am facing a problem that is vexing because it should be easy to solve, yet I have been stuck on it for a while.

I have two groups of scientists. The treatment group has n observations; the control group has about 500*n. There is NO common support problem. Think of n in the thousands. We want to find a nearest neighbor for each scientist in the treatment group, and we want to do this non-parametrically. Although we have many observables, these covariates do not predict at all whether a scientist is treated or control, so a propensity score approach seems ill-advised. [A bit of context: the treatment is having a "superstar collaborator" who dies.]

We are using psmatch2 with the nearest-neighbor mahalanobis option and 6 variables. Let's call them x1, x2, ..., x5, and log(y0):

    psmatch2 treat, mahalanobis(x1 x2 x3 x4 x5 logy0)

y0 is the baseline stock of publications for our scientists. It is very skewed, hence the log transformation, which we thought might improve the match. We care a lot about matching on y0: it is basically the lagged dependent variable in our analysis.

With so many potential controls to choose from, the outcome of this procedure is very good on x1 through x5. These covariates are very well balanced. Not so for log(y0) or y0. The mean for the treated is significantly higher than the mean for the matched controls, and the problem lies in the right tail. The medians line up exactly, and so do the 75th percentiles. It is in the top quartile that things go wrong.

I said the underlying data do not suffer from a common support problem. There are indeed lots of potential controls with a stock of publications (y0) in the tail. At the same time, the right tail is fatter among the treated than in the population of potential controls. We could achieve balance on y0 if we were matching only on that. But we care about x1-x5 too!

Does anyone know of a trick (another transformation besides the log?) that might enable us to do better at identifying matches for the scientists "in the tail" of the distribution of y0?

Ideally, psmatch2 would let the researcher specify an exact match on logy0 and a mahalanobis match on x1-x5. But it does not, and I must confess that my limited programming skills do not enable me to muck around in the psmatch2 ado-file to create that option.

Any suggestion would be appreciated!

Sincerely,

Pierre

-------------------------------------------------------------------
Pierre Azoulay
Associate Professor of Strategy
Massachusetts Institute of Technology
Sloan School of Management
50 Memorial Drive, E52-555
Cambridge, MA 02142-1947
Tel [Sloan]: (617) 258-9766
Tel [NBER]: (617) 588-1464
Fax: (617) 253-2660
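
One way to approximate the hybrid described in the post (an exact match on logy0 combined with a mahalanobis match on x1-x5) without modifying psmatch2 is to coarsen logy0 into bins and run psmatch2 separately within each bin. The lines below are only a rough sketch under stated assumptions, not a tested solution. The variable names y0bin, cuts, and w_all, and the choice of 20 bins, are illustrative and do not come from the original post. The sketch also assumes that psmatch2 leaves its match weights in _weight after each call.

```stata
* Sketch only: y0bin, cuts, w_all and the 20-bin choice are illustrative assumptions.
* Cutpoints come from the treated group's logy0 so the right tail gets its own bins.
pctile cuts = logy0 if treat == 1, nquantiles(20)
xtile y0bin = logy0, cutpoints(cuts)

gen double w_all = .
levelsof y0bin, local(bins)
foreach b of local bins {
    * Skip bins that lack either treated or control units
    quietly count if y0bin == `b' & treat == 1
    if r(N) == 0 continue
    quietly count if y0bin == `b' & treat == 0
    if r(N) == 0 continue

    * Mahalanobis nearest neighbor on x1-x5 among units in the same logy0 bin
    psmatch2 treat if y0bin == `b', mahalanobis(x1 x2 x3 x4 x5)

    * psmatch2 overwrites _weight on each call, so copy it before the next bin
    replace w_all = _weight if y0bin == `b'
}

* Compare weighted means of the treated and their matched controls
tabstat logy0 x1 x2 x3 x4 x5 [aweight = w_all], by(treat) statistics(mean)
```

Taking the cutpoints from the treated distribution rather than the pooled sample is a deliberate choice here: with roughly 500 controls per treated unit, pooled quantiles would mostly reflect the controls and would lump the fat right tail of the treated into a single bin. Narrower bins tighten the match on logy0 at the cost of fewer candidate controls per bin, and logy0 can also be added back into mahalanobis() to refine matches within a bin.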

