
From: jverkuilen <jverkuilen@gc.cuny.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Quantile Regression with a skewed and zero-inflated dependent variable?
Date: Mon, 4 Aug 2008 11:02:59 -0400

A few things:

(1) QR doesn't like ties, so that is where zero inflation gets nasty. But you aren't modeling the lower tail, and QR doesn't consider the magnitude of discrepancies if I recall correctly, just the signs. Why not model, say, the median, the 75% point, and the 90% point? (I.e., throw the reviewer a nice juicy bone.) As to whether it turns into a logistic regression problem when you model any given quantile, I don't think so, but that would be resolved by considering the likelihood functions, and my copy of Koenker's book is not here.

(2) You can run a non-integer response through zinb or zip. Stata will complain, but it will give you answers that aren't nuts... usually. I have done this to impose a flattening constant in an nbreg I ran a while back.

(3) You could make a zero-inflated model for the gamma or inverse Gaussian (not sure about the latter) using -gllamm-. Partha Deb's mixture program could also do that, I believe. Then have two classes, one a degenerate (or near-degenerate) distribution and the other free. Worth a try.

-----Original Message-----
From: "Allan Garland" <agarland@exchange.hsc.mb.ca>
To: statalist@hsphsun2.harvard.edu
Sent: 8/4/2008 10:03 AM
Subject: st: Quantile Regression with a skewed and zero-inflated dependent variable?

I am working on a problem that involves multivariable modeling of Y, which represents a time delay that is not only right-skewed but also has a fairly large probability mass at 0 (i.e., 13% of subjects have Y=0). In particular, I'm interested in the independent variables associated with unusually long values of Y. So, I decided to create a QR model of the 90th conditional percentile of Y.

I did not use a logistic regression approach (after dichotomizing Y at some arbitrary unconditional cutpoint that represents a "long" delay) because of the known problems with that approach (MacCallum R, Zhang S, Preacher K, Rucker D. On the practice of dichotomization of quantitative variables. Psychological Methods 2002;7(1):19-40).

Here are 2 of the reviewer's comments for this paper:

1. The real virtue of quantile regression, as argued by its author, is to explore covariate effect by estimating an entire family of conditional quantile functions, albeit this has an implicit ordinal aspect [R. Koenker and K. F. Hallock. Quantile regression. Journal of Economic Perspectives 15 (4):143-156, 2001]. There may be heuristic value in using a selective quantile regression, but this would seem to reproduce the problem of logistic regression at a different level. Moreover, quantile regression would presumably share the difficulty of linear regression in explicitly modelling covariate effect at zero probability. Such is not the case with zero-adjusted estimators within the GLM family, as below.

2. The authors could consider (i) a count-data approach [Y could be expressed in integer hours; fractional hours may be subject to measurement error] and the various zero-inflated count estimators available in Stata or (ii) for a continuous data approach, modelling via zero-adjusted estimators within generalized linear models (GLM), using, say, the inverse-Gaussian or gamma distribution, both of which have found utility in modelling skewed distributions [P. de Jong and G. Z. Heller. Generalized Linear Models for Insurance Data, Cambridge, UK: Cambridge University Press, 2008].

My question is: Is he correct? Specifically, I am uncertain about the validity of the criticisms of using QR that he raises in #1. I don't dispute (as he indicates in #2) that alternative statistical approaches are available for this question, but I still believe that a model of the 90th percentile is a legitimate approach to this question for this data set and variable distribution.

Thus, I'd appreciate anyone's thoughts on:
(a) the reviewer's criticisms of using QR for this purpose and in this manner, and
(b) the "best" approach to this multivariable modeling problem.

Thanks,
Allan

--
Allan Garland, MD, MA
Associate Professor of Medicine & Community Health Sciences
University of Manitoba
Health Sciences Center - GF 222
820 Sherbrook Street
Winnipeg, Manitoba R3A 1R9
phone: 204-787-1198
page: 204-935-2166
fax: 204-787-1087
email: agarland@hsc.mb.ca

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
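
A note on point (1) of the reply, for anyone who wants to check the "lower tail" remark numerically. The sketch below is not from either message: it uses simulated data (about 13% exact zeros, the rest exponential, both choices are assumptions) and an intercept-only fit, so `check_loss` and `fit_quantile` are illustrative names only. It minimizes the usual check (pinball) loss at tau = 0.90 and then moves every zero to another value that is still below the fitted quantile. The loss itself does weight residual magnitudes, but the location of the minimum is driven by how many observations fall on each side of it, so the 0.90 fit should not move.

```python
import random


def check_loss(u, tau):
    """Check (pinball) loss for a residual u at quantile level tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile(y, tau):
    """Intercept-only quantile fit: the observed value that minimizes
    the summed check loss over the sample."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda q: sum(check_loss(v - q, tau) for v in y))


random.seed(0)
n = 501  # chosen so that tau * n is not an integer and the minimizer is unique

# Hypothetical delays: roughly 13% exact zeros, the rest right-skewed.
y = [0.0 if random.random() < 0.13 else random.expovariate(1 / 5.0)
     for _ in range(n)]

q90 = fit_quantile(y, 0.90)

# Move every zero to half the fitted quantile; all of them stay below q90.
y_shifted = [0.5 * q90 if v == 0.0 else v for v in y]
q90_shifted = fit_quantile(y_shifted, 0.90)

print("0.90 fit with zeros at 0:       ", q90)
print("0.90 fit with zeros moved upward:", q90_shifted)
```

With covariates in the model the same logic applies to the residuals rather than to Y itself, so this says nothing about how ties at zero affect the estimation of lower quantiles or the standard errors.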
