
From: "Allan Garland" <[email protected]>
To: <[email protected]>
Subject: st: Quantile Regression with a skewed and zero-inflated dependent variable?
Date: Mon, 4 Aug 2008 09:03:49 -0500

I am working on a problem that involves multivariable modeling of Y, a time delay that is not only right-skewed but also has a fairly large probability mass at 0 (i.e., 13% of subjects have Y=0). In particular, I'm interested in the independent variables associated with unusually long values of Y, so I decided to create a quantile regression (QR) model of the 90th conditional percentile of Y. I did not use a logistic regression approach (after dichotomizing Y at some arbitrary unconditional cutpoint that represents a "long" delay) because of the known problems with that approach (MacCallum R, Zhang S, Preacher K, Rucker D. On the practice of dichotomization of quantitative variables. Psychological Methods 2002;7(1):19-40).

Here are 2 of the reviewer's comments for this paper:

1. The real virtue of quantile regression, as argued by its author, is to explore covariate effect by estimating an entire family of conditional quantile functions, albeit this has an implicit ordinal aspect [R. Koenker and K. F. Hallock. Quantile regression. Journal of Economic Perspectives 15 (4):143-156, 2001]. There may be heuristic value in using a selective quantile regression, but this would seem to reproduce the problem of logistic regression at a different level. Moreover, quantile regression would presumably share the difficulty of linear regression in explicitly modelling covariate effect at zero probability. Such is not the case with zero-adjusted estimators within the GLM family, as below.

2. The authors could consider (i) a count-data approach [Y could be expressed in integer hours; fractional hours may be subject to measurement error] and the various zero-inflated count estimators available in Stata or (ii) for a continuous data approach, modelling via zero-adjusted estimators within generalized linear models (GLM), using, say, the inverse-Gaussian or gamma distribution, both of which have found utility in modelling skewed distributions [P. de Jong and G. Z. Heller. Generalized Linear Models for Insurance Data, Cambridge, UK: Cambridge University Press, 2008].

My question is: Is he correct? Specifically, I am uncertain about the validity of the criticisms of using QR that he raises in #1. I don't dispute (as he indicates in #2) that alternative statistical approaches are available for this question, but I still believe that a model of the 90th percentile is a legitimate approach to this question for this data set and variable distribution.

Thus, I'd appreciate anyone's thoughts on:
(a) the reviewer's criticisms of using QR for this purpose and in this manner, and
(b) the "best" approach to this multivariable modeling problem.

Thanks,
Allan

--
Allan Garland, MD, MA
Associate Professor of Medicine & Community Health Sciences
University of Manitoba
Health Sciences Center - GF 222
820 Sherbrook Street
Winnipeg, Manitoba R3A 1R9
phone: 204-787-1198
page: 204-935-2166
fax: 204-787-1087
email: [email protected]
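As a rough illustration of the quantity under discussion, the sketch below estimates a 90th percentile by minimising the check (pinball) loss, which is the criterion that quantile regression minimises (in Stata, `qreg` with the `quantile(.90)` option). Everything in it is an assumption made for illustration: the data are simulated (13% exact zeros, otherwise an exponential delay), the two groups stand in for a single binary covariate, and the function names are hypothetical. It is not the analysis from the paper, and it does not settle the reviewer's points.

```python
import random


def pinball_loss(q, ys, tau):
    """Average check (pinball) loss of candidate quantile q at level tau."""
    # (y < q) is a bool: the weight is tau when y >= q and tau - 1 when y < q.
    return sum((tau - (y < q)) * (y - q) for y in ys) / len(ys)


def fit_quantile(ys, tau):
    """Minimise the pinball loss over the observed values.

    A minimiser of the check loss always lies on a data point, so a search
    over the distinct observed values is enough for this small sketch.
    """
    return min(sorted(set(ys)), key=lambda q: pinball_loss(q, ys, tau))


def simulate_delays(n, p_zero, mean_delay, rng):
    """Hypothetical delays: 0 with probability p_zero, else exponential."""
    return [
        0.0 if rng.random() < p_zero else rng.expovariate(1.0 / mean_delay)
        for _ in range(n)
    ]


rng = random.Random(0)
group_a = simulate_delays(1000, 0.13, 2.0, rng)  # shorter delays
group_b = simulate_delays(1000, 0.13, 4.0, rng)  # longer delays

q90_a = fit_quantile(group_a, 0.9)
q90_b = fit_quantile(group_b, 0.9)
print("share of zeros:", sum(y == 0.0 for y in group_a) / len(group_a))
print("90th percentile, group a:", round(q90_a, 2))
print("90th percentile, group b:", round(q90_b, 2))
```

In this simulated setting the mass at zero sits well below the 90th percentile, so the fitted upper quantile is strictly positive in both groups and differs between them. Whether that carries over to a real data set depends on how the zeros and the covariates are related, which the sketch does not address.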


