
From: "Stephen P. Jenkins" <stephenj@essex.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Opinions on fractional logit versus tobit - prediction and model fit
Date: Fri, 3 Apr 2009 09:47:26 +0100

======================
------------------------------

Date: Thu, 2 Apr 2009 19:27:39 +0100
From: Eva Poen <eva.poen@gmail.com>
Subject: st: Opinions on fractional logit versus tobit - prediction and model fit

I'm looking at different ways to model my outcome variable, which is bounded between zero and one (zero and 20, actually, but I don't mind modelling the fraction). It's panel data, and I would like to model individual heterogeneity in the form of random effects (both random intercepts and random slopes). There are a lot of observations at zero and one, respectively. I'm reasonably confident that the random effects are independent of the other variables in the model.

So far I have been looking at the fractional logit model, as introduced by Papke and Wooldridge in their 1996 Journal of Applied Econometrics paper. I use -gllamm- to estimate a model with random effects. I have also been looking at the tobit model, which I again estimate using -gllamm- with random effects.

I have a few doubts about the fractional logit model (FLM), and would like to hear other people's opinions:

- Although it appears to be a very elegant solution, some people say that FLM is not well suited for problems with a lot of zeros or ones; for example, Maarten Buis said so in this post (but didn't provide a reference):
  http://www.stata.com/statalist/archive/2007-07/msg00786.html
  If someone knows any references where this is discussed, I'd be grateful to receive them.

- Since FLM is quasi-likelihood, any likelihood-based approaches to model fit are ruled out. For the tobit model I can use those measures. The only other option I can think of for FLM is to compare predicted values with actual values. However, do predicted values in FLM make sense? We know that the distributional assumption is not true. So I'm wondering how meaningful predicted values are in this context.

- I am getting sensible estimates for the random effects with the tobit approach, and not so sensible ones with FLM. In fact, FLM estimates two of the three to be zero. Is this a sign of my model being incorrectly specified, or could it be a sign of FLM not handling the zeros and ones very well?

Many thanks,
Eva

======================

For an extension of the FLM to the panel data setting, see also

"Panel data methods for fractional response variables with an application to test pass rates" by Leslie E. Papke and Jeffrey M. Wooldridge, Journal of Econometrics 145 (2008) 121-133

They refer to Stata in their estimation section. Probit rather than logit is used, for the reasons they explain (the "pooled fractional probit" (PFP) estimator). It appears relatively straightforward to implement, at least in the case of strictly exogenous regressors: -glm- with clustered standard errors. The case with endogenous RHS variables seems more complicated, but they lay out a strategy that doesn't seem impossible for mortals (two-step estimation, with bootstrapping to adjust the SEs).

Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Director, Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374. Fax: +44 1206 873151.
http://www.iser.essex.ac.uk
Survival Analysis using Stata: http://www.iser.essex.ac.uk/iser/teaching/module-ec968
Downloadable papers and software: http://ideas.repec.org/e/pje7.html
Learn about the UK's new household panel survey, "Understanding Society": http://www.understandingsociety.org.uk/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
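
For readers who want to see what "-glm- with clustered standard errors" might look like in practice, here is a minimal sketch of a pooled fractional probit fit on simulated data. Everything in it is an assumption made for illustration: the variable names (id, year, x, y, xbar, yhat), the data-generating process, and the choice to add the unit-level time average of x (a Mundlak-type device) and year dummies. It is not the specification from the Papke and Wooldridge paper, and it covers only the strictly exogenous case.

```stata
* Sketch only: simulated panel; all names and parameter values are hypothetical
clear
set seed 12345
set obs 200                        // 200 panel units
generate id = _n
generate a = rnormal()             // unit-specific effect, constant within id
expand 5                           // 5 periods per unit
bysort id: generate year = _n
generate x = rnormal() + 0.5*a

* Outcome observed out of 20, as in the original question; y is the fraction
generate y = rbinomial(20, normal(-0.3 + 0.6*x + 0.5*a)) / 20

* Time average of the regressor within each unit (assumed Mundlak-type device)
bysort id: egen xbar = mean(x)

* Year dummies yr1-yr5; yr1 is left out as the base period
tabulate year, generate(yr)

* Pooled fractional probit: Bernoulli quasi-likelihood with probit link,
* standard errors clustered on the panel unit
glm y x xbar yr2-yr5, family(binomial) link(probit) vce(cluster id)

* Fitted conditional means, which lie strictly between 0 and 1
predict yhat, mu
summarize y yhat
```

With a fractional dependent variable, -glm- notes that y has noninteger values and proceeds. Clustering on id matters here because the pooled quasi-likelihood ignores the serial dependence within units, so the default standard errors would not be appropriate.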

