# st: Opinions on fractional logit versus tobit - prediction and model fit

From: "Stephen P. Jenkins"; Subject: st: Opinions on fractional logit versus tobit - prediction and model fit; Date: Fri, 3 Apr 2009 09:47:26 +0100

```
======================
------------------------------

Date: Thu, 2 Apr 2009 19:27:39 +0100
From: Eva Poen <eva.poen@gmail.com>
Subject: st: Opinions on fractional logit versus tobit - prediction
and model fit


I'm looking at different ways to model my outcome variable, which is
bounded between zero and one (zero and 20, actually, but I don't mind
modelling the fraction). It's panel data, and I would like to model
individual heterogeneity in the form of random effects (both random
intercepts and random slopes). There are a lot of observations at zero
and one, respectively. I'm reasonably confident that the random
effects are independent of the other variables in the model.

So far I have been looking at the fractional logit model, as
introduced by Papke and Wooldridge in their 1996 Journal of Applied
Econometrics paper. I use -gllamm- to estimate a model with random
effects. I have also been looking at the tobit model, which I again
estimate using -gllamm- with random effects.

I have a few doubts about the fractional logit model (FLM), and would
like to hear other people's opinions:

- Although it appears to be a very elegant solution, some people say
that FLM is not well suited for problems with a lot of zeros or ones;
for example, Maarten Buis said so in this post (but didn't provide a
reference):
http://www.stata.com/statalist/archive/2007-07/msg00786.html
If someone knows any references where this is discussed, I'd be

- Since FLM is quasi-likelihood, any likelihood-based approaches to
model fit are ruled out. For the tobit model I can use those measures.
The only other option I can think of for FLM is to compare predicted
values with actual values. However, do predicted values in FLM make
sense? We know that the distributional assumption is not true. So I'm
wondering how meaningful predicted values are in this context.

- I am getting sensible estimates for the random effects with the
tobit approach, and not so sensible ones with FLM. In fact, FLM
estimates two of the three to be zero. Is this a sign of my model
being incorrectly specified, or could it be a sign of FLM not handling
the zeros and ones very well?

Many thanks,
Eva
======================

For an extension of the FLM to the panel data setting, see also

"Panel data methods for fractional response variables with an
application to test pass rates" by Leslie E. Papke and Jeffrey M.
Wooldridge, Journal of Econometrics 145 (2008) 121-133.

They refer to Stata in their estimation section. They use probit
rather than logit, for the reasons they explain, and call the result
the "pooled fractional probit" (PFP) estimator.

It appears relatively straightforward to implement, at least in the
case of strictly exogenous regressors: use -glm- with clustered
standard errors. The case with endogenous RHS variables seems more
complicated, but they lay out a strategy that doesn't seem impossible
for mortals.

Stephen
-------------------------------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Director, Institute for Social and Economic Research
University of Essex, Colchester CO4 3SQ, U.K.
Tel: +44 1206 873374.  Fax: +44 1206 873151.
http://www.iser.essex.ac.uk
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/iser/teaching/module-ec968

Learn about the UK's new household panel survey, "Understanding
Society": http://www.understandingsociety.org.uk/

```
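For readers who want to see what the "-glm- with clustered standard errors" suggestion amounts to, here is a minimal sketch of a pooled fractional probit in plain Python. It is an illustration under stated assumptions, not the Papke and Wooldridge code: the data are simulated, there is one regressor plus a constant, and the names `fit_pfp`, `pieces`, `SIGMA_A` and the seed are made up for the example. In Stata the corresponding call would be something like `glm y x, family(binomial) link(probit) vce(cluster id)`, with `y`, `x` and `id` standing in for your own variables. The sketch maximises the Bernoulli quasi-likelihood by Fisher scoring and then forms a cluster-robust sandwich variance without any finite-sample adjustment, so packaged routines may report slightly different standard errors.

```python
import math
import random


def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def pieces(beta, y, x, ids):
    """Expected information matrix and per-cluster score sums at beta."""
    info = [[0.0, 0.0], [0.0, 0.0]]
    scores = {}
    for yi, xi, g in zip(y, x, ids):
        eta = beta[0] + beta[1] * xi
        mu = min(max(Phi(eta), 1e-12), 1.0 - 1e-12)  # guard against 0 and 1
        dens = phi(eta)
        w = dens / (mu * (1.0 - mu))
        row = (1.0, xi)  # constant and regressor
        sg = scores.setdefault(g, [0.0, 0.0])
        for j in range(2):
            sg[j] += row[j] * w * (yi - mu)
            for k in range(2):
                info[j][k] += row[j] * row[k] * w * dens
    return info, scores


def fit_pfp(y, x, ids, max_iter=100, tol=1e-10):
    """Pooled fractional probit: E[y|x] = Phi(b0 + b1*x), clustered on ids."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        info, scores = pieces(beta, y, x, ids)
        total = [sum(s[j] for s in scores.values()) for j in range(2)]
        a_inv = inv2(info)
        step = [a_inv[j][0] * total[0] + a_inv[j][1] * total[1]
                for j in range(2)]
        beta = [beta[j] + step[j] for j in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    # Sandwich variance: A^-1 B A^-1, with B built from cluster score sums.
    info, scores = pieces(beta, y, x, ids)
    a_inv = inv2(info)
    meat = [[sum(s[j] * s[k] for s in scores.values()) for k in range(2)]
            for j in range(2)]
    var = mul2(mul2(a_inv, meat), a_inv)
    return beta, [math.sqrt(var[j][j]) for j in range(2)]


# Simulated panel (an assumption for illustration): 200 units, 5 periods,
# outcome is successes out of 20 expressed as a fraction, so exact zeros
# and ones can occur; each unit has its own random intercept.
rng = random.Random(2009)
SIGMA_A = 0.5
y, x, ids = [], [], []
for unit in range(200):
    a = rng.gauss(0.0, SIGMA_A)
    for _ in range(5):
        xi = rng.gauss(0.0, 1.0)
        p = Phi(-0.2 + 0.7 * xi + a)
        successes = sum(rng.random() < p for _ in range(20))
        y.append(successes / 20)
        x.append(xi)
        ids.append(unit)

b, se = fit_pfp(y, x, ids)
scale = 1.0 / math.sqrt(1.0 + SIGMA_A ** 2)
print("slope:", round(b[1], 3), "cluster-robust se:", round(se[1], 3))
print("population-averaged target for slope:", round(0.7 * scale, 3))
```

One thing the simulation makes visible: with a normal random intercept inside a probit index, the pooled estimator targets the coefficients scaled by 1/sqrt(1 + sigma_a^2) rather than the unit-level ones. That appears to be part of why probit is convenient here, though the paper should be consulted for the actual argument.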