
st: RE: logistic tranformation, proportion variables

From   "Verkuilen, Jay" <>
To   <>
Subject   st: RE: logistic tranformation, proportion variables
Date   Thu, 13 Dec 2007 15:33:15 -0500

Marck Bulter wrote:

<<I have a question that is not entirely related to Stata. Do hope that 
you forgive me. Assume the following model,

*ivreg* pstrmon price maturity age coupon pstrmonprev pstrprev intrest 
ivol compl (precmon = precmonprev)

Where pstrmon, pstrmonprev, precmon and precmonprev are all proportions [...]>>

Some points:

(1) There is a very large literature in geostatistics on proportion
data, summarized in John Aitchison's excellent book, The Statistical
Analysis of Compositional Data. The zeros problem is, of course, a real
issue, and this literature has dealt with it to some degree. There are a
few articles by econometricians that might be of use to you; the
references are in the 2003 edition of Aitchison's book (which was
originally published in 1986).
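For readers new to that literature: the core idea is to map a composition into unconstrained space with log-ratio transforms. A minimal Python sketch follows (the helper names `alr`, `clr` and `logit` are illustrative, not from any package). It shows that a single proportion p, viewed as the two-part composition (p, 1 - p), has an additive log-ratio that is exactly the logit, and that an exact zero leaves every log-ratio undefined.

```python
import math

def alr(parts):
    """Additive log-ratio: log(x_i / x_D) for i = 1..D-1,
    using the last part x_D as the reference (denominator)."""
    if any(x <= 0 for x in parts):
        # log-ratios are undefined for exact zeros: the "zeros problem"
        raise ValueError("log-ratio transforms need strictly positive parts")
    ref = parts[-1]
    return [math.log(x / ref) for x in parts[:-1]]

def clr(parts):
    """Centered log-ratio: log(x_i) minus the mean of all the log parts."""
    if any(x <= 0 for x in parts):
        raise ValueError("log-ratio transforms need strictly positive parts")
    logs = [math.log(x) for x in parts]
    center = sum(logs) / len(logs)
    return [v - center for v in logs]

def logit(p):
    """Log-odds of a proportion 0 < p < 1."""
    return math.log(p / (1.0 - p))

# A proportion p is the two-part composition (p, 1 - p); its alr equals logit(p).
p = 0.25
print(alr([p, 1.0 - p])[0], logit(p))
print(clr([0.2, 0.3, 0.5]))  # clr coordinates always sum to zero

try:
    alr([0.0, 1.0])
except ValueError as err:
    print("exact zero:", err)
```

The clr coordinates are symmetric in the parts but sum to zero, whereas the alr drops one part as a reference. With only two parts, both reduce to the logit up to a constant factor.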

(Shameless self-promotion: The next two points involve work I'm
currently engaged in.) 

(2) Michael Smithson of ANU and I published an article last year in
Psychological Methods (the APA's methodology journal) on using beta
regression for these kinds of data. It can be found here: [...]
We talk about the zeros problem quite a bit, though see below.
Independently, Maarten Buis wrote Stata software that estimates this
model. If you need a mixed-model analysis, contact me: Mike and I have
worked out the details and, indeed, use data very much like your own as
a test case, which Mike got from an economist friend of his at ANU. We
are in the process of writing this paper up, but it is not yet ready
for readers or I'd send it to you.
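For concreteness, here is a minimal sketch of a beta regression log-likelihood in the mean-precision parameterization. It assumes a logit link for the mean and a single constant precision parameter phi, which is a simplification. It is written in Python purely for illustration and is not Maarten's Stata code or the code from the article; all function names are hypothetical. Note that the beta density is only defined on the open interval (0, 1), which is exactly where the zeros problem enters.

```python
import math

def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def beta_logpdf(y, mu, phi):
    """Log density of Beta(mu*phi, (1 - mu)*phi) at y, for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("beta density is only defined on the open interval (0, 1)")
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))

def beta_regression_loglik(coefs, log_phi, X, y):
    """Sum of log densities with mu_i = inv_logit(x_i' b) and phi = exp(log_phi)."""
    phi = math.exp(log_phi)
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(c * x for c, x in zip(coefs, row))
        total += beta_logpdf(yi, inv_logit(eta), phi)
    return total

# Made-up data: intercept plus one covariate.
X = [[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]]
y = [0.15, 0.40, 0.70]
print(beta_regression_loglik([-1.0, 2.0], math.log(10.0), X, y))
```

Maximizing this function over `coefs` and `log_phi` (for example with a general-purpose optimizer) gives the ML estimates. Putting phi on the log scale keeps the precision positive without constraints.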

(3) A friend from grad school, Clint Stober, and I are in the middle of
writing a paper on the zeros problem in bounded data. It turns out that
depending on the distribution you use, the estimation problem can be
affected horribly or hardly at all by what you do with the boundary
observations. The actual characterization of the distributions relies on
a good deal of differential geometry, which, fortunately, my friend
understands very well and of which I have only a rudimentary
"statistician's grasp." Essentially it comes down to characteristics of
the log-likelihood function near the boundaries of the sample space, in
particular the nature of its derivatives. In some cases, replacing an
exact 0 with a small epsilon has little effect on the estimation; in
other cases it does major damage to the standard errors and coefficient
estimates. This can happen with only a few such cases out of several
hundred observations. Unfortunately, transformations of the normal
distribution are badly affected by this phenomenon. I have a real
example with six exact 0 observations out of about 300 cases, where
analysis with the lognormal distribution fails horribly and analysis
with the gamma distribution is not affected at all.
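A toy illustration of this boundary sensitivity follows. It uses synthetic data (294 positive values plus six exact zeros, loosely mimicking the counts above, not the real example). In an intercept-only model, the lognormal MLE of the location is the mean of log(y), so the choice of epsilon moves it substantially. The gamma MLE of the mean is simply the sample mean of y, whatever the shape, so epsilon barely matters. This is only a hint of the general result: the gamma shape estimate (not computed here) does involve log(y).

```python
import math

def replace_zeros(y, eps):
    """Replace exact zeros with a small positive epsilon."""
    return [eps if v == 0 else v for v in y]

def lognormal_mle(y):
    """Closed-form lognormal MLE: mean and ML standard deviation of log(y)."""
    logs = [math.log(v) for v in y]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma

def gamma_mean_mle(y):
    """Gamma MLE of the mean; the score for the mean gives the sample mean for any shape."""
    return sum(y) / len(y)

# Synthetic data (not the data from the post): 294 positive values and 6 exact zeros.
y_raw = [0.5 + 0.01 * (i % 50) for i in range(294)] + [0.0] * 6

for eps in (1e-2, 1e-4, 1e-8):
    y = replace_zeros(y_raw, eps)
    mu, sigma = lognormal_mle(y)
    print(f"eps={eps:g}  lognormal mu={mu:.3f} sigma={sigma:.3f}  "
          f"gamma mean={gamma_mean_mle(y):.5f}")
```

Running it, the lognormal mu and sigma drift as eps shrinks, because six values of log(eps) dominate the sums. The gamma mean moves only by about 6 * eps / 300.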

Someone else in the thread already noted that an exact 0 or 1 may be
qualitatively different from epsilon. If this is the case, the analysis
I just mentioned does not apply. It applies only to the (very common)
case in which a 0 is due to limitations of the measurement device,
rounding error, or the like.

If you have a good dataset involving this kind of problem, I would
definitely be interested! 

J. Verkuilen
Assistant Professor of Educational Psychology
City University of New York-Graduate Center
365 Fifth Ave.
New York, NY 10016
Office: (212) 817-8286 
FAX: (212) 817-1516
Cell: (217) 390-4609
