Statalist The Stata Listserver



RE: st: Grid Search in a Log Plus Constant Model


From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Grid Search in a Log Plus Constant Model
Date: Wed, 8 Nov 2006 22:28:28 -0000

"Don't do it" is not perhaps the advice you wanted, but I'm 
with Maarten on this one. 

However, a complete answer should include a reference to -lnskew0-, 
although it's only part of the answer you seek. 
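
For readers who want to see the idea, here is a minimal sketch in Python (not Stata, and not -lnskew0-'s actual code) of a zero-skewness log transform: search for a constant c such that ln(y + c) has skewness close to zero. The data in `oop`, the function names, and the search bounds are all hypothetical, chosen only for illustration; the search is plain bisection on a log scale and assumes the skewness changes sign between the two bounds.

```python
import math

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log_shift_skewness(y, c):
    """Skewness of ln(y + c) for a positive constant c."""
    return skewness([math.log(v + c) for v in y])

def zero_skew_constant(y, lo, hi, iterations=200):
    """Bisect on a log scale for c in [lo, hi] with skewness of ln(y + c) near 0."""
    f_lo = log_shift_skewness(y, lo)
    f_hi = log_shift_skewness(y, hi)
    if f_lo * f_hi > 0:
        raise ValueError("skewness does not change sign on [lo, hi]")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        f_mid = log_shift_skewness(y, mid)
        if f_lo * f_mid <= 0:
            hi = mid                      # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid         # root lies in [mid, hi]
    return math.sqrt(lo * hi)

# Hypothetical spending data with several zeros and a long right tail.
oop = [0, 0, 0, 5, 10, 20, 20, 40, 80, 150, 400, 1200]
c = zero_skew_constant(oop, lo=1e-6, hi=100 * max(oop))
print("constant:", c, "skewness after transform:", log_shift_skewness(oop, c))
```

Zero skewness is only one criterion, and it concerns the variable itself, not the residuals of any model.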

And, even more importantly, note that -glm- with a log link 
can work well with some zeros in the data. 

However, no transform in this terrain will remove a spike in 
a distribution, if that is what you have. 

Nick 
n.j.cox@durham.ac.uk 

Maarten Buis
 
> I have two remarks:
> First, the distribution of the dependent variable isn't relevant; the
> distribution of the errors is. So skewness in the dependent variable
> is no reason to transform it; only what you see when you inspect the
> residuals is. Have a look at -help regress postestimation-
> for lots of useful commands.
> 
> Second, if you want to search automatically for a transformation of
> the dependent variable, -boxcox- is already programmed to do that, saving
> you some trouble. However, I don't like these automated search
> techniques: they tend to make people stop thinking for themselves. For
> instance, if you keep the dependent variable as is, you are assuming that a
> unit change in your explanatory variable causes a given dollar
> change in out-of-pocket spending. However, if you log-transform
> the dependent variable, you are assuming that out-of-pocket spending changes
> by a given percentage for a unit change in the explanatory
> variable.
> Choosing between these two on substantive grounds would be more
> satisfactory for me.

Paul D Jacobs

> > I am working with health data (MEPS) where
> > out-of-pocket medical expenses (OOP) are a dependent
> > variable in an OLS regression.  Because of the
> > positive skewness of such a variable, I would like to
> > use a normalizing transformation, i.e. the log of OOP.
> >  However, because of the many zero observations for
> > OOP, the options are to either add a constant to OOP,
> > (some have used $1 arbitrarily), or to model the data
> > separately for the zeroes and the positive values,
> > which I'd rather not do.  (I have also considered the
> > square root transformation, etc., but would like to
> > test the results using a log-plus-constant transformation.)
> > 
> > My question is:  do you know of a method for searching
> > for the optimal constant to add to a variable so that
> > a log-transformation produces the optimal result?  Deb
> > et al. (2005) suggest a 'grid search' for this value
> > (see link below for document).  I know that grid
> > searches are used in the context of maximum
> > likelihood; is this a similar process?  Would running
> > the model with different values and comparing R2s and
> > standard errors be more appropriate?
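
To make the question above concrete, here is a rough sketch in Python of running the same simple regression over a grid of constants. Everything in it is hypothetical: the data in `x` and `oop`, the grid in `constants`, and the helper `ols_slope_r2`. It does not reproduce the procedure in Deb et al. (2005); it only shows how much the estimated slope of ln(OOP + c) on x can move with the choice of c.

```python
import math

def ols_slope_r2(x, y):
    """Slope and R-squared from a one-regressor OLS fit with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx, sxy ** 2 / (sxx * syy)

# Hypothetical data: one explanatory variable and spending with zeros.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
oop = [0, 5, 0, 10, 20, 0, 40, 20, 80, 150, 400, 1200]

constants = [0.01, 0.1, 1, 10, 100]   # hypothetical grid of added constants
results = {}
for c in constants:
    log_y = [math.log(v + c) for v in oop]
    results[c] = ols_slope_r2(x, log_y)
    slope, r2 = results[c]
    print(f"c = {c:>6}: slope = {slope:.3f}, R-squared = {r2:.3f}")
```

Each value of c defines a different dependent variable, so the R-squared values across the grid are not measuring fit on a common scale, and a ranking by R-squared is at best a rough guide.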



