"Don't do it" is perhaps not the advice you wanted, but I'm
with Maarten on this one.
However, a complete answer should include a reference to -lnskew0-,
although it's only part of the answer you seek.
And, even more importantly, note that -glm- with a log link
can work well with some zeros in the data.
However, no transform in this terrain will remove a spike in
a distribution, if that is what you have.
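
For concreteness, the commands mentioned above might be called as in the
sketch below. The variable names (oop, age, income, lnoop) are hypothetical,
and the Poisson family with robust standard errors is only one common pairing
with a log link; it is an assumption, not something the data or this thread
dictate.

```stata
* Hypothetical variables: oop (outcome, may include zeros), age, income

* How big is the spike at zero? No transformation will spread it out.
count if oop == 0

* -lnskew0- creates lnoop = ln(oop - k), choosing k so that skewness is zero
lnskew0 lnoop = oop

* -glm- with a log link models the log of the mean of oop,
* so zeros in oop itself need no added constant.
* family(poisson) with vce(robust) is an assumed choice, not the only one.
glm oop age income, family(poisson) link(log) vce(robust)
```

With the log link, the coefficients are read on a multiplicative scale, much
as after a log transformation, but predictions stay in dollars.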
Nick
n.j.cox@durham.ac.uk
Maarten Buis
> I have two remarks:
> First, the distribution of the dependent variable isn't relevant, the
> distribution of the errors is. So the fact that the dependent variable
> is skewed is no reason to transform that variable, only inspection of
> the residuals is. Have a look at -help regress postestimation-
> for lots of useful commands.
>
> Second, if you want to search automatically for some transformation of
> the dependent variable, then -boxcox- is already preprogrammed, saving
> you some trouble. However, I don't like these automated search
> techniques: they tend to make people stop thinking for themselves. For
> instance, if you keep the dependent variable as is, you assume that a
> unit change in your explanatory variable changes out-of-pocket spending
> by a given number of dollars. However, if you log transform
> the dependent variable, you assume that out-of-pocket spending changes
> by some given percentage for a unit change in the explanatory
> variable.
> Choosing between these two on substantive grounds would be more
> satisfactory for me.
Paul D Jacobs
> > I am working with health data (MEPS) where
> > out-of-pocket medical expenses (OOP) are a dependent
> > variable in an OLS regression. Because of the
> > positive skewness of such a variable, I would like to
> > use a normalizing transformation, i.e. the log of OOP.
> > However, because of the many zero observations for
> > OOP, the options are to either add a constant to OOP
> > (some have used $1 arbitrarily), or to model the data
> > separately for the zeroes and the positive values,
> > which I'd rather not do. (I have also considered the
> > square root transformation, etc., but would like to
> > test out the results using a log-constant).
> >
> > My question is: do you know of a method for searching
> > for the optimal constant to add to a variable so that
> > a log-transformation produces the optimal result? Deb
> > et al. (2005) suggest a 'grid search' for this value
> > (see link below for document). I know that grid
> > searches are used in the context of maximum
> > likelihood; is this a similar process? Would running
> > the model with different values and comparing R2s and
> > standard errors be more appropriate?