Re: st: Grid Search in a Log Plus Constant Model
hi paul,
i am working on a similar problem, and the literature (mostly by Manning
and Mullahy) suggests two approaches. 1. OLS on ln(costs): costs are often
logged to reduce skewness, but this necessitates a retransformation to
put the estimates back on the original (unlogged) scale. This
retransformation can be biased if heteroscedasticity is present on
the log scale, so the "smearing" correction has to account for it
(Manning 1998). 2. Alternatively, some have recommended generalized
linear models (with gamma distribution and log link), which do not
require retransformation of the cost dependent variable after
estimation (Buntin and Zaslavsky 2004, Manning et al. 2005).
Manning WG. The logged dependent variable, heteroscedasticity, and the
retransformation problem. J Health Econ 1998;17:283-95.
Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk
adjustment of skewed outcomes data. J Health Econ 2005;24:465-88.
Buntin MB, Zaslavsky AM. Too much ado about two-part models and
transformation? Comparing methods of modeling Medicare expenditures. J
Health Econ 2004;23:525-42.
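To make the retransformation issue in approach 1 concrete, here is a minimal sketch of a smearing-type correction after OLS on ln(costs). It is an illustration on simulated data, not code from any of the papers cited: the helper names (ols_fit, smearing_predict), the single covariate x, and the homoscedastic lognormal errors are all assumptions made for the example. With heteroscedastic errors a single smearing factor would not be enough, which is the point of Manning (1998).

```python
import math
import random


def ols_fit(x, z):
    """One-covariate OLS of z on x; returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    slope = sxz / sxx
    return mz - slope * mx, slope


def smearing_predict(x, cost):
    """Fit OLS on ln(cost), then retransform to the cost scale.

    Returns the naive predictions exp(fitted), the smeared predictions,
    and the smearing factor (mean of the exponentiated log-scale residuals).
    """
    log_cost = [math.log(c) for c in cost]
    a, b = ols_fit(x, log_cost)
    resid = [lc - (a + b * xi) for xi, lc in zip(x, log_cost)]
    smear = sum(math.exp(r) for r in resid) / len(resid)
    naive = [math.exp(a + b * xi) for xi in x]
    smeared = [smear * v for v in naive]
    return naive, smeared, smear


def mean(values):
    return sum(values) / len(values)


# Simulated, strictly positive costs (assumption: homoscedastic log-scale errors).
random.seed(0)
x = [random.random() for _ in range(2000)]
cost = [math.exp(5 + 0.5 * xi + random.gauss(0, 1)) for xi in x]

naive, smeared, smear = smearing_predict(x, cost)
print("smearing factor:        ", round(smear, 3))
print("mean observed cost:     ", round(mean(cost), 1))
print("mean naive prediction:  ", round(mean(naive), 1))
print("mean smeared prediction:", round(mean(smeared), 1))
```

The naive exp(fitted) values estimate something closer to the median than the mean, so they fall short of the average observed cost. Multiplying by the smearing factor brings the predictions back toward the mean on the original scale.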
paul d jacobs wrote:
I am working with health data (MEPS) where
out-of-pocket medical expenses (OOP) are a dependent
variable in an OLS regression. Because of the
positive skewness of such a variable, I would like to
use a normalizing transformation, i.e. the log of OOP.
However, because of the many zero observations for
OOP, the options are to either add a constant to OOP,
(some have used $1 arbitrarily), or to model the data
separately for the zeroes and the positive values,
which I'd rather not do. (I have also considered the
square root transformation, etc., but would like to
test out the results using a log-constant).
My question is: do you know of a method for searching
for the optimal constant to add to a variable so that
a log-transformation produces the optimal result? Deb
et al. (2005) suggest a 'grid search' for this value
(see link below for document). I know that grid
searches are used in the context of maximum
likelihood; is this a similar process? Would running
the model with different values and comparing R2s and
standard errors be more appropriate?
Thanks very much for your time!
Paul Jacobs
Ph.D. Candidate, Economics
American University
Link to Deb, et al presentation:
harrisschool.uchicago.edu/faculty/articles/iHEAminicourse.pdf
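On the grid-search question in the quoted message, one possible reading (an assumption, not necessarily what Deb et al. did) is a profile-likelihood search: for each candidate constant c, regress ln(OOP + c) on the covariates and evaluate the log-likelihood of OOP itself, including the Jacobian term -sum(ln(OOP + c)). That term is what makes different values of c comparable. The R2 from each regression is not comparable, because the dependent variable changes with c. The sketch below uses simulated data, one covariate, and hypothetical helper names (ols_rss, profile_loglik, grid_search). It also treats ln(y + c) as normal, which ignores the point mass at zero.

```python
import math
import random


def ols_rss(x, z):
    """Residual sum of squares from a one-covariate OLS of z on x."""
    n = len(x)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    b = sxz / sxx
    a = mz - b * mx
    return sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x, y, c):
    """Profile log-likelihood of y, assuming ln(y + c) is normal and linear in x.

    The regression coefficients and error variance are concentrated out.
    The final -sum(z) is the Jacobian of the transformation, which keeps
    the likelihood on the scale of y for every c.
    """
    z = [math.log(yi + c) for yi in y]
    n = len(y)
    sigma2 = ols_rss(x, z) / n
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1) - sum(z)


def grid_search(x, y, grid):
    """Return the constant in `grid` with the highest profile log-likelihood."""
    return max(grid, key=lambda c: profile_loglik(x, y, c))


# Simulated outcome with some exact zeros (assumed data-generating process).
random.seed(1)
x = [random.random() for _ in range(1000)]
y = [max(0.0, math.exp(3 + xi + random.gauss(0, 0.5)) - 10) for xi in x]

grid = [0.5, 1, 2, 5, 10, 20, 50]
best_c = grid_search(x, y, grid)
for c in grid:
    print(c, round(profile_loglik(x, y, c), 1))
print("constant chosen from grid:", best_c)
```

A finer grid around the chosen value would refine the estimate. The arbitrary $1 is just one point on such a grid.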
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/