Statalist The Stata Listserver


Re: st: Grid Search in a Log Plus Constant Model

From   Sebastian Baumeister <>
Subject   Re: st: Grid Search in a Log Plus Constant Model
Date   Thu, 09 Nov 2006 09:14:02 +0100

Hi Paul,
I am working on a similar problem, and the literature (mostly by Manning and Mullahy) suggests two approaches.
1. OLS on ln(costs): costs are often logged to reduce skewness, but this requires a retransformation to put the estimates back on the original (unlogged) scale. Simply exponentiating the fitted values is biased, so a "smearing" factor is applied, and if heteroscedasticity is present on the log scale that factor has to vary with the covariates or the retransformed estimates remain biased (Manning 1998).
2. Generalized linear models (gamma distribution with log link): these model the mean of costs directly and therefore do not require a retransformation of the cost dependent variable after estimation (Buntin and Zaslavsky 2004, Manning et al. 2005).
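To make the retransformation step concrete, here is a minimal sketch of a smearing correction after OLS on ln(costs). It assumes a single regressor, strictly positive costs, and homoscedastic errors on the log scale; the data are simulated and the function names are made up for illustration, so treat it as a toy, not as the procedure from the cited papers.

```python
import math
import random


def ols_simple(x, y):
    """Closed-form OLS for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def smearing_predict(x, cost):
    """Fit OLS on ln(cost) and retransform with a single smearing factor.

    Assumes cost > 0 and homoscedastic log-scale errors (hypothetical setup).
    Returns (smearing factor, naive predictions, smeared predictions).
    """
    log_cost = [math.log(c) for c in cost]
    a, b = ols_simple(x, log_cost)
    resid = [lc - (a + b * xi) for xi, lc in zip(x, log_cost)]
    # Smearing factor: average of the exponentiated log-scale residuals.
    smear = sum(math.exp(r) for r in resid) / len(resid)
    naive = [math.exp(a + b * xi) for xi in x]
    smeared = [smear * v for v in naive]
    return smear, naive, smeared


# Simulated (not real) data: lognormal costs with one covariate.
random.seed(0)
x = [random.uniform(0, 2) for _ in range(2000)]
cost = [math.exp(1.0 + 0.5 * xi + random.gauss(0, 1)) for xi in x]

smear, naive, smeared = smearing_predict(x, cost)
mean = lambda v: sum(v) / len(v)
print("smearing factor:", smear)
print("mean cost:", mean(cost))
print("mean naive prediction:", mean(naive))
print("mean smeared prediction:", mean(smeared))
```

The naive exp(fitted value) understates mean costs, while the smeared prediction should track the sample mean much more closely. With heteroscedastic log-scale errors a single factor is no longer enough, which is the case Manning (1998) discusses.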

Manning WG. The logged dependent variable, heteroscedasticity, and the retransformation problem. J Health Econ 1998;17:283-95.
Manning WG, Basu A, Mullahy J. Generalized modeling approaches to risk adjustment of skewed outcomes data. J Health Econ 2005;24:465-88.
Buntin MB, Zaslavsky AM. Too much ado about two-part models and transformation? Comparing methods of modeling Medicare expenditures. J Health Econ 2004;23:525-42.

paul d jacobs wrote:

I am working with health data (MEPS) where
out-of-pocket medical expenses (OOP) are a dependent
variable in an OLS regression. Because of the
positive skewness of such a variable, I would like to
use a normalizing transformation, i.e. the log of OOP.
However, because of the many zero observations for
OOP, the options are to either add a constant to OOP,
(some have used $1 arbitrarily), or to model the data
separately for the zeroes and the positive values,
which I'd rather not do. (I have also considered the
square root transformation, etc., but would like to
test out the results using a log-constant).

My question is: do you know of a method for searching
for the optimal constant to add to a variable so that
a log-transformation produces the optimal result? Deb
et al. (2005) suggest a 'grid search' for this value
(see link below for document). I know that grid
searches are used in the context of maximum
likelihood; is this a similar process? Would running
the model with different values and comparing R2s and
standard errors be more appropriate?
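One possible reading of such a grid search, sketched here as an assumption and not as what Deb et al. actually did: for each candidate constant c, regress ln(OOP + c) on the covariates and score c by the profile log-likelihood on the original OOP scale, which includes the Jacobian term -sum(ln(OOP + c)). That term is what makes different values of c comparable; R2 from the separate regressions is not, because the dependent variable changes with c. The single regressor, the toy data, the grid values, and the function names below are all hypothetical.

```python
import math
import random


def profile_loglik(c, x, y):
    """Profile log-likelihood of ln(y + c) = a + b*x + e, e ~ N(0, sigma^2),
    evaluated on the scale of y (so it includes the Jacobian of the transform)."""
    z = [math.log(yi + c) for yi in y]
    n = len(z)
    mx = sum(x) / n
    mz = sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    a = mz - b * mx
    rss = sum((zi - a - b * xi) ** 2 for xi, zi in zip(x, z))
    sigma2 = rss / n  # ML estimate of the error variance
    normal_part = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    jacobian_part = -sum(z)  # d ln(y + c) / dy = 1 / (y + c)
    return normal_part + jacobian_part


def grid_search_constant(x, y, grid):
    """Return (best c, its log-likelihood) over a user-supplied grid of c > 0."""
    scored = [(profile_loglik(c, x, y), c) for c in grid]
    best_ll, best_c = max(scored)
    return best_c, best_ll


# Toy data with many zeros, loosely mimicking out-of-pocket spending.
random.seed(1)
x = [random.uniform(0, 2) for _ in range(500)]
y = [max(0.0, math.exp(1.5 + 0.5 * xi + random.gauss(0, 1)) - 5.0) for xi in x]

grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
best_c, best_ll = grid_search_constant(x, y, grid)
print("constant with highest profile log-likelihood:", best_c)
```

A finer grid around the best value can be searched in a second pass. The sketch treats zeros as ordinary observations of a shifted lognormal variable, which is itself a strong assumption when there is a large mass at zero.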

Thanks very much for your time!

Paul Jacobs
Ph.D. Candidate, Economics
American University

Link to Deb, et al presentation:
