I was asked off-list to give fuller details of the references
I gave briefly yesterday. Here they are:
Deb and Trivedi (2002) is in the pdf file Paul gave.
Deb, P. and P. K. Trivedi. "The Structure of Demand for Health Care:
Latent Class versus Two-part Models," Journal of Health Economics 21:
601-625, 2002.
Cameron and Trivedi, 2005, Microeconometrics: Methods and
Applications, Cambridge University Press.
Le
On 11/8/06, Le Wang <[email protected]> wrote:
Paul,
Your problem is very similar to a selection problem. Although you
mentioned you would rather not use models for both zeroes and positive
values, I think they are probably the right way to go.
I guess what you refer to as "modelling the data separately for the
zeroes and the positive values" is the set of approaches used in the
literature to solve the selection problem. If so, they do not actually
model the data separately; they model the zeroes and the positive
values together.
Deb and Trivedi (2002) use a two-part model to solve a similar problem,
and Cameron and Trivedi (2005, Section 16.6, p. 553) discuss
alternatives, including bivariate sample selection models. Cameron has
the corresponding Stata code on his website to implement these
methods. Hope it helps.
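
To make the two-part idea concrete, here is a rough sketch in Python. It is
not taken from Deb and Trivedi or from Cameron's Stata code. It assumes the
simplest possible case with no covariates: part one is the share of
observations with positive spending, and part two is a model for log
spending among those with positive spending. The function name
two_part_mean, the made-up data, and the use of Duan's smearing factor to
get back from logs to dollars are all assumptions made for illustration.

```python
import math


def two_part_mean(expenses):
    """Intercept-only two-part estimate of mean spending (illustrative only)."""
    positives = [y for y in expenses if y > 0]

    # Part 1: probability of any spending, estimated by the sample share.
    share_positive = len(positives) / len(expenses)

    # Part 2: model log spending among spenders only.
    # With no covariates, the fitted value is just the mean of the logs.
    logs = [math.log(y) for y in positives]
    mu = sum(logs) / len(logs)

    # Assumed retransformation: Duan's smearing factor, the average of the
    # exponentiated residuals, converts the log-scale fit back to dollars.
    smear = sum(math.exp(v - mu) for v in logs) / len(logs)
    cond_mean = math.exp(mu) * smear

    # Combine the parts: E[y] = Pr(y > 0) * E[y | y > 0].
    return share_positive, cond_mean, share_positive * cond_mean


# Made-up out-of-pocket amounts, half of them zero.
oop = [0, 0, 0, 12.5, 40.0, 300.0, 0, 95.0]
share, cond_mean, overall = two_part_mean(oop)
print(share, cond_mean, overall)
```

Note that the zeroes never enter the log, so no constant has to be added to
OOP. With covariates, part one would typically be a logit or probit and part
two a regression on the positive observations.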
Le
On 11/8/06, paul d jacobs <[email protected]> wrote:
> I am working with health data (MEPS) where
> out-of-pocket medical expenses (OOP) are a dependent
> variable in an OLS regression. Because of the
> positive skewness of such a variable, I would like to
> use a normalizing transformation, i.e. the log of OOP.
> However, because of the many zero observations for
> OOP, the options are to either add a constant to OOP,
> (some have used $1 arbitrarily), or to model the data
> separately for the zeroes and the positive values,
> which I'd rather not do. (I have also considered the
> square root transformation, etc., but would like to
> test out the results using a log-constant).
>
> My question is: do you know of a method for searching
> for the optimal constant to add to a variable so that
> a log-transformation produces the optimal result? Deb
> et al. (2005) suggest a 'grid search' for this value
> (see link below for document). I know that grid
> searches are used in the context of maximum
> likelihood; is this a similar process? Would running
> the model with different values and comparing R2s and
> standard errors be more appropriate?
>
> Thanks very much for your time!
>
> Paul Jacobs
> Ph.D. Candidate, Economics
> American University
>
>
> Link to Deb, et al presentation:
> harrisschool.uchicago.edu/faculty/articles/iHEAminicourse.pdf
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Le Wang, Ph.D.
Minnesota Population Center
University of Minnesota
(o) 612-624-5818