
From: "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: stata code for two-part model
Date: Mon, 18 Aug 2008 08:33:27 -0700

In some instances, the model for healthcare expenditures does have true zeros that are identifiable. In one study I consulted on, the data came from a health insurer, and zeros were people who had not gone to hospital. The use of a continuous model for data in which there is a clump of zeros seems incorrect. There is no transformation that can remove this clump. The severity of the problem depends a bit on the size of the clump. In the hospital insurance data (wanting to estimate hospitalization costs in the policy holders) 95% of the population had no costs. Pretending that these were continuous would lead to some nonsense results.

At the present time, I have a data set that has 32 out of 145 people with zeros. However, these are not necessarily identifiable since they could be slightly greater than zero. I'm gritting my teeth on this and pretending all is well. However, a histogram shows enormous skewness. I'll probably try a square root.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
Sent: Saturday, August 16, 2008 8:50 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stata code for two-part model

Shehzad Ali et al. --
See also http://www.nber.org/papers/t0228

The two-part models of health expenditures have always struck me as a bad idea; think about how you would get predictions for each individual in your sample. The "stage 1" probit classifies people as having expenditures or not (some correctly, some not) and then the "stage 2" OLS model gives predicted expenditures only for those people who actually have positive expenditures (not those who are classified by the probit as likely to have positive expenditures) unless you predict out of sample.
At least one preferred approach of calculating marginal effects by comparing predictions over the whole sample turns out to be practically and analytically difficult in that setting. However, a -glm- with a log link (or equivalently a -poisson- regression) has no trouble: those people with extremely low predicted expenditures would round to zero predicted expenditures if you thought about a survey with expenditures measured discretely in dollars, say. Everyone has E(y)=exp(Xb) and there is no real issue with calculating marginal effects. Once you are in the -glm- framework it is also easy to think about model fit and alternative links...

On Sat, Aug 16, 2008 at 3:41 AM, Eva Poen <eva.poen@gmail.com> wrote:
> Shehzad,
>
> this looks like a hurdle model. Have you searched the SSC archives to
> see if someone else has programmed it for you? Have a look at
> -hplogit-, for example.
>
> If you end up doing it yourself, I think you need to do a bit of
> programming. In order for -mfx- to work after your estimation, you
> need a way of telling it what you want the marginal effects to be
> calculated for. In your case, this would be the overall expected cost
> of care from your model. The way to feed this to -mfx- is via the
> predict(predict_option) option, but for this to work you need to write a
> -predict- command and an estimation command for your model.
>
> See for example this post:
> http://www.stata.com/statalist/archive/2005-10/msg00091.html
>
> Hope this helps,
> Eva
>
> 2008/8/16 Shehzad Ali <sia500@york.ac.uk>:
>> Hi,
>>
>> I was wondering if someone can help with stata code for calculating marginal
>> effects after two-part models for say, cost of care. Here, first part is a
>> probit model for seeking care or not, and the second part is an OLS model of
>> cost of care, conditional on decision to seek care.
>> Here is the simplified
>> code:
>>
>> probit care $xvar
>>
>> reg cost $zvar if care==1
>>
>> mfx
>>
>> I understand that mfx after the second part gives us the marginal effects
>> for the OLS part only, and not the conditional marginal effects.
>>
>> Any help would be appreciated.
>>
>> Thanks,
>>
>> Shehzad

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
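
Editorial sketch (not part of the original messages): the two approaches discussed in this thread can be put side by side in a few lines of Stata. This is a minimal illustration under stated assumptions, not the method of any poster. The data are simulated, the single covariate x is a hypothetical stand-in for the lists $xvar and $zvar, and the new variable names (p_care, cost_pos, ecost_2pm, me_2pm, ecost_glm, me_glm) are invented here. For the two-part model, the overall expected cost is P(care=1) times E(cost | care=1), so its derivative with respect to x follows from the product rule. For the log-link model, E(y)=exp(Xb), so the derivative is b times the prediction.

```stata
* Hypothetical simulated data. The names care and cost follow the thread;
* x stands in for the covariate lists $xvar and $zvar (an assumption).
clear
set seed 12345
set obs 1000
generate double x = rnormal()
generate byte care = (0.5*x + rnormal() > 0)
generate double cost = cond(care, exp(1 + 0.3*x + 0.5*rnormal()), 0)

* Part 1: probit for any care; predictions are made for everyone
probit care x
predict double p_care, pr        // P(care = 1 | x)
predict double xb1, xb           // probit linear index
scalar b1 = _b[x]

* Part 2: OLS on users only; -predict- still fills in all observations
regress cost x if care == 1
predict double cost_pos, xb      // E(cost | care = 1, x), in and out of sample
scalar b2 = _b[x]

* Overall expected cost and its derivative with respect to x (product rule)
generate double ecost_2pm = p_care * cost_pos
generate double me_2pm = normalden(xb1)*b1*cost_pos + p_care*b2
summarize ecost_2pm me_2pm

* Single-equation alternative: log link, so E(y) = exp(Xb) for everyone
glm cost x, family(poisson) link(log) vce(robust)
predict double ecost_glm, mu
generate double me_glm = _b[x] * ecost_glm
summarize ecost_glm me_glm
```

The means reported by -summarize- are average marginal effects over the whole sample. The sketch gives point estimates only; standard errors for the two-part effect would need something extra, such as bootstrapping both parts together.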

