
Re: st: using treatment and selection models for my data

From   "Austin Nichols" <>
Subject   Re: st: using treatment and selection models for my data
Date   Mon, 12 Nov 2007 11:42:00 -0500

Shehzad Ali <>:
You still haven't specified a causal model, or the nature of the bias.
If you can be precise about these, you might be led naturally to a
particular model. I, for one, don't believe that 'level of worry about
future health' and 'dummy variable for membership in mass
organisation/s' would satisfy an exclusion restriction for use as
excluded instruments in -ivreg- or similar models, though.

One plausible source of unobserved heterogeneity is variation in risk
tolerance, which might lead some people to buy insurance and others
not, even when all face the same price and risk.  This would also lead
to different levels of worry about future health, and possibly
membership in organizations that provide some forms of social
insurance.  Note also that seeking medical advice is like buying
insurance--you see the doctor, and if you're sick in a way that would
get worse without medical intervention you're better off (the bad
state of the world where insurance pays off), and if you're sick in
a way that would get better without medical intervention you're no worse
off, except that you have to pay the doctor (the premium for this

Besides variation in risk preferences, there is likely unobserved
variation in time preferences (discount rates), and unobserved
variation in price and income elasticities of demand related to the
relative weight placed on health versus other consumption, unobserved
variation in a host of underlying risks related to genetic
predispositions or environmental exposures, and unobserved variation
in economic and other endowments, not to mention measurement error.
Simply running -ivreg- or -treatreg- or -heckman- blindly, without a
good argument about what the problems are and how you're fixing them,
is unlikely to give a good result.

I also note you're predicting Xb and naming it "imr", which presumably
does not stand for "Inverse Mills' Ratio", since that is f(Xb)/F(Xb),
where f and F are the standard normal density and CDF.

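* Compare the built-in two-step Heckman estimator with a by-hand version
webuse womenwk, clear
local m wage educ age
local s married children age
* Built-in two-step; mills() saves the inverse Mills ratio as "twostep"
qui heckman `m', select(`s') twostep mills(twostep)
est sto heckman
* By hand: probit for whether wage is observed, then f(xb)/F(xb)
g seen = (wage < .)
qui probit seen `s'
predict double xb, xb
g double imr=normalden(xb)/normal(xb)
* Outcome regression with imr added (these SEs are not corrected)
qui reg `m' imr
est sto byhand
* imr and twostep should agree, as should the wage-equation coefficients
su imr twostep
est table *,  eq(1) se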

It's not clear what role those "imr" variables are supposed to play in
your analysis.

If I were you, I'd start by regressing med_exp on insurance and x*
using -poisson- (and saw_doctor on insurance and x* using -logit-).
Probably some folks who have higher "latent med_exp" want more
insurance, but they may also face a higher cost of, or other barriers
to, buying insurance.  It's not clear what the nature of that bias is.
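
A minimal sketch of that starting point, run on simulated data so that it
is self-contained. Everything here is an assumption made for illustration:
the variable names follow the thread, x1 and x2 stand in for the x*
covariates, the coefficient values are arbitrary, and vce(robust) is a
choice of this sketch rather than something specified above. Insurance is
assigned at random in the simulation, so there is no selection bias to
find; the point is only the syntax of the two baseline regressions.

```stata
* Simulated data for illustration only (not the Vietnam survey)
clear
set seed 12345
set obs 1000
g x1 = rnormal()
g x2 = rnormal()
* Hypothetical: insurance assigned at random, unlike the real data
g insurance = runiform() < 0.35
* Hypothetical visit decision; expenditure is zero without a visit
g saw_doctor = runiform() < invlogit(0.5 + 0.3*insurance + 0.2*x1)
g med_exp = saw_doctor * rpoisson(exp(1 + 0.2*insurance + 0.1*x2))
* Baseline models: E[med_exp] = exp(Xb), and a logit for seeing a doctor
poisson med_exp insurance x*, vce(robust)
logit saw_doctor insurance x*
```

With real data these coefficients would still be subject to whatever bias
the insurance decision introduces; they are a baseline, not a fix.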

What is the quantity of interest here?  Do you want to know what the
effect on aggregate med exp would be if insurance were expanded to a
few uninsured people?

You'll probably get more bang for your buck exploring the correct
functional form using -graph- and -lpoly- within categories of X than
by jumping to a particular parametric specification using -treatreg-
or -heckman-.
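
As a rough sketch of that kind of exploration, the lines below overlay
local polynomial fits for two categories of a covariate. The womenwk
example data and the choice of wage, educ, and married are stand-ins
picked for illustration, not the medical expenditure data from this
thread.

```stata
* Illustration on example data, not the survey discussed above
webuse womenwk, clear
* One local polynomial smooth of wage on education per category
twoway (lpoly wage educ if married==0) ///
       (lpoly wage educ if married==1), ///
       legend(order(1 "Not married" 2 "Married")) ///
       ytitle("Wage") xtitle("Education")
```

If the smooths look roughly linear and parallel, a simple parametric form
may be adequate; marked curvature or divergence suggests otherwise.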

All of these questions are probably outside the scope of help you can
reasonably expect from Statalist...

On Nov 11, 2007 10:32 PM, Shehzad Ali <> wrote:
> Dear Austin,
> Thank you for getting back. I should have provided a little bit of
> background. Data is from random cross-sectional survey of insured and
> uninsured households in Vietnam. Households were sampled randomly, though
> insured were over-sampled to have a decent number of them.
> My main outcome variable of interest is medical expenditure at health
> centres or hospitals. Therefore those who didn't seek medical advice have
> zero actual expenditure, though latent expenditure may be different. I wish
> to determine the impact of insurance on medical expenditure. Insurance
> membership is voluntary and my assumption is that the sick are more likely
> to buy insurance. Therefore I would like to adjust for this bias
> (endogeneity). Also because not all sick seek medical advice and therefore
> have medical expenditure, I would like to adjust for this sample selection
> as well which is attributed to the decision to seek care given illness. The
> independent variables used in med expenditure equation are age, sex, rural,
> province, income, job dummies, years of schooling, self-assessed health
> status and severity of illness. Instrumental variables for insurance
> equation are 'level of worry about future health' and 'dummy variable for
> membership in mass organisation/s'. I don't have an instrumental variable
> for 'decision to seek med advice' variable.
> Hope you can help more after this description.
> Many thanks,
> Shehzad
> -----Original Message-----
> From:
> [] On Behalf Of Austin Nichols
> Sent: 12 November 2007 03:08
> To:
> Subject: Re: st: using treatment and selection models for my data
> Shehzad Ali <>:
> Whether those models are appropriate depends on the causal model you
> propose that connects these constructs (you do not even mention what
> the x variables are), and how you think the data are generated.  What
> is the source of bias in regressing medical expenditures on insurance?
>  Are the errors normally distributed? Etc. etc.
> Are we to understand that medical expenditure is zero when someone
> does not seek medical advice? In that case, modeling expected med exp
> as exp(Xb) may make sense, using e.g. ivpois, in which case you might
> predict med exp of 0.03 for those folks who have zero expenditure.  If
> the desired med exp is low, then it is no surprise that these folks
> don't seek medical advice.
> I think you probably have a complicated story in mind, where people
> have unobserved heterogeneity in preferences (discount rates, risk
> prefs, price elasticity etc.) and are induced (possibly in unforeseen
> ways) by past choices to optimize at corner solutions, but it is
> impossible to comment on your use of the empirical models without
> seeing a cogent theory to motivate them.
> On Nov 11, 2007 2:14 PM, Shehzad Ali <> wrote:
> > Hello listers,
> >
> > I have a query about using biprobit, treatreg and heckman steps in stata and
> > generating IMRs. I am using a three-part model for medical expenditure.
> > Here is a summary:
> >
> > I have 1,500 in the sample who felt sick, of which 1,000 sought medical
> > advice and hence had medical expenditure. Here the first selection bias is
> > with regards to seeking treatment when ill. Then in the sample, 700 individuals
> > are insured and 1,300 are not. This is the second selection bias which is
> > related to insurance purchasing decision. So I need to take into account two
> > selection equations for my medical expenditure (outcome) equation, i.e.
> > sought medical advice when ill, and bought insurance. I have to bear in mind
> > that insurance decision also affects decision to seeking care when ill and
> > medical expenditure when treatment is sought.
> >
> > Here is what I am thinking of doing:
> >
> > First part of the model:
> >
> > biprobit (eq1: visit_when_ill = insurance x1 x2 x3) (eq 2: insurance=x1 x2
> > x3 x4)
> >
> > Here first eq is for decision to seek care when ill and second decision is
> > to buy insurance.
> >
> > Predict imr1, xb
> >
> > Second part:
> >
> > heckman med_expenditure x1 x2 x3 imr1 insurance, treat(insurance= x1 x2 x3
> > x4)
> >
> > predict imr2,xb
> >
> > Third part:
> >
> > treatreg med_expenditure x1 x2 x3 insurance imr2, treat(insurance=x1 x2 x3
> > x4)
> >
> > Is this the right approach to take? Any help would be greatly appreciated.
> >
> > With sincere thanks,
> >
> > Shehzad