st: Sample selection correction models with fixed-effects

From   Ivy Kodzi <[email protected]>
To   [email protected]
Subject   st: Sample selection correction models with fixed-effects
Date   Mon, 31 Jan 2011 10:18:10 -0500

Hi Statalisters,

My name is Ivy Kodzi. I am new to the list and find it very helpful!

I am estimating the effect of parity (the number of births a woman has
had) on the chance of having a child with low birth weight. I also
want to adjust for all constant observed and unobserved factors at the
mother level, e.g. some aspects of her health/genetics, household
environment, community, etc., so I am running a fixed-effects
regression model whose outcome variable is a dummy for whether or not
the child was born with low birth weight. In my data file, child
characteristics are nested under "mother".

The birth weight variable (from DHS datasets), however, has a serious
sample selection problem: the data come predominantly from more
educated, richer, urban women who had their babies in hospitals. In my
child-level file, about two-thirds of children are missing birth
weights, largely because their (less affluent) mothers gave birth at
home, so no birth weight was recorded. I could adjust for the
selection problem with a Heckman-type model, e.g. a bivariate probit
model with sample selection, but fundamentally I want to conduct a
within-mother analysis, so I want to include fixed effects for the
mother. The commands heckprob and heckprob2 in Stata allow adjusting
for the clustering of children within mothers, but not the inclusion
of fixed effects. I have nearly 100,000 mothers, so including a dummy
for each mother as a fixed effect would not be efficient, and may be
incorrect.

I am also not sure whether I should approach this as a Heckman-type
sample selection problem or as a non-ignorable missing data problem
(even with so much missingness?). I think it is more of the former,
which is why I am considering a Heckman correction model. Can anyone
recommend how to do the estimation, especially  in

Thank you so much!
