Stata The Stata listserver

RE: st: Regressing with variables with missing values

From   "Garrard, Wendy M." <wendy.garrard@Vanderbilt.Edu>
Subject   RE: st: Regressing with variables with missing values
Date   Wed, 2 Nov 2005 15:45:34 -0600

The MAR assumption is pretty robust to some violations. The main issue
for MAR is whether you have some observed covariates that provide
information about the missing values.  For example, if household income
is missing, then other observed variables (e.g., zip code, occupation,
education level) may provide some basis for plausible estimation.

If you have some good covariates you may be able to construct a
relatively simple regression model to come up with some plausible
estimates of the missing values.  Note that if you have good covariates,
multiple imputation is also an option.  If you don't have observed
covariate information, and the missing data are non-random (MNAR), then
more specialized (and probably complex) models are required for handling
the missing data.
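
As a rough sketch of the simple regression approach described above,
something like the following could be used. The variable names (income,
educ, age) are hypothetical and not from the original question, and the
sketch assumes educ and age are themselves fully observed.

```stata
* Hypothetical variables: income has missing values; educ and age are
* observed covariates assumed to be complete.

* Fit the model on the cases where income is observed
regress income educ age

* Linear prediction for every case with educ and age observed
predict income_hat, xb

* Flag the cases that will be filled in, then fill them
generate byte income_flag = missing(income)
generate income_filled = income
replace income_filled = income_hat if missing(income)
```

Note that a single filled-in value of this kind treats the prediction as
if it were observed data, so standard errors from a later model that uses
income_filled will tend to be too small. That is one reason multiple
imputation is usually preferred when it is feasible.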

If you can justify MAR, the -impute- command may help you, although the
multiple imputation algorithms are more cutting edge these days.
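
For illustration only, here is how the two routes might look. The
variable names (y, income, educ, age) are hypothetical, educ and age are
assumed to be complete, and the second half uses the official -mi- suite,
which requires Stata 11 or later and so postdates this thread.

```stata
* Single imputation with the older -impute- command; generate() names
* the new variable that holds observed plus imputed values
impute income educ age, generate(income_imp)

* Multiple imputation with the official -mi- suite (Stata 11 or later)
mi set wide
mi register imputed income
mi impute regress income educ age, add(5) rseed(12345)

* Fit the analysis model in each imputed dataset and combine the results
mi estimate: regress y income educ age
```

The number of imputations (5) and the seed are arbitrary choices for the
sketch, not recommendations.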


-----Original Message-----
[] On Behalf Of Ramani
Sent: Wednesday, November 02, 2005 2:50 PM
Subject: Re: st: Regressing with variables with missing values

Thanks, Paul. I did download listmiss and use it. Now my dilemma is that
the main culprits appear non-random wrt the dependent variable according
to listmiss (i.e., t and p values appear in yellow with stars). That means
that I can't use ice because that assumes that the missing observations
are missing at random. I'd be grateful for any suggestions as to what I
should do next.

On 03/11/05, Paul Millar <> wrote:
> You might also use the post-estimation command - listmiss - to find 
> which variables are the main culprits and which ones have missing 
> values that are non-random wrt the dependent variable.
> ssc install listmiss
> - Paul Millar
> At 09:18 AM 02/11/2005, you wrote:
> >At 10:52 AM 11/2/2005, Ramani Gunatilaka wrote:
> >>Dear Statalist,
> >>This may seem a stupid question for the statisticians among you but 
> >>I'd appreciate some help.
> >>I want to run a regression on cross-section data with lots of 
> >>variables, some of which have missing values. When I do that, Stata 
> >>estimates the model using only the observations which have values 
> >>for all variables. I downloaded tabmiss and rmiss2 as in the relevant
> >>FAQ and the commands would certainly help in enabling me to decide 
> >>which variables to drop. But is there any way that I could retain 
> >>all the variables with their missing values and make allowance for 
> >>the missing values by including a dummy for missing variables?
> >
> >The way you retain the missing values is by recoding them to a 
> >non-missing value, e.g. the variable's mean.  This has all sorts of 
> >problems though.  The MD dummy variable indicator that you propose 
> >used to be popular but has since been discredited.  See Paul 
> >Allison's Sage book "Missing Data."
> >
> >For a synopsis of basic strategies and their pros and cons, see
> >
> >
> >
> >That handout is weak in discussing more advanced methods, although it
> >does allude to them.  You might check out Royston's -ice- package, 
> >which was recently updated and discussed in the Stata Journal.  Use
> >
> >-findit ice-
> >
> >
> >-------------------------------------------
> >Richard Williams, Notre Dame Dept of Sociology
> >OFFICE: (574)631-6668, (574)631-6463
> >FAX:    (574)288-4373
> >HOME:   (574)289-5227
> >EMAIL:  Richard.A.Williams.5@ND.Edu
> >WWW (personal):
> >WWW (department):
> >*
> >*   For searches and help try:
> >*
> >*
> >*