Statalist The Stata Listserver


Re: st: Trying to do some multiple imputation

From   Nicola Orsini
Subject   Re: st: Trying to do some multiple imputation
Date   Tue, 24 Jan 2006 16:26:19 +0100

Hi Rose,

This is an example of multiple imputation.


* findfile cancer.dta

sysuse cancer

set seed 1234

* set the outcome variable died (0/1) to missing, completely at random,
* for roughly 40% of the observations

gen u = uniform()
replace died = . if u > 0.6
codebook died

* impute 5 times the outcome variable died using uvis
* every time generating a new variable (died1, died2, ..., died5)

forv i = 1(1)5 {

* keep a copy of the data with died still missing for the next pass
preserve

* specify a model to predict missing values
uvis logistic died drug age, gen(died`i')  seed(123695`i')
replace died = died`i'

* save each imputed dataset under a new name (canc1, canc2, ..., canc5)
save canc`i', replace

restore
}

* have a look at the imputed variables saved in the new datasets

forv i = 1(1)5 {
use canc`i', clear
tab died , miss
}

* declare the set of imputed datasets before combining results
miset using canc

* specify the estimation command to be executed on each imputed dataset
* and get the overall estimates
mifit, indiv: logistic died drug age
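
For readers who want to see what the pooling step amounts to, here is a minimal sketch that combines the coefficient on drug by hand using Rubin's rules. It assumes the files canc1 to canc5 created above exist in the working directory, and it assumes -mifit- pools in this standard way; it is an illustration, not a copy of -mifit-'s internals. The scalar names (mi_m, mi_qbar, and so on) are made up for this sketch. It uses -logit- rather than -logistic-, so the result is on the log-odds scale.

```stata
* Sketch only: pool the coefficient on drug across canc1-canc5 by hand.
* Scalar names are hypothetical; files canc1-canc5 are assumed to exist.
scalar mi_m      = 5
scalar mi_sum_b  = 0
scalar mi_sum_b2 = 0
scalar mi_sum_u  = 0

forvalues i = 1/5 {
    use canc`i', clear
    quietly logit died drug age
    * accumulate the estimate, its square, and its squared standard error
    scalar mi_sum_b  = mi_sum_b  + _b[drug]
    scalar mi_sum_b2 = mi_sum_b2 + _b[drug]^2
    scalar mi_sum_u  = mi_sum_u  + _se[drug]^2
}

* pooled estimate: mean of the 5 coefficients
scalar mi_qbar = mi_sum_b / mi_m
* within-imputation variance: mean of the squared standard errors
scalar mi_ubar = mi_sum_u / mi_m
* between-imputation variance: sample variance of the 5 coefficients
scalar mi_bvar = (mi_sum_b2 - mi_m*mi_qbar^2) / (mi_m - 1)
* total variance combines the two sources
scalar mi_tvar = mi_ubar + (1 + 1/mi_m)*mi_bvar

display "pooled coefficient on drug: " mi_qbar
display "pooled standard error:      " sqrt(mi_tvar)
```

The between-imputation term is what a single imputation cannot supply: with only one completed dataset there is nothing to compute mi_bvar from, so the standard error would rest on the within-imputation variance alone.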


Mosi A. Ifatunji wrote:
Thanks Rose,

The thing is that I've done some reading on single and multiple imputation
and the literature seems to suggest that doing multiple imputation is much
better because your imputed values are informed by within and between
dataset variance. Therefore, although I am only trying to impute one value
(income), the -uvis- command does not seem appropriate, given that it
generates the new values without the benefit of between dataset variance,
because it does not generate new datasets. (Question: What is the real
difference between -impute- and -uvis-?).

So, I think I am trying to impute missing values for one variable, but I
would like to generate the multiple datasets from which to generate the new
values. If my variables were y x1 x2 x3 (with y being the variable with
missing values that I am trying to generate new values for) could you send
me an example of how I might do such a thing, from generating the multiple
datasets to getting the missing values imputed in the original dataset?

Any help would be wonderful,


On 1/24/06 8:33 AM, Rose Medeiros wrote:

If your goal is just to impute values of your income variable, you might
use -uvis- which will impute values of the yvar and leave them in your
initial dataset. If this is problematic because of a large number of
missing values in the variables you are attempting to impute income
from, you could use -mvis- and generate only one imputation by
specifying m(1) and run your analyses on this dataset (which would also
have imputed values for the other variables). Note that both of these
procedures are single imputation, rather than multiple imputation. If
you actually want to do multiple imputation, you would want to use
-micombine- to specify the actual models you want to test, not the
variables you are trying to impute.

Mosi A. Ifatunji wrote:

Good people,

Here is my quandary. I am having a heck of a time trying to complete
procedures for multiple imputation using Stata 8.2.

My goal is to impute missing values for my income variable (v1019). I would
like to generate 5 new and complete datasets from which to derive my new
values (to be placed back into the old dataset). Here is the syntax I have
been using to no avail:

First, I use the MVIS command to generate five new datasets with values for
any missing values in the key variables:

mvis v1019 black male age2 educate using imp, m(5) genmiss(m_) cmd(regress) cy(20) se(101) replace

And I get...

imputing 1..2..3..4..5..file imp.dta saved

Then, I open the new dataset (with all missing values imputed):

use imp, clear

Then I generate a model (from the 5 new datasets in imp.dta) that predicts
my income variable (v1019):

micombine regress v1019 black male age2 educate

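Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
       v1019 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -16060.96   2412.319    -6.66   0.000    -20794.22    -11327.7
        male |   10331.52   2487.338     4.15   0.000     5451.057    15211.97
        age2 |  -114.4879   90.15501    -1.27   0.204    -291.3829    62.40716
     educate |   4532.232    722.751     6.27   0.000     3114.107    5950.357
       _cons |  -8530.671    12416.8    -0.69   0.492    -32893.94     15832.6
------------------------------------------------------------------------------
1106 observations.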

Now what do I do? I have been roaming through manuals and copies of the
Stata Journal (4-3 and 5-4), but every time I get near, the author(s) leave
out something important, like how exactly to use MISET, MISPLIT and
MIJOIN to get my imputed values back into my original dataset....

Any help would be greatly appreciated...

