Statalist The Stata Listserver

Re: st: Re: Trying to do some multiple imputation

From   "Mosi A. Ifatunji"
To   "Statalist@HARVARD.EDU"
Subject   Re: st: Re: Trying to do some multiple imputation
Date   Tue, 24 Jan 2006 13:29:53 -0600


I'm not sure this accomplishes what I am after. I am sure that you're
giving me the right answer, but I may not be asking the right question, LOL
(i.e., I may not be communicating the question correctly). Let me give it
another go:

So, to start at square one: I have a dataset named das1995r. In that dataset
there is an income variable, 'income.' Many values of 'income' are missing.
I am assuming that these values are missing at random (MAR) and can be
predicted by the respondent's race ('black'), sex ('male'), age ('age2') and
level of education ('educate').

I would like to end up with a new income variable in my original dataset
(das1995r). That is, the old variable ('income') would still be there, but
there would also be a new variable at the end of the dataset that holds the
observed values of 'income' plus imputed values wherever 'income' was
missing. I would like this variable to be called 'imp_income.'

In order to accomplish this, I have been given the following syntax:



cd "/Users/Ifatunji/Documents/IFATUNJI Docs/University/Academic/Graduate/Data/DAS 1995/"

use das1995r

forv i = 1(1)5 {
    uvis regress income black male age2 educate, gen(income`i') seed(123695`i')
    replace income = income`i'
    save das`i', replace
}

/* forv i = 1(1)5 {
use das`i', clear
tab income, miss
} */

miset using das

mifit, indiv: regress income black male age2 educate


The problem is that when this syntax is finished running, I find myself in a
new dataset (i.e., not das1995r, but some other dataset) that seems to
represent several datasets in one. In this new dataset, the 'income'
variable has almost no missing values (which is fine).

My problem is that I cannot get a representation of this new 'income'
variable with no missing values from this new dataset, back to the original
dataset, das1995r. Ultimately, when this new 'income' variable gets back to
the original dataset, I would like it to have one observation per case and
be called 'imp_income.'
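
To make the target concrete, here is a minimal sketch of the result I am
describing, not a recommended analysis. It assumes -uvis- (distributed with
-ice-) is installed, it produces only a single imputation, and the output
file name das1995r_imp is hypothetical:

```stata
* Sketch only: one imputed copy of income added to the original data.
* Assumes -uvis- is installed; das1995r_imp is a hypothetical file name.
use das1995r, clear

* gen() creates a new variable that keeps the observed values of income
* and fills in imputed values where income was missing.
uvis regress income black male age2 educate, gen(imp_income) seed(123695)

* Compare missingness before and after. imp_income can still be missing
* for cases where a predictor itself is missing.
count if missing(income)
count if missing(imp_income)

save das1995r_imp, replace
```

A single imputed variable like this understates the uncertainty in the
imputed values, so any inference would still need the several imputed
datasets combined.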

Any thoughts you might have would be very welcome,


On 1/24/06 12:17 PM, "Nick Cox" wrote:

> I forward below some comments from Patrick Royston
> who wrote what is now -mice- (but is not a member
> of Statalist). They may be redundant given
> other postings. I am not familiar with the details
> of -mice- and cannot advise myself.
> Nick 
> --------------------------------------------
> The commands -misplit- and -mijoin- have a specialised purpose and
> are not often used.
> If you wish to combine the imputed dataset with the original data you
> only need load the imputed dataset and append the original one.
> For the example given by Mosi,
> use imp, clear
> append using <original data file name>
> Note that with this approach, the imputation indicator _j and the observation
> indicator _i will be missing for the original dataset but of course present
> for the imputed dataset.
> Then you can do what you wish with the combined dataset. -micombine- will
> still work correctly, it will just ignore observations in which _j is missing.
> Preferably, you should now be using -ice-, not -mvis- which is out of date.
> See the Stata Journal 5(4) update.
