Statalist The Stata Listserver



Re: st: Trying to do some multiple imputation


From   Rose Medeiros <[email protected]>
To   [email protected]
Subject   Re: st: Trying to do some multiple imputation
Date   Tue, 24 Jan 2006 10:51:26 -0500

Mosi,
As you say, there are two main advantages to multiple imputation, and both involve correcting for the uncertainty of the imputations. The first concerns the parameter estimates: the combined estimate is essentially the mean of the estimates from each of the m imputations (where m is the number of imputations). The second is the correction of the standard errors for within- and between-imputation variance, which allows for better estimates of confidence intervals and p-values. These advantages require that you run each of your models on all m imputed datasets and then combine the estimates. This is what the -micombine- command does for you (after -mvis- is used to create the m imputations). Note that although -mvis- generates only one dataset, it is actually creating m imputations within that single dataset. Nicola's email provides an example of how to do this using Carlin et al.'s "tools for analyzing multiple imputed datasets." Stata Journal 4-3 has an article, including an example, on how to use Royston's package ice (the package the commands in your original email come from) to do the same.
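For reference, the combining step described above is usually written as follows (commonly called Rubin's rules). The notation is an assumption introduced here, not taken from the thread: \hat{Q}_i is the estimate of a parameter from imputation i, and U_i is its squared standard error. That -micombine- computes exactly these quantities is also an assumption.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} \hat{Q}_i
\qquad \text{(combined estimate: mean over the $m$ imputations)}

W = \frac{1}{m}\sum_{i=1}^{m} U_i
\qquad \text{(within-imputation variance)}

B = \frac{1}{m-1}\sum_{i=1}^{m} \bigl(\hat{Q}_i - \bar{Q}\bigr)^2
\qquad \text{(between-imputation variance)}

T = W + \Bigl(1 + \frac{1}{m}\Bigr) B,
\qquad \mathrm{SE}(\bar{Q}) = \sqrt{T}
```

With a single imputation, B cannot be estimated, so the standard error reflects only W and understates the uncertainty due to the missing data.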
The procedure you appear to be attempting is somewhat different. By imputing the m (in your case 5) datasets and using them to estimate the relationship between the predictor variables and income (which is what the syntax in your first email does), you could generate estimated values of income and use these estimates to replace the missing values of income in your original dataset. However, this procedure is not multiple imputation, and does not give you the advantages of multiple imputation.
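By contrast, a multiple imputation workflow with these commands might be sketched as below. This is only a sketch: it reuses the options from the original email, the variable names y, x1, x2 and x3 are placeholders, and it assumes the original data are in memory and Royston's ice package is installed.

```stata
* Sketch only: y, x1, x2, x3 are placeholder variable names.

* 1. Create m = 5 imputations; -mvis- stacks them in a single file, imp.dta
mvis y x1 x2 x3 using imp, m(5) genmiss(m_) cmd(regress) seed(101) replace

* 2. Load the file holding the stacked imputations
use imp, clear

* 3. Fit the analysis model on each imputation and combine the results
*    (substitute the model you actually want to test)
micombine regress y x1 x2 x3
```

The combined coefficients and standard errors reported in step 3 are the end product. No imputed values are copied back into the original dataset.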
HTH,
Rose

Mosi A. Ifatunji wrote:


Thanks Rose,

The thing is that I've done some reading on single and multiple imputation
and the literature seems to suggest that doing multiple imputation is much
better because your imputed values are informed by within and between
dataset variance. Therefore, although I am only trying to impute one value
(income), the -uvis- command does not seem appropriate, given that it
generates the new values without the benefit of between dataset variance,
because it does not generate new datasets. (Question: What is the real
difference between -impute- and -uvis-?).

So, I think I am trying to impute missing values for one variable, but I
would like to generate the multiple datasets from which to generate the new
values. If my variables were y x1 x2 x3 (with y being the variable with
missing values that I am trying to generate new values for) could you send
me an example of how I might do such a thing, from generating the multiple
datasets to getting the missing values imputed in the original dataset?

Any help would be wonderful,

M.


On 1/24/06 8:33 AM, "Rose Medeiros" <[email protected]> wrote:



Mosi,
If your goal is just to impute values of your income variable, you might
use -uvis- which will impute values of the yvar and leave them in your
initial dataset. If this is problematic because of a large number of
missing values in the variables you are attempting to impute income
from, you could use -mvis- and generate only one imputation by
specifying m(1) and run your analyses on this dataset (which would also
have imputed values for the other variables). Note that both of these
procedures are single imputation, rather than multiple imputation. If
you actually want to do multiple imputation, you would want to use
-micombine- to specify the actual models you want to test, not the
variables you are trying to impute.
Best,
Rose

Mosi A. Ifatunji wrote:



Good people,

Here is my quandary. I am having a heck of a time trying to complete
procedures for multiple imputation using Stata 8.2.

My goal is to impute missing values for my income variable (v1019). I would
like to generate 5 new and complete datasets from which to derive my new
values (to be placed back into the old dataset). Here is the syntax I have
been using to no avail:

First, I use the MVIS command to generate five new datasets with values for
any missing values in the key variables:

mvis v1019 black male age2 educate using imp, m(5) genmiss(m_) cmd(regress) cy(20) se(101) replace

And I get...

imputing 1..2..3..4..5..file imp.dta saved

Then, I open the new dataset (with all missing values imputed):

use imp, clear

Then I generate a model (from the 5 new datasets in imp.dta) that predicts
my income variable (v1019):

micombine regress v1019 black male age2 educate

Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
       v1019 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -16060.96   2412.319    -6.66   0.000    -20794.22    -11327.7
        male |   10331.52   2487.338     4.15   0.000     5451.057    15211.97
        age2 |  -114.4879   90.15501    -1.27   0.204    -291.3829    62.40716
     educate |   4532.232    722.751     6.27   0.000     3114.107    5950.357
       _cons |  -8530.671    12416.8    -0.69   0.492    -32893.94     15832.6
------------------------------------------------------------------------------
1106 observations.

Now what do I do? I have been roaming through manuals and copies of the
Stata Journal (4-3 and 5-4) but every time I get near, the author(s) leave
out something important, like how exactly do I use MISET, MI SPLIT AND
MIJOIN to get my imputed values back to my original dataset....

Any help would be greatly appreciated...

M.


*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/





--
Rose Anne Medeiros
Department of Sociology / Family Research Laboratory
University of New Hampshire
126 Horton Social Science Center
20 College Road
Durham, NH 03824
U.S.A.
