Mosi,
As you say, there are two main advantages to multiple imputation, and
both involve accounting for the uncertainty of the imputations. The
first is in the parameter estimates: the combined estimate is
essentially the mean of the parameter estimates from each of the m
imputations (where m is the number of imputations). The second
advantage is the correction of standard errors for within- and
between-imputation variance, which allows for better estimates of
confidence intervals and p-values. These advantages require that you
run each of your models on all m imputed datasets and then combine the
estimates. This is what the -micombine- command does for you (after
-mvis- is used to create the m imputations). Note that although -mvis-
saves only one dataset, that single dataset contains all m imputations.
Nicola's email provides an example of how to do this using Carlin et
al.'s "tools for analyzing multiple imputed datasets." Stata Journal
4-3 has an article, including an example, on how to use Royston's
package ice (the package the commands in your original email come from)
to do the same.
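To make the combining step concrete, here is a minimal sketch of
Rubin's rules for a single coefficient, written in Python rather than
Stata. The function name pool_rubin and the numbers in the example are
made up for illustration, and the sketch leaves out the degrees-of-freedom
adjustment used for t statistics, so treat it as an illustration of the
arithmetic rather than a replacement for -micombine-.

```python
import math
import statistics


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    estimates  -- the coefficient estimated on each imputed dataset
    std_errors -- the matching standard error from each dataset
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations, one SE per estimate")

    # Pooled point estimate: the mean of the m estimates.
    pooled = statistics.fmean(estimates)

    # Within-imputation variance: the mean of the squared standard errors.
    within = statistics.fmean([se ** 2 for se in std_errors])

    # Between-imputation variance: sample variance of the m estimates.
    between = statistics.variance(estimates)

    # Total variance adds the between part, inflated by (1 + 1/m).
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical results from m = 3 imputed datasets.
example_est, example_se = pool_rubin([4.0, 5.0, 6.0], [1.0, 1.0, 1.0])
print(example_est, example_se)
```

In this made-up example every per-dataset standard error is 1, but the
pooled standard error comes out larger than 1 because the estimates
disagree across imputations. That extra between-imputation term is what
a single imputation cannot give you.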
The procedure you appear to be attempting is somewhat different. By
imputing the m (in your case 5) datasets and using them to estimate the
relationship between the predictor variables and income (which is what
the syntax in your first email does), you could generate estimated
values of income and use these estimates to replace the missing values
of income in your original dataset. However, this procedure is not
multiple imputation, and does not give you the advantages of multiple
imputation.
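As a rough illustration of what is lost, here is a small simulation
sketch in Python with made-up data. It simplifies the procedure above:
there is one predictor, the regression is fit on the observed cases
only, and all names (fit_line, the variable names, the seed) are
hypothetical. It compares the spread of the values that went missing
with the spread of the regression predictions used to replace them.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(2006)  # fixed seed so the run is repeatable
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]          # made-up outcome
is_missing = [rng.random() < 0.4 for _ in range(n)]   # missing at random

# Fit the regression on the cases where y was observed.
observed = [(xi, yi) for xi, yi, miss in zip(x, y, is_missing) if not miss]
a, b = fit_line([p[0] for p in observed], [p[1] for p in observed])

# Compare the true values that went missing with their predicted fill-ins.
true_vals = [yi for yi, miss in zip(y, is_missing) if miss]
filled_vals = [a + b * xi for xi, miss in zip(x, is_missing) if miss]

var_true = statistics.variance(true_vals)
var_filled = statistics.variance(filled_vals)
print(f"variance of the true values that went missing: {var_true:.2f}")
print(f"variance of the regression-filled values:      {var_filled:.2f}")
```

The filled-in values sit exactly on the regression line, so they vary
less than the real values would. An analysis that treats them as
observed data will tend to report standard errors that are too small.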
HTH,
Rose
Mosi A. Ifatunji wrote:
Thanks Rose,
The thing is that I've done some reading on single and multiple imputation
and the literature seems to suggest that doing multiple imputation is much
better because your imputed values are informed by within and between
dataset variance. Therefore, although I am only trying to impute one value
(income), the -uvis- command does not seem appropriate, given that it
generates the new values without the benefit of between dataset variance,
because it does not generate new datasets. (Question: What is the real
difference between -impute- and -uvis-?).
So, I think I am trying to impute missing values for one variable, but I
would like to generate the multiple datasets from which to generate the new
values. If my variables were y x1 x2 x3 (with y being the variable with
missing values that I am trying to generate new values for) could you send
me an example of how I might do such a thing, from generating the multiple
datasets to getting the missing values imputed in the original dataset?
Any help would be wonderful,
M.
On 1/24/06 8:33 AM, "Rose Medeiros" wrote:
Mosi,
If your goal is just to impute values of your income variable, you might
use -uvis- which will impute values of the yvar and leave them in your
initial dataset. If this is problematic because of a large number of
missing values in the variables you are attempting to impute income
from, you could use -mvis- and generate only one imputation by
specifying m(1) and run your analyses on this dataset (which would also
have imputed values for the other variables). Note that both of these
procedures are single imputation, rather than multiple imputation. If
you actually want to do multiple imputation, you would want to use
-micombine- to specify the actual models you want to test, not the
variables you are trying to impute.
Best,
Rose
Mosi A. Ifatunji wrote:
Good people,
Here is my quandary. I am having a heck of a time trying to complete
procedures for multiple imputation using Stata 8.2.
My goal is to impute missing values for my income variable (v1019). I would
like to generate 5 new and complete datasets from which to derive my new
values (to be placed back into the old dataset). Here is the syntax I have
been using to no avail:
First, I use the MVIS command to generate five new datasets with values for
any missing values in the key variables:
mvis v1019 black male age2 educate using imp, m(5) genmiss(m_) cmd(regress) cy(20) se(101) replace
And I get...
imputing 1..2..3..4..5..file imp.dta saved
Then, I open the new dataset (with all missing values imputed):
use imp, clear
Then I generate a model (from the 5 new datasets in imp.dta) that predicts
my income variable (v1019):
micombine regress v1019 black male age2 educate
Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
       v1019 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -16060.96   2412.319    -6.66   0.000    -20794.22    -11327.7
        male |   10331.52   2487.338     4.15   0.000     5451.057    15211.97
        age2 |  -114.4879   90.15501    -1.27   0.204    -291.3829    62.40716
     educate |   4532.232    722.751     6.27   0.000     3114.107    5950.357
       _cons |  -8530.671    12416.8    -0.69   0.492    -32893.94     15832.6
------------------------------------------------------------------------------
1106 observations.
Now what do I do? I have been roaming through manuals and copies of the
Stata Journal (4-3 and 5-4), but every time I get near, the author(s) leave
out something important, like how exactly I use MISET, MISPLIT and
MIJOIN to get my imputed values back to my original dataset....
Any help would be greatly appreciated...
M.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
--
Rose Anne Medeiros
Department of Sociology / Family Research Laboratory
University of New Hampshire
126 Horton Social Science Center
20 College Road
Durham, NH 03824
U.S.A.