Hi Rose,
this is an example of multiple imputation.
clear
* findfile cancer.dta
sysuse cancer
set seed 1234
* set roughly 40% of the outcome variable died (0/1) to missing, completely at random
gen u = uniform()
replace died = . if u > 0.6
codebook died
* impute 5 times the outcome variable died using uvis
* every time generating a new variable (died1, died2, ..., died5)
forv i = 1(1)5 {
    preserve
    * specify a model to predict missing values
    uvis logistic died drug age, gen(died`i') seed(123695`i')
    replace died = died`i'
    * save each imputed dataset under a new name (canc1, canc2, ..., canc5)
    save canc`i', replace
    restore
}
* have a look at the imputed variables saved in the new datasets
forv i = 1(1)5 {
    use canc`i', clear
    tab died , miss
}
* declare the imputed datasets before combining results
miset using canc
* specify the estimation command to be executed on each imputed dataset
* and get the overall estimates
mifit, indiv: logistic died drug age
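In case it helps to see what the combining step amounts to, here is a minimal sketch of Rubin's rules done by hand for a single coefficient. It is only an illustration under assumptions, not what -mifit- runs internally: it assumes canc1.dta to canc5.dta exist as saved above, it uses -logit- so the estimate is on the log-odds scale, and the scalar names (qsum, usum, bvar, tvar) are made up for this example. The degrees-of-freedom adjustment is left out.

```stata
* sketch: pool the coefficient on age across canc1-canc5 by Rubin's rules
local m = 5
scalar qsum = 0
scalar usum = 0
forv i = 1/`m' {
    use canc`i', clear
    quietly logit died drug age
    * keep each point estimate; accumulate estimates and squared standard errors
    scalar q`i' = _b[age]
    scalar qsum = qsum + _b[age]
    scalar usum = usum + _se[age]^2
}
* pooled estimate and average within-imputation variance
scalar qbar = qsum/`m'
scalar ubar = usum/`m'
* between-imputation variance of the point estimates
scalar bvar = 0
forv i = 1/`m' {
    scalar bvar = bvar + (q`i' - qbar)^2
}
scalar bvar = bvar/(`m' - 1)
* total variance = within + (1 + 1/m) * between
scalar tvar = ubar + (1 + 1/`m')*bvar
display "pooled coefficient on age: " qbar
display "pooled standard error:     " sqrt(tvar)
```

The between-imputation term is what a single imputed dataset cannot supply, which is why the standard errors from one imputation tend to be too small.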
Best,
Nicola
Mosi A. Ifatunji wrote:
Thanks Rose,
The thing is that I've done some reading on single and multiple imputation,
and the literature seems to suggest that multiple imputation is much
better because your imputed values are informed by both within-dataset and
between-dataset variance. Therefore, although I am only trying to impute one
variable (income), the -uvis- command does not seem appropriate, given that it
generates the new values without the benefit of between-dataset variance,
because it does not generate new datasets. (Question: What is the real
difference between -impute- and -uvis-?).
So, I think I am trying to impute missing values for one variable, but I
would like to generate the multiple datasets from which to generate the new
values. If my variables were y x1 x2 x3 (with y being the variable with
missing values that I am trying to generate new values for), could you send
me an example of how I might do such a thing, from generating the multiple
datasets to getting the missing values imputed in the original dataset?
Any help would be wonderful,
M.
On 1/24/06 8:33 AM, "Rose Medeiros" wrote:
Mosi,
If your goal is just to impute values of your income variable, you might
use -uvis- which will impute values of the yvar and leave them in your
initial dataset. If this is problematic because of a large number of
missing values in the variables you are attempting to impute income
from, you could use -mvis- and generate only one imputation by
specifying m(1) and run your analyses on this dataset (which would also
have imputed values for the other variables). Note that both of these
procedures are single imputation, rather than multiple imputation. If
you actually want to do multiple imputation, you would want to use
-micombine- to specify the actual models you want to test, not the
variables you are trying to impute.
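As a rough sketch of the single-imputation route with -uvis-, assuming hypothetical variable names income, x1, x2 and x3, and borrowing the gen() and seed() options as they appear elsewhere in this thread (the seed value is arbitrary):

```stata
* hypothetical variables: income (with missing values), predictors x1 x2 x3
* flag the observations that will be imputed
gen byte income_miss = missing(income)
* impute once; the completed variable is stored in income_imp
uvis regress income x1 x2 x3, gen(income_imp) seed(101)
* compare imputed and observed values
tab income_miss
summarize income_imp if income_miss == 1
summarize income_imp if income_miss == 0
```

This leaves the original income variable untouched, so the imputed values can always be told apart from the observed ones.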
Best,
Rose
Mosi A. Ifatunji wrote:
Good people,
Here is my quandary. I am having a heck of a time trying to complete
procedures for multiple imputation using Stata 8.2.
My goal is to impute missing values for my income variable (v1019). I would
like to generate 5 new and complete datasets from which to derive my new
values (to be placed back into the old dataset). Here is the syntax I have
been using to no avail:
First, I use the MVIS command to generate five new datasets with values for
any missing values in the key variables:
mvis v1019 black male age2 educate using imp, m(5) genmiss(m_) cmd(regress) cy(20) se(101) replace
And I get...
imputing 1..2..3..4..5..file imp.dta saved
Then, I open the new dataset (with all missing values imputed):
use imp, clear
Then I generate a model (from the 5 new datasets in imp.dta) that predicts
my income variable (v1019):
micombine regress v1019 black male age2 educate
Multiple imputation parameter estimates (5 imputations)
------------------------------------------------------------------------------
       v1019 |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -16060.96   2412.319    -6.66   0.000    -20794.22    -11327.7
        male |   10331.52   2487.338     4.15   0.000     5451.057    15211.97
        age2 |  -114.4879   90.15501    -1.27   0.204    -291.3829    62.40716
     educate |   4532.232    722.751     6.27   0.000     3114.107    5950.357
       _cons |  -8530.671    12416.8    -0.69   0.492    -32893.94     15832.6
------------------------------------------------------------------------------
1106 observations.
Now what do I do? I have been roaming through manuals and copies of the
Stata Journal (4-3 and 5-4), but every time I get near, the author(s) leave
out something important, like how exactly I use MISET, MISPLIT, and
MIJOIN to get my imputed values back to my original dataset....
Any help would be greatly appreciated...
M.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/