Statalist The Stata Listserver


Re: st: Trying to do some multiple imputation

From   Nicola Orsini
Subject   Re: st: Trying to do some multiple imputation
Date   Tue, 24 Jan 2006 18:29:55 +0100


1) Define a working directory (see -help cd-):

cd "mypath"

2) Use the command -use- instead of -sysuse- to open your dataset. -sysuse- is meant for the example datasets that come with Stata (see -help sysuse-):

use das1995r, clear

The rest of the lines should be fine, as long as each -forvalues- loop ends with a closing brace }. I hope you are using a do-file to run the analysis.
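Putting these pieces together, the whole do-file for your data might look like the sketch below. It has not been run on your data: it assumes the user-written -uvis-, -miset- and -mifit- commands are installed, that das1995r.dta is in the folder you -cd- into ("mypath" is only a placeholder), and the final -regress- model is a hypothetical illustration of an analysis model. Compared with the lines you posted, it adds -preserve-/-restore- inside the first loop, so that each of the five imputations starts again from the missing income values rather than from the previous imputation, and it checks for remaining missing values with -count- instead of -tab-, which can fail on a continuous variable with many distinct values.

```stata
* Sketch only: assumes -uvis-, -miset- and -mifit- are installed
* "mypath" is a placeholder for the folder that holds das1995r.dta
cd "mypath"
use das1995r, clear

* impute income 5 times, saving das1.dta, ..., das5.dta in mypath
forvalues i = 1/5 {
    preserve
    uvis regress income black male age2 educate, gen(income`i') seed(123695`i')
    replace income = income`i'
    save das`i', replace
    * return to the original data, so the next pass imputes the missing income again
    restore
}

* count the income values still missing in each imputed dataset
forvalues i = 1/5 {
    use das`i', clear
    count if missing(income)
}

* declare das1.dta, ..., das5.dta as the imputed datasets
miset using das

* hypothetical analysis model: replace it with the model you actually want
mifit, indiv: regress income black male age2 educate
```

If -count- reports missing income values after imputation, check whether some observations also have missing values in black, male, age2 or educate.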


Mosi A. Ifatunji wrote:

Thanks Nicola,

I have been trying the code that you sent and I am having some trouble with
it. Admittedly, I am not familiar with some of the commands that you're
using, and the -help- command doesn't really provide much more clarity.
So, let me tell you what I am doing (verbatim) and you can tell me if there
is an error in my use of the syntax you have so generously provided.

First, the dataset I am using is called das1995r, and it is located in the
main Stata folder. The variable I would like to generate values for is
'income'. The variables I would like to generate 'income' from are 'black',
'male', 'age2', and 'educate'.

The commands that I am not familiar with have an * after them. I did not
actually put the * in my syntax, but I thought it might help you to see how
much of a novice I am :-). After looking at your sample, I tried the following:


sysuse* das1995r

forv* i = 1(1)5 {
uvis* regress income black male age2 educate, gen(income`i') seed(123695`i')
replace income = income`i'
save das`i', replace
}

[[Here I am told: "already preserved r(621);"]]

forv* i = 1(1)5 {
use das`i', clear
tab income, miss
}

[[Here is where I get the error message that stops the progress. I get a
message that says: "file das1.dta not found." I proceed nonetheless.]]

miset* using das

[[This is where I realize that I really can go no further. I get the
error message: "file das1.dta not found."]]

I am assuming that I am making an error somewhere, but I just don't know
where. As you can see, I skipped the part where you created missing values,
because my values are already missing in the 'income' variable. Other than
that, the syntax is the same as you provided it, I think.

Thank you for your time and energy,



Hi Rose,

this is an example of multiple imputation.


* findfile cancer.dta

sysuse cancer

set seed 1234

* create some values missing at random for the outcome variable died (0/1)

gen u = uniform()
replace died = . if u > 0.6
codebook died

* impute the outcome variable died 5 times using uvis,
* each time generating a new variable (died1, died2, ..., died5)

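forv i = 1(1)5 {

* keep a copy of the original data, in which died still has missing values
preserve

* specify a model to predict missing values
uvis logistic died drug age, gen(died`i')  seed(123695`i')
replace died = died`i'

* save the imputed data as a new dataset (canc1, canc2, ..., canc5)
save canc`i', replace

* restore the original data: otherwise died has no missing values after the
* first pass, and all five imputed datasets would be identical
restore
}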

* have a look at the imputed variables saved in the new datasets

forv i = 1(1)5 {
use canc`i', clear
tab died, miss
}

* declare the imputed datasets (canc1, ..., canc5) before combining results
miset using canc

* specify the estimation command to be executed on each imputed dataset
* and get the overall estimates
mifit, indiv: logistic died drug age
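For readers on Stata 11 or later, the official -mi- commands can do the same job without saving separate files. The sketch below is an alternative to the -uvis-/-miset-/-mifit- route used above, not what this example runs: it repeats the cancer illustration with -mi impute logit- and -mi estimate-, using -runiform()- in place of the older -uniform()-, and the seed values are arbitrary.

```stata
* Alternative sketch using Stata's official mi suite (Stata 11 or later)
sysuse cancer, clear
set seed 1234

* create values missing at random for the outcome died (0/1), as above
replace died = . if runiform() > 0.6

* declare the data as mi data and register died as the variable to impute
mi set mlong
mi register imputed died

* add 5 imputations of died from a logistic model on drug and age
mi impute logit died drug age, add(5) rseed(123695)

* fit the model on each imputed dataset and combine the results (Rubin's rules)
mi estimate: logistic died drug age
```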


*   For searches and help try:

© Copyright 1996–2021 StataCorp LLC