
Re: st: Help with ice for imputing a categorical variable

From   Maarten Buis
Subject   Re: st: Help with ice for imputing a categorical variable
Date   Mon, 29 Jun 2009 12:25:11 +0000 (GMT)

--- On Sun, 28/6/09, Dana March wrote:
> About my data:
> 300 or so observations from a case-control study (n=544)
> The variable of interest is a covariate in my
> analysis.  It has 4 categories, coded 0-4, with 0 as the
> reference category.  There are  74 missing values.
> I would like to impute categorical values for this
> variable.
> I have complete data for other continuous covariates I can
> use for the imputation.
> I have consulted ice help and Royston's 2005 paper in the
> Stata Journal (vol 2, pp.1-14), scoured the net, and remain
> unclear about a  number of issues.
> For example:
> It is unclear to me whether I should use the ice command or
> the uvis command.

You should use -ice-.

> It is also unclear to me whether I should specify the
> variable as a categorical variable using xi: and i. 
> (alternatively, as m., o.)

If you have only one variable with missing values, then
you can just add the categorical variable as is, without
doing anything to it (though having only one variable
with missing values is a strong indication that somebody
else seriously messed up your data, because all variables
usually have some missing values).
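
A minimal sketch of that single-variable case, assuming the
user-written -ice- is installed. It uses auto.dta, where rep78
is the only variable with missing values; the seed and the
choice of predictors are arbitrary and only for illustration.

*------------ begin sketch ------------------------
sysuse auto, clear
set seed 12345            // arbitrary, for reproducibility
recode rep78 1/2=3        // collapse the sparse categories
// rep78 enters as is: no dummies, -passive()- or -substitute()-
ice rep78 foreign price weight, m(5) clear
*------------ end sketch --------------------------

Because rep78 is never used to predict another incomplete
variable here, there is nothing for indicator variables to do.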

If you have multiple variables with missing values, then
you should use the -passive()- and -substitute()-
options as discussed in section 4.2 of the article you
referenced and shown in the example below.

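*------------ begin example ----------------------
sysuse auto, clear
// create a second variable with missing values (about 10%)
replace foreign = . if runiform() < .1
// collapse the sparse categories: rep78 is now 3, 4, or 5
recode rep78 1/2=3
// indicator variables, left missing where rep78 is missing
gen byte rep4 = rep78 == 4 if rep78 < .
gen byte rep5 = rep78 == 5 if rep78 < .
// impute rep78 itself, update the indicators passively, and use
// them in place of rep78 when predicting the other variables
ice  rep78 rep4 rep5 foreign price weight, m(5) ///
    passive(rep4:rep78==4\rep5:rep78==5)        ///
    substitute(rep78: rep4 rep5) clear
*------------ end example --------------------------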

Note that your imputation model should be at least as
flexible as your model of interest, so ALL variables
that are in your model of substantive interest must be
in your -ice- command (including the dependent variable).
So do not be tempted to limit your imputation model so
that it only contains the one variable with missing values.

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

