st: mi ice with categorical vars and family data?

From   Laurel Lunn <>
Subject   st: mi ice with categorical vars and family data?
Date   Mon, 2 Aug 2010 09:00:39 -0500


I have an issue re: multiple imputation with ice, and was wondering if
anyone could assist.

In this dataset (n = 564), each case represents a family. The mother
reported on herself, as well as one child in each of four age groups
(adolescent, child, young child, infant). Obviously some women did not
have a child who fell into every group; thus, there are over 500
mothers/cases, but there are only about 130-210 children in each age
group. Hence, some of the data are missing in the conventional sense
(the mother didn't report it, etc.), and some are missing because no
child exists.
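To make the distinction concrete, here is a minimal Python sketch (not Stata) built on a few hypothetical rows. The names `has_adol` (adolescent indicator) and `adol_score` (one adolescent outcome) are illustrative stand-ins, not variables from the actual dataset, and the values are made up. The sketch counts missing values that are structural (no adolescent in the family) separately from values that are missing in the conventional sense.

```python
# Hypothetical family-level rows: one row per mother.
# has_adol flags whether an adolescent exists; adol_score is None when missing.
families = [
    {"id": 1, "has_adol": 1, "adol_score": 12.0},
    {"id": 2, "has_adol": 1, "adol_score": None},  # adolescent exists, value not reported
    {"id": 3, "has_adol": 0, "adol_score": None},  # no adolescent in this family
    {"id": 4, "has_adol": 1, "adol_score": 9.5},
]

def classify_missing(rows, indicator, var):
    """Count observed values of var, plus structural vs. ordinary missing values."""
    counts = {"observed": 0, "structural": 0, "ordinary": 0}
    for row in rows:
        if row[var] is not None:
            counts["observed"] += 1
        elif row[indicator] == 0:
            counts["structural"] += 1  # nothing to impute: the child does not exist
        else:
            counts["ordinary"] += 1    # genuinely missing: a candidate for imputation
    return counts

print(classify_missing(families, "has_adol", "adol_score"))
# {'observed': 2, 'structural': 1, 'ordinary': 1}
```

Only the "ordinary" cells are ones an imputation model should fill in.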

We are investigating the impact of an intervention on over 30 total
outcomes (between 5 and 10 for each age group), several covariates, and
one major predictor. Many of these variables are categorical or

Ideally, we would include all variables in the imputation model, impute
the entire dataset, and then perform the analyses.

Because of the structure of the data, I have been restricting the
imputation to cells in which we know a child exists (so as not to
impute values for imaginary children) by using the -conditional( )-
option and specifying, for example, that adolescent variables be
imputed only for cases in which the adolescent indicator == 1. However,
-ice- gives me numerous errors (some concerning lack of convergence or
failure to update particular variables in certain cycles) and also
drops many variables because of collinearity. If I run the -ice-
command without the -conditional( )- option, so that all cases are
imputed for all variables regardless of whether a child is actually
present, it runs without much of a problem. I’ve also tried specifying
a separate equation for each variable with the -eq( )- option; this
gives fewer errors but does not result in a viable dataset for
analyses.
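The restriction described above (impute adolescent variables only where the adolescent indicator == 1) can be illustrated with a minimal Python sketch. It is not what -ice- does internally: a simple random hot-deck draw from observed, eligible donors stands in for the chained regression models, and the names `has_adol`, `adol_score`, and `impute_conditional` are hypothetical. The point is only that rows with the indicator equal to 0 are never filled in, in any of the m imputations.

```python
import random

# Hypothetical rows: has_adol flags whether an adolescent exists.
families = [
    {"id": 1, "has_adol": 1, "adol_score": 12.0},
    {"id": 2, "has_adol": 1, "adol_score": None},  # adolescent exists, value missing
    {"id": 3, "has_adol": 0, "adol_score": None},  # no adolescent: must stay missing
    {"id": 4, "has_adol": 1, "adol_score": 9.5},
]

def impute_conditional(rows, indicator, var, m=5, seed=1):
    """Return m completed copies of rows, filling var only where indicator == 1.

    A random hot-deck draw from observed values among eligible rows stands in
    for a real imputation model; rows with indicator == 0 keep None.
    """
    rng = random.Random(seed)
    donors = [r[var] for r in rows if r[indicator] == 1 and r[var] is not None]
    if not donors:
        raise ValueError(f"no observed values of {var} among eligible rows")
    completed = []
    for _ in range(m):
        filled = []
        for r in rows:
            new = dict(r)  # copy so the original rows stay untouched
            if new[indicator] == 1 and new[var] is None:
                new[var] = rng.choice(donors)
            filled.append(new)
        completed.append(filled)
    return completed

imputations = impute_conditional(families, "has_adol", "adol_score", m=3)
for data in imputations:
    print([r["adol_score"] for r in data])  # third entry (family 3) is always None
```

A real hot-deck or predictive-mean-matching step would choose donors conditional on covariates; the unconditional draw here only keeps the example short.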

(1)   Is there something obvious that I am missing?

(2)   If not, could I go ahead and impute the entire dataset as though
a child existed in every family for every age group, but then, after
the imputation, drop the “fake” children from the dataset before
running analyses?

(3)   Alternatively, what are the implications if I split my larger
dataset into several smaller ones (i.e., one for each age group) and
impute each of these datasets separately? I would not need to merge
them later for analyses; they could be done within each dataset.
However, I would have to include the mother in each of the datasets,
since some of the mother-related variables would need to be used as
covariates in the children’s regressions. Hence, I’d have five
datasets (mother, adolescent, child, etc.), each with slightly
different imputed values for the mother’s variables.
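Options (2) and (3) can be sketched in Python (not Stata) as two data-handling steps. All variable names (`mom_dep`, `has_adol`, `adol_score`, `has_infant`, `inf_weight`) and values are hypothetical, and only two age groups are shown. `mask_nonexistent` resets child variables to missing after an unrestricted imputation (option 2); `split_by_age_group` builds a mother-only dataset plus one dataset per age group, copying the mother's variables into each and keeping only families in which that child exists (option 3; that restriction is an assumption of the sketch).

```python
# Before imputation: family 2 has no adolescent, family 1 has no infant.
raw = [
    {"id": 1, "mom_dep": 2.0, "has_adol": 1, "adol_score": 12.0, "has_infant": 0, "inf_weight": None},
    {"id": 2, "mom_dep": None, "has_adol": 0, "adol_score": None, "has_infant": 1, "inf_weight": 7.9},
    {"id": 3, "mom_dep": 3.5, "has_adol": 1, "adol_score": None, "has_infant": 1, "inf_weight": 8.4},
]
# One completed copy after imputing everything as if every child existed.
completed = [
    {"id": 1, "mom_dep": 2.0, "has_adol": 1, "adol_score": 12.0, "has_infant": 0, "inf_weight": 7.1},
    {"id": 2, "mom_dep": 1.4, "has_adol": 0, "adol_score": 10.6, "has_infant": 1, "inf_weight": 7.9},
    {"id": 3, "mom_dep": 3.5, "has_adol": 1, "adol_score": 9.8, "has_infant": 1, "inf_weight": 8.4},
]
MOTHER_VARS = ["mom_dep"]
GROUPS = {"adolescent": ("has_adol", ["adol_score"]), "infant": ("has_infant", ["inf_weight"])}

def mask_nonexistent(rows, groups):
    """Option (2): reset each group's child variables to None where that child does not exist."""
    masked = []
    for r in rows:
        new = dict(r)
        for indicator, child_vars in groups.values():
            if new[indicator] == 0:
                for v in child_vars:
                    new[v] = None
        masked.append(new)
    return masked

def split_by_age_group(rows, mother_vars, groups):
    """Option (3): a mother-only dataset plus one per age group, mother variables copied into each."""
    splits = {"mother": [{"id": r["id"], **{v: r[v] for v in mother_vars}} for r in rows]}
    for group, (indicator, child_vars) in groups.items():
        keep = ["id"] + list(mother_vars) + list(child_vars)
        splits[group] = [{v: r[v] for v in keep} for r in rows if r[indicator] == 1]
    return splits

analysis = mask_nonexistent(completed, GROUPS)
print([(r["adol_score"], r["inf_weight"]) for r in analysis])
# [(12.0, None), (None, 7.9), (9.8, 8.4)]

splits = split_by_age_group(raw, MOTHER_VARS, GROUPS)
print({name: [r["id"] for r in data] for name, data in splits.items()})
# {'mother': [1, 2, 3], 'adolescent': [1, 3], 'infant': [2, 3]}
```

In the split version, mother 2's missing `mom_dep` appears in both the mother and infant datasets, so imputing each dataset separately fills it twice, generally with different values. In the masking version, masking only changes what enters the analysis: while the chained equations were running, the values imputed for nonexistent children were still in the data and could have acted as predictors in other variables' imputation models.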

Thanks very much for your consideration.
