

From: Laurel Lunn <laurel.lunn@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: mi ice with categorical vars and family data?
Date: Mon, 2 Aug 2010 09:00:39 -0500

Hello,

I have an issue with multiple imputation using ice and was wondering if anyone could assist.

In this dataset (n = 564), each case represents a family. The mother reported on herself, as well as on one child in each of four age groups (adolescent, child, young child, infant). Obviously, some women did not have a child in every group; thus, there are over 500 mothers/cases, but only about 130-210 children in each age group. Hence, some of the data are missing in the conventional sense (the mother didn't report it, etc.), and some are missing because no child exists.

We are investigating the impact of an intervention on over 30 total outcomes (between 5 and 10 for each age group), with several covariates and one major predictor. Many of these variables are categorical or dichotomous. Ideally, we would include all variables in the imputation, impute the entire dataset, and then perform the analyses.

Because of the structure of the data, I have been restricting the imputation to cells for which we know a child exists (so as not to impute values for imaginary children) by using the -conditional()- option and specifying, for example, that adolescent variables be imputed only for cases in which the adolescent indicator == 1. However, ice gives me numerous errors (some having to do with lack of convergence or failure to update particular variables in certain cycles) and additionally drops many variables because of collinearity.

If I run the ice command without the -conditional()- option, so that all variables are imputed for all cases regardless of the actual presence of a child, it runs without much of a problem. I've also tried running the imputation by specifying a separate equation for each variable using the -eq()- option; this gives fewer errors but does not result in a viable dataset for analyses.

(1) Is there something obvious that I am missing?

(2) If not, could I go ahead and impute the entire dataset as though a child existed in every family for every age group, but then, after the imputation, drop the "fake" children from the dataset before running analyses?

(3) Alternatively, what are the implications if I split my larger dataset into several smaller ones (i.e., one for each age group) and impute each of these datasets separately? I would not need to merge them later for analyses; the analyses could be done within each dataset. However, I would have to include the mother in each of the datasets, since some of the mother-related variables would need to be used as covariates in the children's regressions. Hence, I'd have five datasets (mother, adolescent, child, etc.), each with slightly different values for the mother's variables.

Thanks very much for your consideration.
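To make option (2) concrete, here is a minimal sketch of the "drop the fake children after imputation" step. It assumes the imputed data are already in memory, and every variable name in it (has_adol, adol_y1, adol_y2) is hypothetical and stands in for the real adolescent indicator and adolescent outcomes. It only illustrates the mechanics and does not say whether the approach is statistically sound.

```stata
* Sketch only: has_adol, adol_y1 and adol_y2 are hypothetical names.
* has_adol is assumed to be 1 if the family has an adolescent, 0 otherwise.
* After imputing every cell, reset the adolescent variables to missing
* in families with no adolescent, so those rows drop out of the
* adolescent analyses.
foreach v of varlist adol_y1 adol_y2 {
    replace `v' = . if has_adol == 0
}
```

The same loop would be repeated for each age group with its own indicator and variable list.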
