
st: Restricted Ranges in Stata's Multiple Imputation Procedures

From   "Hoogendoorn, Adriaan" <>
To   "" <>
Subject   st: Restricted Ranges in Stata's Multiple Imputation Procedures
Date   Mon, 19 Oct 2009 17:24:25 +0200

Dear Statalist,
I use Stata’s multiple imputation procedures (I have tried both ice and mi impute) to deal with missing data and I am very satisfied with their options and their documentation. Yet, I have an issue that I'd like to bring up here. Let me first introduce my problem.
The data that I use contain 28 cases and (in this part of the analysis) 5 variables. Four of these five variables were measured at the baseline interview and one was measured in a follow-up study. The follow-up variable has 50% missing values (14 cases) because those cases dropped out of the study. The baseline variables have only incidental missing values (1 or 2 cases). All variables appear to be approximately normally distributed.
I created five imputed data sets and noticed that in two of them the imputed values for the follow-up variable were far outside the range of the observed cases (between 7 and 35), and in one imputed data set even well outside the range of theoretically possible values (between 0 and 35).
I would like to restrict the range of imputed values and tried the (predictive mean) matching option. This technique resulted in an imputed data set in which the minimum observed value (7) was matched to nine of the fourteen missing values.
I would like to try a different solution. According to Allison (2001) “Some software can handle the restricted range problem in another way. If you specify a maximum or a minimum value for a particular variable, it will reject all random draws outside that range and simply take additional draws until it gets one within the specified range.” (page 39). Am I correct that this option is not (yet) implemented in the ice package or the mi impute option? 
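To make clear what I mean by rejecting draws, here is a minimal sketch in Python (not Stata, and not how ice or mi impute work internally). It assumes the imputation model yields a normal predictive distribution; the function name draw_within_range and the mean and SD used below are hypothetical values chosen only for illustration.

```python
import random

def draw_within_range(mean, sd, lower, upper, rng, max_tries=10_000):
    """Draw from Normal(mean, sd), redrawing until the value lies in [lower, upper]."""
    for _ in range(max_tries):
        value = rng.gauss(mean, sd)
        if lower <= value <= upper:
            return value
    raise RuntimeError("no draw fell inside the range; check mean, sd and bounds")

rng = random.Random(2009)  # fixed seed so the sketch is reproducible

# Hypothetical predictive mean and residual SD for the 14 missing follow-up scores,
# restricted to the theoretical range 0-35.
imputed = [draw_within_range(12.0, 9.0, 0.0, 35.0, rng) for _ in range(14)]
```

The cap on the number of tries is only a safeguard for the case where the predictive distribution has almost no mass inside the range.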
The other solution Allison suggests is “to transform the variables” and “after the data have been imputed, the reverse transformation can be applied to bring the variable back to its original metric”. The transformation that seems appropriate here is a logit transformation after rescaling the original observed range of 7-35 or the theoretical range of 0-35 to the 0-1 range. But now I am faced with another problem: observations at the minimum or maximum of the range are mapped to 0 or 1. Because the logit is undefined at 0 and 1, I will lose these cases. A solution that comes to mind is not to map the original range to the closed interval [0,1] but to the open interval (0,1). Do you know of a “proper” way to achieve this?
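One way I can imagine doing this, sketched in Python rather than Stata, is to widen the range by a small margin on both sides before rescaling, so that the minimum and maximum land strictly inside (0,1). The margin of 0.5 and the function names to_logit and from_logit are my own assumptions, not something taken from Allison or from the Stata documentation.

```python
import math

def to_logit(y, lower, upper, margin=0.5):
    """Rescale y from [lower, upper] into the open interval (0, 1), then take the logit."""
    p = (y - lower + margin) / (upper - lower + 2 * margin)
    return math.log(p / (1 - p))

def from_logit(z, lower, upper, margin=0.5):
    """Reverse transformation: inverse logit, then back to the original metric."""
    p = 1 / (1 + math.exp(-z))
    return p * (upper - lower + 2 * margin) - margin + lower

# The boundary values of the theoretical range 0-35 now have finite logits.
z_min = to_logit(0, 0, 35)
z_max = to_logit(35, 0, 35)
```

With this mapping, back-transformed imputed values lie between lower - margin and upper + margin, so they can still exceed the original range by up to the margin, and the choice of margin is arbitrary.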
Thanks for any help,
Adriaan Hoogendoorn
GGZ inGeest, Amsterdam
Allison, Paul D. (2001) Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136, Thousand Oaks, CA: Sage.
