
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Multiplying variables / Generating new variables after 'mi impute'

From   Alan Acock <>
Subject   Re: st: Multiplying variables / Generating new variables after 'mi impute'
Date   Wed, 28 Apr 2010 15:53:06 -0700

One problem with passive imputation of xy is that you will be adding information that is not consistent with the observed data.
Alan Acock wrote:

>Moses Lee <> asks how to create a product of two imputed
>variables after -mi impute-:
>> I need to perform a production function regression. This requires 2 stages:
>> 1) Impute the missing values.
>> 2) Generate new variables by multiplying two existing variables after
>> imputing the missing values.
>> I have tried running the 'mi impute mvn' command. However, I realised the
>> imputed values do not replace the missing values in the original variables.
>> This becomes a problem if I need to estimate new variables which require a
>> product of the original variables.
>> An example: X and Y are variables that require imputing. After running the
>> mi impute command, I need to 'gen newvar = X*Y'. However, if the missing
>> values in the existing X and Y are not replaced with the imputed values, I'm
>> unable to generate the new variable.
>> Can someone advise on how to replace missing values with imputed values?  It
>> seems impossible to generate new vars with the estimated imputed values.
>Moses's post raises two issues.  The first is the mechanical issue of creating
>passive variables -- variables derived from the imputed variables.  The second is
>the statistical issue of how to handle passive variables during imputation.
>1.  Mechanical issues
>Moses mentioned the use of the -generate- command to create a product of two
>variables, and that did not work for him.
>To create passive variables based on the imputed variables, use the 
>-mi passive- command.  It will work.
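A minimal sketch of the -mi passive- route, assuming Stata 11 or later. Moses's data are not available, so the shipped auto dataset stands in: rep78, which has missing values, plays the role of an imputed variable, and mpg plays the role of a complete one. The name rep_mpg and the settings add(5) and rseed(12345) are illustrative choices, not part of the original advice.

```stata
* Stand-in data: rep78 has missing values in auto.dta
sysuse auto, clear
mi set wide
mi register imputed rep78
* Create 5 imputations of rep78 (mvn treats rep78 as continuous here)
mi impute mvn rep78 = mpg price, add(5) rseed(12345)
* Build the product in m=0 and in every imputation; in wide style this
* creates rep_mpg plus one copy per imputation (_1_rep_mpg, _2_rep_mpg, ...)
mi passive: generate rep_mpg = rep78*mpg
* Inspect the first completed dataset
mi xeq 1: summarize rep78 mpg rep_mpg
```

Later analyses would then go through -mi estimate:-, for example -mi estimate: regress price rep78 mpg rep_mpg-, so that each imputation's own product is used.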
>As Maarten Buis mentioned in 
>, -mi- provides
>several "styles" in which multiply imputed data might be stored, and I don't 
>know which one Moses is using.  In some styles, Moses could use -generate-, 
>although he would need to follow that up with -mi register passive-.
>Regardless of all that, -mi passive- can be used with all styles, and 
>it always works the same way.
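A sketch of the -generate- plus -mi register passive- route, assuming Stata 11 or later and the flong style, in which each imputation is stored as its own block of observations, so an ordinary -generate- evaluates the expression within every imputation. The auto dataset, rep78, and the product name rep_mpg are stand-ins chosen for illustration.

```stata
sysuse auto, clear
mi set flong
mi register imputed rep78
mi impute mvn rep78 = mpg price, add(5) rseed(12345)
* In flong style the observations for m=0,1,...,5 are stacked, so this
* evaluates rep78*mpg using the imputed rep78 within each imputation
generate rep_mpg = rep78*mpg
* Required follow-up: tell mi the new variable is derived from imputed data
mi register passive rep_mpg
```

In wide style, by contrast, rep78 holds only the original (m=0) values and the imputations live in _1_rep78, _2_rep78, and so on, so the same -generate- would leave rep_mpg missing wherever rep78 was missing. That is consistent with what Moses describes; -mi passive: generate- avoids the issue in every style.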
>Let me note that it is important to use -mi--specific commands in place of the
>standard Stata commands whenever an -mi--specific alternative exists.  The list
>of -mi--specific commands can be found in -help mi-.  If there is no 
>-mi--specific version, look first at -mi xeq:- before falling back on the
>standard Stata command.
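A short sketch of -mi xeq:- for a command with no -mi--specific version, here -summarize-, assuming Stata 11 or later. The auto dataset and the product rep_mpg are stand-ins used purely for illustration. Without a numlist, -mi xeq:- runs the command on m=0 and then on each imputation in turn; with a numlist it runs only on the listed m.

```stata
sysuse auto, clear
mi set wide
mi register imputed rep78
mi impute mvn rep78 = mpg price, add(5) rseed(12345)
mi passive: generate rep_mpg = rep78*mpg
* summarize has no mi-specific version, so run it on each dataset
mi xeq: summarize rep78 rep_mpg
* Only the original data (m=0) and the first two imputations
mi xeq 0 1 2: summarize rep78 rep_mpg
```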
>Anyway, there's no substitute for reading the manual.  One thing you will 
>learn is to always use -mi passive- when working with passive variables.
>2.  Statistical issues
>Steve Samuels replied that instead of creating a product variable after
>imputing the constituent variables, Moses should impute the product variable
>directly.  More
>generally, Moses does need to ensure that the imputation model used captures
>the structure of the analysis model of interest.  If an interaction between
>two variables is included in the analysis model, this interaction should also
>be present or accounted for in the imputation model.
>Two approaches for handling passive variables during imputation are considered
>in the literature.  I will refer to them as joint modeling (JM) and passive
>imputation (PI).  
>Under JM, a passive variable is treated simply as another imputation variable
>and standard imputation techniques are applied to it.  For example, if Y and X
>are being imputed using the multivariate normal model (MVN), then their
>product Y*X is simply included as another variable in the model specification.
>In Stata this would correspond to:
>      . gen yx = y*x
>      . mi set wide
>      . mi register imputed y x yx
>      . mi impute mvn y x yx ...
>One drawback of JM is that it does not take into account the functional
>relationship between yx and the variables it is built from: an imputed yx
>will generally not equal the product of the imputed y and imputed x.  Also, the
>assumption of joint normality in the presence of nonlinearities, such as the
>product, is suspect.  Despite these drawbacks, this method is currently
>used in practice.
>The PI method takes the functional relationship into account by including the
>product term yx in the model as a product of imputed y and imputed x.  The PI
>approach is available within sequential imputation (imputation by chained equations) as implemented by the
>user-written command -ice-; type -findit ice- to locate the command (in Stata
>11, type -findit mi_ice- to locate the -mi--aware wrapper for -ice-).  The
>passive imputation would correspond to, I believe, the following syntaxes of
>-ice- and -mi ice-:
>      . ice y x yx ..., m(20) passive(yx:y*x) ...
>      . mi ice y x yx ..., add(20) passive(yx:y*x) ...
>Currently, there is no definitive recommendation as to which method should be used
>in practice, although Patrick Royston and his colleagues have been
>investigating the performance of the two approaches and may have more insight
>regarding these issues.
>-- Yulia
>*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC