

From: ymarchenko@stata.com (Yulia Marchenko, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Multiplying variables / Generating new variables after 'mi impute'
Date: Wed, 28 Apr 2010 15:09:25 -0500

Moses Lee <Moses.Lee@anu.edu.au> asks how to create a product of two imputed
variables after -mi impute-:

> I need to perform production function regression. This requires 2 stage.
>
> 1) Impute missing variables
> 2) Generate new variables - by multiplying two existing variables after
> imputing missing variables.
>
> I have tried running 'mi impute mvn' command. However, I realised the
> imputed values do not replace the missing values in the original variables.
> This becomes a problem if I need to estimate new variables which require a
> product of the original variables.
>
> An example: X and Y are variables that require imputing. After running the
> mi impute command, I need to 'gen (newvar)=X*Y. However, if the missing
> values in the existing X and Y are not replaced with the imputed values, I'm
> unable to generate a new variable.
>
> Can someone advise on how to replace missing values with imputed values? It
> seems impossible to generate new vars with the estimated imputed values.

Moses's post raises two issues. The first is the mechanical issue of creating
passive variables -- variables derived from the imputed variables. The second
is the statistical issue of how to handle passive variables during imputation.

1. Mechanical issues
---------------------

Moses mentioned the use of the -generate- command to create a product of two
variables, and that did not work for him. To create passive variables based on
the imputed variables, use the -mi passive- command. It will work.

As Maarten Buis mentioned in
http://www.stata.com/statalist/archive/2010-04/msg01603.html, -mi- provides
lots of "styles" in which multiply imputed data might be stored, and I don't
know which Moses is using. In some cases, Moses could use -generate-, although
he would need to follow that up with -mi register passive-. Regardless of all
that, -mi passive- can be used with all styles, and it always works the same
way.
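To make the mechanics concrete, here is a minimal sketch of the -mi passive- route. The toy data, the variable names (x, y, z, xy), the wide style, and the add(5) and rseed() values are all assumptions made for illustration; they are not taken from Moses's data. The sketch shows only how the product gets created, and says nothing about whether imputing first and multiplying afterwards is the right statistical choice.

```stata
* Toy data, invented for illustration only:
* x and y have missing values, z is complete
clear
set seed 12345
set obs 200
generate double x = rnormal()
generate double y = 0.5*x + rnormal()
generate double z = 1 + x + y + rnormal()
replace x = . if runiform() < 0.2
replace y = . if runiform() < 0.2

* Declare the mi style and the variables to be imputed
mi set wide
mi register imputed x y

* Impute x and y; add(5) and rseed() are arbitrary choices for this sketch
mi impute mvn x y = z, add(5) rseed(2010)

* Create the product as a passive variable; it is registered as passive
* and computed in the original data and in every imputation
mi passive: generate xy = x*y

* Look at the variables in the first imputed dataset
mi xeq 1: summarize x y xy
```

The original x and y still contain missing values in the original data (m=0); the imputed values live in the imputations, which is why a plain -generate- on the original variables does not see them.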
Let me note that it is important to use -mi- specific commands in place of the
standard Stata commands when there is an -mi- specific alternative. The list
of the -mi- specific commands can be found in -help mi-. If there is no -mi-
specific version, before using the standard Stata construct, look first at
-mi xeq:-. Anyway, there's no substitute for reading the manual. One thing you
will learn is to always use -mi passive- when working with passive variables.

2. Statistical issues
----------------------

Steve Samuels replied that instead of creating a product variable after
imputing the constituent variables, Moses should impute the product variable
directly (http://www.stata.com/statalist/archive/2010-04/msg01604.html).

More generally, Moses does need to ensure that the imputation model used
captures the structure of the analysis model of interest. If an interaction
between two variables is included in the analysis model, this interaction
should also be present or accounted for in the imputation model.

Two approaches for handling passive variables during imputation are considered
in the literature. I will refer to them as joint modeling (JM) and passive
imputation (PI).

Per JM, a passive variable is treated simply as another imputation variable
and standard imputation techniques are applied to it. For example, if Y and X
are being imputed using the multivariate normal model (MVN), then their
product Y*X is simply included as another variable in the model specification.
In Stata this would correspond to:

    . gen yx = y*x
    . mi set wide
    . mi register imputed y x yx
    . mi impute mvn y x yx ...

One drawback of JM is that it does not take into account the functional
relationship of yx with respect to other variables in the model. Also, the
assumption of joint normality in the presence of nonlinearities, such as the
product, is suspect. However, despite these drawbacks, this method is
currently being used in practice.
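The JM commands above can be run end to end on toy data to see the first drawback directly. This is only a sketch under stated assumptions: the data-generating process, the complete variable z used as a predictor and as the analysis outcome, and the add(20) and rseed() values are invented for illustration. Because yx is imputed as just another variable, its imputed values are not constrained to equal the product of the imputed y and x.

```stata
* Toy data, invented for illustration only
clear
set seed 12345
set obs 200
generate double x = rnormal()
generate double y = 0.5*x + rnormal()
generate double z = 1 + x + y + 0.3*x*y + rnormal()
replace x = . if runiform() < 0.2
replace y = . if runiform() < 0.2

* JM: build the product first, then impute it like any other variable
generate double yx = y*x
mi set wide
mi register imputed y x yx
mi impute mvn y x yx = z, add(20) rseed(2010)

* In imputation 1, count the observations where yx differs from y*x;
* these can only be observations in which y or x was imputed
mi xeq 1: count if abs(yx - y*x) > 1e-8

* Fit the analysis model, combining results across the imputations
mi estimate: regress z y x yx
```

A nonzero count from the -mi xeq- line is the functional relationship being ignored: in the imputed rows, yx is a draw from the multivariate normal model, not y times x.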
The PI method takes the functional relationship into account by including the
product term yx in the model as a product of imputed y and imputed x. The PI
approach is available within sequential imputation as implemented by the
user-written command -ice-; type -findit ice- to locate the command (in Stata
11, type -findit mi_ice- to locate the -mi--aware wrapper for -ice-). The
passive imputation would correspond to, I believe, the following syntaxes of
-ice- and -mi ice-:

    . ice y x yx ..., m(20) passive(yx:y*x) ...

    . mi ice y x yx ..., add(20) passive(yx:y*x) ...

Currently, there is no definite recommendation as to which method should be
used in practice, although Patrick Royston and his colleagues have been
investigating the performance of the two approaches and may have more insight
regarding these issues.

-- Yulia
ymarchenko@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
