

From: Alan Acock <acock@mac.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Multiplying variables / Generating new variables after 'mi impute'
Date: Wed, 28 Apr 2010 15:53:06 -0700

One problem with passive imputation of xy is that you will be adding information that is not consistent with observed data.

Alan Acock

ymarchenko@stata.com wrote:
>Moses Lee <Moses.Lee@anu.edu.au> asks how to create a product of two imputed
>variables after -mi impute-:
>
>> I need to perform production function regression. This requires 2 stage.
>>
>> 1) Impute missing variables
>> 2) Generate new variables - by multiplying two existing variables after
>> imputing missing variables.
>>
>> I have tried running 'mi impute mvn' command. However, I realised the
>> imputed values do not replace the missing values in the original variables.
>> This becomes a problem if I need to estimate new variables which require a
>> product of the original variables.
>>
>> An example: X and Y are variables that require imputing. After running the
>> mi impute command, I need to 'gen (newvar)=X*Y. However, if the missing
>> values in the existing X and Y are not replaced with the imputed values, I'm
>> unable to generate a new variable.
>>
>> Can someone advise on how to replace missing values with imputed values? It
>> seems impossible to generate new vars with the estimated imputed values.
>
>Moses's post raises two issues. One is the mechanical issue of creating
>passive variables -- variables derived from the imputed variables. Second is
>the statistical issue of how to handle passive variables during imputation.
>
>
>1. Mechanical issues
>---------------------
>
>Moses mentioned the use of the -generate- command to create a product of two
>variables, and that did not work for him.
>
>To create passive variables based on the imputed variables, use the
>-mi passive- command. It will work.
>
>As Maarten Buis mentioned in
>http://www.stata.com/statalist/archive/2010-04/msg01603.html, -mi- provides
>lots of "styles" in which multiply imputed data might be stored, and I don't
>know which Moses is using. In some cases, Moses could use -generate-,
>although he would need to follow that up with -mi register passive-.
>Regardless of all that, -mi passive- can be used with all styles, and
>it always works the same way.
>
>Let me note that it is important to use -mi- specific commands in place of the
>standard Stata commands when there is an -mi- specific alternative. The list
>of the -mi- specific commands can be found in -help mi-. If there is no
>-mi- specific version, before using the standard Stata construct, look
>first at -mi xeq:-.
>
>Anyway, there's no substitute for reading the manual. One thing you will
>learn is to always use -mi passive- when working with passive variables.
>
>
>2. Statistical issues
>----------------------
>
>Steve Samuels replied that instead of creating a product variable after
>imputing the constituent variables, Moses should impute the product variable
>directly (http://www.stata.com/statalist/archive/2010-04/msg01604.html). More
>generally, Moses does need to ensure that the imputation model used captures
>the structure of the analysis model of interest. If an interaction between
>two variables is included in the analysis model, this interaction should also
>be present or accounted for in the imputation model.
>
>Two approaches for handling passive variables during imputation are considered
>in the literature. I will refer to them as joint modeling (JM) and passive
>imputation (PI).
>
>Per JM, a passive variable is treated simply as another imputation variable
>and standard imputation techniques are applied to it. For example, if Y and X
>are being imputed using the multivariate normal model (MVN), then their
>product Y*X is simply included as another variable in the model specification.
>In Stata this would correspond to:
>
>     . gen yx = y*x
>     . mi set wide
>     . mi register imputed y x yx
>     . mi impute mvn y x yx ...
>
>One drawback of JM is that it does not take into account the functional
>relationship of yx with respect to other variables in the model. Also, the
>assumption of joint normality in the presence of nonlinearities, such as the
>product, is suspect. However, despite these drawbacks, this method is
>currently being used in practice.
>
>The PI method takes the functional relationship into account by including the
>product term yx in the model as a product of imputed y and imputed x. The PI
>approach is available within the sequential imputation as implemented by the
>user-written command -ice-; type -findit ice- to locate the command (in Stata
>11, type -findit mi_ice- to locate the -mi--aware wrapper for -ice-). The
>passive imputation would correspond to, I believe, the following syntaxes of
>-ice- and -mi ice-:
>
>     . ice y x yx ..., m(20) passive(yx:y*x) ...
>     . mi ice y x yx ..., add(20) passive(yx:y*x) ...
>
>Currently, there is no definite recommendation to which method should be used
>in practice, although, Patrick Royston and his colleagues have been
>investigating the performance of the two approaches and may have more insight
>regarding these issues.
>
>
>-- Yulia
>ymarchenko@stata.com
>*
>* For searches and help try:
>* http://www.stata.com/help.cgi?search
>* http://www.stata.com/support/statalist/faq
>* http://www.ats.ucla.edu/stat/stata/
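
The mechanical steps described in the quoted message (impute first, then derive the product with -mi passive-) can be sketched as a short do-file. This is a minimal illustration on simulated data, not code from the thread: the variables z and w, the 20% missingness rate, the seed, and add(20) are all assumptions made for the example.

```stata
* Simulated data (hypothetical): y and x have missing values, z and w are complete
clear
set seed 12345
set obs 200
generate z = rnormal()
generate x = z + rnormal()
generate y = 0.5*z + rnormal()
generate w = 1 + y + x + 0.5*y*x + rnormal()
replace x = . if runiform() < 0.2
replace y = . if runiform() < 0.2

* Declare the mi style and register the variables
mi set wide
mi register imputed y x
mi register regular z w

* Impute y and x under a multivariate normal model (20 imputations)
mi impute mvn y x = z w, add(20) rseed(12345)

* Create the product as a passive variable; this works in every mi style
mi passive: generate yx = y*x

* Fit the analysis model and combine results across imputations
mi estimate: regress w y x yx
```

Note that the imputation model here contains no product term, so the statistical concerns raised in section 2 of the quoted message, and in the reply at the top, still apply. The sketch shows only the mechanics.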
