Re: st: Making sense of -mi passive-

From	[email protected] (Yulia Marchenko, StataCorp LP)
To	statalist <[email protected]>
Subject	Re: st: Making sense of -mi passive-
Date	Mon, 29 Nov 2010 12:22:47 -0600

Henrik Stovring <[email protected]> would like to obtain
imputation-specific averages which are constant within imputations and,
following [MI] documentation, uses the -mean()- function of -mi passive:
egen-.  Henrik observed that to his surprise the obtained averages vary within
imputations:

> In the manual on missing data in Stata 11, page 193, an example is given on
> how one can use -egen- with the -mean()- function. However, I find it very
> difficult to understand what Stata actually computes in this situation, and
> in any case it is not what I would have expected. This may be a feature, but
> perhaps others would be equally surprised, and perhaps this would warrant at
> least a fair warning in the manual on what is actually the result of running
> this command?
>
> ...
>
> . mi passive: egen meanlongpas=mean(iq)
>
> ...
>
> What is peculiar is that the mean computed by -egen- is not constant within
> the imputed datasets. It seems that for the records where IQ was actually
> observed, -egen- returns the mean computed in original dataset, whereas it
> returns the mean computed on all IQ values (observed AND imputed values) for
> those records where IQ was missing before imputation. Is this really
> meaningful? If so, I think the manual should not use this as an introductory
> example without any warning.


There's good reason for Henrik's confusion.  Our use of the -egen ...
= mean()- in the cited example is incorrect and will be changed in the
future.


1. Solution to Henrik's problem
-------------------------------

Before I explain the behavior of -mi passive-, let me provide the
solution that the manual should have presented.

The solution to Henrik's problem is to use -mi xeq: egen- in
place of -mi passive: egen-.  -mi xeq: egen- will produce the desired
imputation-specific averages, which are constant within each imputation:

       . mi xeq: egen meanlongpas=mean(iq)

Henrik does not need to -mi register passive- the new meanlongpas 
variable and, in fact, must not register it.
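
As a minimal sketch of the above, assuming the data are already -mi set- in
the flong style (so that the system variable _mi_m identifies the imputation)
and that iq has been imputed, one might create the variable and then check
that it is constant within each imputation:

```stata
* Assumption: data are -mi set flong- and iq is a registered imputed variable.
* Run -egen- separately in m=0, m=1, ..., m=M.
mi xeq: egen meanlongpas = mean(iq)

* Check: within each value of _mi_m, min and max should coincide.
tabstat meanlongpas, by(_mi_m) statistics(min max)
```

If the minimum and maximum agree within every imputation, the averages are
constant within imputations, as desired.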



2. Explanation of the behavior of -mi passive-
----------------------------------------------

Henrik's initial thought of using -mi passive: egen- to create the
averages makes sense because meanlongpas is a function of the
imputed variable iq and thus, by definition, is a passive variable.

To understand why this did not result in the desired averages constant
within imputations, we will need to distinguish between the intuitive
definition of a passive variable and -mi-'s definition of -passive-.

To do that, we need two other definitions that we use in the manual, 
the definitions for varying and super varying.

    Varying: a variable is said to be varying if its values in the 
             _incomplete_ observations differ across imputations.

    Super varying: a variable is said to be super varying if its
             values in the _complete observations_ differ across 
             imputations.  

See "Super-varying variables" in [MI] glossary, which can also be
accessed from the documentation of -mi register- in '[MI] mi set', for
more detail.

Anyway, within -mi-, imputed and passive variables are expected to be 
varying, not super varying.  That is, the values are not allowed to
vary across imputations in the complete observations.  Rather, the
complete observations are expected to have the same values as in the
original data across imputations.

This distinction between passive (varying) variables and super-varying 
variables allows -mi- to detect inconsistencies (i.e., mistakes) among
complete observations across imputations and fix such inconsistencies.

Henrik's meanlongpas variable, however, is a super-varying variable;
it has different values in both complete and incomplete observations
across imputations.  Because Henrik used -mi passive: egen ...-, -mi
passive- "fixed" the problem and replaced the values of complete
observations of meanlongpas with the corresponding values in the 
original data (m=0).

That's not what Henrik wanted.
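
To see this behavior directly, here is a small sketch under a setup like
Henrik's.  It assumes flong-style mi data in which iq has been imputed; the
variable name mean_pas is made up for illustration:

```stata
* Assumption: data are -mi set flong- and iq is a registered imputed variable.
* -mi passive- creates the variable and registers it as passive.
mi passive: egen mean_pas = mean(iq)

* In m>0, min and max are expected to differ: complete observations carry
* the m=0 value, incomplete observations the imputation-specific mean.
tabstat mean_pas, by(_mi_m) statistics(min max)
```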

To create super-varying variables, use -mi xeq: generate ...- or 
-mi xeq: egen ...-.  Then do *NOT* register newly created variables, 
and most especially, do not register them as -passive- or -mi- will 
"fix" the problem just as it did when Henrik used -mi passive-.

In the manual, you will also read that super-varying variables can
exist only in the -flong- or -flongsep- styles.  The manual should
have said, "in general, only in the -flong- or -flongsep- styles".
Because meanlongpas is actually constant within each imputation, this
super-varying variable can exist in any style.
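
To confirm that a variable created with -mi xeq:- was left unregistered and
is recognized as super varying, one option is -mi varying-.  The sketch below
again assumes flong-style mi data with iq imputed; mean_xeq is a hypothetical
variable name:

```stata
* Assumption: data are -mi set flong- and iq is a registered imputed variable.
mi xeq: egen mean_xeq = mean(iq)

* Do NOT -mi register- mean_xeq.  Report unregistered variables that
* are varying or super varying across imputations.
mi varying, unregistered

* List the registered variables; mean_xeq should not appear as passive.
mi describe
```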


3. Concerning -egen- and passive variables in general 
-----------------------------------------------------

Henrik used -mi passive: egen ...- and that led to a problem.  You might 
thus conclude that you should never use -mi passive- with -egen-.  That is 
not true, but it is nearly true.  You can use -mi passive- with -egen-'s 
-rowmean()- function, for instance.

You can use -mi passive- with any function that produces values 
that solely depend on values within the observation.  In general, 
you cannot use -mi passive- with functions that produce values 
that depend on groups of observations.
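
As an illustration of the distinction, here is a minimal sketch.  It assumes
flong-style mi data in which iq is imputed; the second variable score2 and
the new variable names are hypothetical:

```stata
* Safe with -mi passive-: each result depends only on values within
* the same observation.
mi passive: egen avgscore = rowmean(iq score2)
mi passive: generate iqsq = iq^2

* Depends on a group of observations (here, all of them): use -mi xeq-
* and leave the result unregistered.
mi xeq: egen iqmean = mean(iq)
```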


-- Yulia
[email protected]