
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Making sense of -mi passive-


From   Henrik Stovring <[email protected]>
To   statalist <[email protected]>
Subject   st: Making sense of -mi passive-
Date   Fri, 26 Nov 2010 14:25:21 +0100

Dear all,

The manual on missing data in Stata 11 gives, on page 193, an example of
how to use -egen- with the -mean()- function under -mi passive-. However,
I find it very difficult to understand what Stata actually computes in
this situation, and in any case it is not what I would have expected.
This may be a feature, but others may be equally surprised, so perhaps
the manual should at least carry a fair warning about what this command
actually returns?

In short, the problem is the following. Consider this dataset, in which
IQ has been imputed twice (m=1 and m=2) for the three subjects where it
was missing in the original data (m=0):

. list id iq _mi*

     +-------------------------------------------+
     | id         iq   _mi_m   _mi_id   _mi_miss |
     |-------------------------------------------|
  1. |  1          .       0        8          1 |
  2. |  2          .       0        9          1 |
  3. |  3          .       0       10          1 |
  4. |  4        108       0        1          0 |
  5. |  5        117       0        2          0 |
     |-------------------------------------------|
  6. |  6         87       0        3          0 |
  7. |  7         88       0        4          0 |
  8. |  8         78       0        5          0 |
  9. |  9         96       0        6          0 |
 10. | 10        128       0        7          0 |
     |-------------------------------------------|
 11. |  1   123.1627       1        8          . |
 12. |  2   100.7916       1        9          . |
 13. |  3   103.7483       1       10          . |
 14. |  4        108       1        1          . |
 15. |  5        117       1        2          . |
     |-------------------------------------------|
 16. |  6         87       1        3          . |
 17. |  7         88       1        4          . |
 18. |  8         78       1        5          . |
 19. |  9         96       1        6          . |
 20. | 10        128       1        7          . |
     |-------------------------------------------|
 21. |  1   106.0351       2        8          . |
 22. |  2   91.59879       2        9          . |
 23. |  3   88.79485       2       10          . |
 24. |  4        108       2        1          . |
 25. |  5        117       2        2          . |
     |-------------------------------------------|
 26. |  6         87       2        3          . |
 27. |  7         88       2        4          . |
 28. |  8         78       2        5          . |
 29. |  9         96       2        6          . |
 30. | 10        128       2        7          . |
     +-------------------------------------------+

Suppose we now want the mean IQ within each imputed dataset. That is not
exactly relevant here, but we might, for example, want to restandardize
IQ to a specific mean while taking the imputed values into account. In
short, we do to each imputed dataset what we would have done had the
dataset been complete. Following the manual, we run:

. mi passive: egen meanlongpas=mean(iq)
(passive variable meanlongpas unregistered because not in m=0)
m=0:
m=1:
m=2:
(14 values of passive variable meanlongpas in m>0 updated to match values in m=0)

.
. list id iq meanlongpas _mi*

     +------------------------------------------------------+
     | id         iq   meanlo~s   _mi_m   _mi_id   _mi_miss |
     |------------------------------------------------------|
  1. |  4        108   100.2857       0        1          0 |
  2. |  5        117   100.2857       0        2          0 |
  3. |  6         87   100.2857       0        3          0 |
  4. |  7         88   100.2857       0        4          0 |
  5. |  8         78   100.2857       0        5          0 |
     |------------------------------------------------------|
  6. |  9         96   100.2857       0        6          0 |
  7. | 10        128   100.2857       0        7          0 |
  8. |  1          .   100.2857       0        8          1 |
  9. |  2          .   100.2857       0        9          1 |
 10. |  3          .   100.2857       0       10          1 |
     |------------------------------------------------------|
 11. |  4        108   100.2857       1        1          . |
 12. |  5        117   100.2857       1        2          . |
 13. |  6         87   100.2857       1        3          . |
 14. |  7         88   100.2857       1        4          . |
 15. |  8         78   100.2857       1        5          . |
     |------------------------------------------------------|
 16. |  9         96   100.2857       1        6          . |
 17. | 10        128   100.2857       1        7          . |
 18. |  1   123.1627   102.9703       1        8          . |
 19. |  2   100.7916   102.9703       1        9          . |
 20. |  3   103.7483   102.9703       1       10          . |
     |------------------------------------------------------|
 21. |  4        108   100.2857       2        1          . |
 22. |  5        117   100.2857       2        2          . |
 23. |  6         87   100.2857       2        3          . |
 24. |  7         88   100.2857       2        4          . |
 25. |  8         78   100.2857       2        5          . |
     |------------------------------------------------------|
 26. |  9         96   100.2857       2        6          . |
 27. | 10        128   100.2857       2        7          . |
 28. |  1   106.0351   98.84287       2        8          . |
 29. |  2   91.59879   98.84287       2        9          . |
 30. |  3   88.79485   98.84287       2       10          . |
     +------------------------------------------------------+

What is peculiar is that the mean computed by -egen- is not constant
within each imputed dataset. For the records where IQ was actually
observed, -egen- appears to return the mean computed in the original
dataset (m=0, observed values only), whereas for the records where IQ
was missing before imputation it returns the mean computed on all IQ
values in that imputed dataset (observed AND imputed). Is this really
meaningful? If so, I think the manual should not use this as an
introductory example without any warning.
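
The arithmetic behind this reading can be checked outside Stata. Below is
a minimal sketch in Python, with the IQ values copied from the listings
above. The variable names are my own and are not part of Stata or -mi-;
the sketch only reproduces the numbers and does not show how -mi passive-
works internally.

```python
# IQ values copied from the listings above.
observed = [108, 117, 87, 88, 78, 96, 128]   # ids 4-10, observed in m=0
imputed = {
    1: [123.1627, 100.7916, 103.7483],        # ids 1-3 in m=1
    2: [106.0351, 91.59879, 88.79485],        # ids 1-3 in m=2
}


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Mean over the observed values only (what m=0 yields, missing ignored).
mean_observed = mean(observed)

# Mean over observed + imputed values within each imputed dataset.
mean_by_m = {m: mean(observed + vals) for m, vals in imputed.items()}

# Number of rows in m>0 that belong to records complete in m=0.
n_complete_rows = len(observed) * len(imputed)

print(f"observed only: {mean_observed:.4f}")
for m, value in mean_by_m.items():
    print(f"m={m}, observed + imputed: {value:.4f}")
print("rows in m>0 for complete records:", n_complete_rows)
```

The three means agree with the 100.2857, 102.9703 and 98.84287 shown in
the listing, and 7 complete records times 2 imputations gives 14, the
same count that appears in the note about values in m>0 being updated to
match m=0.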

What do you think? What am I missing here :-)?

Best,

Henrik


-- 
Henrik Støvring			Department of Biostatistics
Associate professor            	University of Aarhus
[email protected]     	Bartholins Allé 2, Bldg 1261, 217
Phone +45 8942 6131            	8000 Aarhus
Fax +45 8942 6140              	Denmark

