Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Missing Observations. Do I need multiple Imputations?

From: Abekah Nkrumah | To: statalist@hsphsun2.harvard.edu | Subject: st: Missing Observations. Do I need multiple Imputations? | Date: Tue, 21 Aug 2012 17:18:16 +0100

```
Dear Statalist,

I would like some advice on this rather long question. Variable A in
the table below is a composite index derived by aggregating variables
B, C, D, E and F, which are themselves sub-indices. A geometric
aggregation method was used. From the table I realise that the number
of observations on the composite index (A) drops significantly.

    Variable |       Obs        Mean   Std. Dev.         Min        Max
-------------+---------------------------------------------------------
           A |     69623    .4898275    .1575975    .0498657   .8980919
           B |    187524     .524507    .2669241    1.80e-08          1
           C |    221089    .6625131    .3732415    2.18e-08          1
           D |    234680    .7486263    .3494941   -1.29e-08          1
           E |    108437    .5253285    .0648927   -2.61e-08          1
-------------+---------------------------------------------------------
           F |    119261    .6829314    .2270192   -1.62e-08          1

I then ran a missing-data check on all the indices; the results are
below.

Variable |     Missing       Total    Percent Missing
---------+-------------------------------------------
       A |     166,075     235,698              70.46
       B |      48,174     235,698              20.44
       C |      14,609     235,698               6.20
       D |       1,018     235,698               0.43
       E |     127,261     235,698              53.99
       F |     116,437     235,698              49.40
---------+-------------------------------------------

I then checked the percentage missing for all the individual variables
used in computing the sub-indices, especially B, C, E and F. The
results are below.

Variable |     Missing       Total    Percent Missing
---------+-------------------------------------------
      B1 |      46,317     235,698              19.65
      B2 |      46,967     235,698              19.93
      B3 |      46,815     235,698              19.86
      B4 |      47,005     235,698              19.94
      C1 |       5,128     235,698               2.18
      C2 |       5,164     235,698               2.19
      C3 |       6,180     235,698               2.62
      C4 |       9,730     235,698               4.13
      C5 |       5,608     235,698               2.38
      D1 |         444     235,698               0.19
      D2 |         483     235,698               0.20
      D3 |         657     235,698               0.28
      E1 |      82,112     235,698              34.84
      E2 |      58,504     235,698              24.82
      E3 |      65,469     235,698              27.78
      E4 |      81,349     235,698              34.51
      F1 |         214     235,698               0.09
      F2 |      63,503     235,698              26.94
      F3 |      86,512     235,698              36.70
      F4 |         674     235,698               0.29
---------+-------------------------------------------

The results above suggest that the drop in the number of observations
for the composite empowerment variable is due to the high level of
missing values in the four sub-indices (B, C, E and F). This is in
turn supported by the high level of missing values in the variables
used in computing those indices.

I was therefore wondering whether an explanation like this in the
appendix of my work will be fine, or whether I will need to use
multiple imputation to replace the missing data.

I have thought this through, and the question I am asking myself is
this: if I have to do multiple imputation, the variables for the
imputation exercise will be the B variables (these are decision-making
variables), then the E variables (these are number of wives, age at
first marriage, woman's age, partner's age) and then F3 and F4 (which
are partner's education and whether a woman earns cash).

My worry is whether it is sensible to impute variables such as age and
number of wives. Secondly, considering that I still have a large
sample to work with, my guess is that the results from the remaining
sample will not change that much. Thus I am wondering whether it is
still necessary to impute the missing data.

I would appreciate hearing from you on this so that I will know which
way to go. Thank you very much.

Regards

Gordon
```
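
The tables in the post show A missing for 70.46% of observations while the worst single sub-index (E) is missing for 53.99%. A minimal sketch of why that can happen is given here. It assumes an unweighted geometric mean of the five sub-indices (the post says only that "a geometric aggregation method was used") and that A is set to missing whenever any sub-index is missing. The function name `geometric_index` and the toy rows are hypothetical and are not the poster's data.

```python
import math

def geometric_index(parts):
    """Unweighted geometric mean of the sub-indices.

    Returns None (missing) if any sub-index is missing. This rule and
    the equal weighting are assumptions, not the poster's stated method.
    """
    if any(p is None for p in parts):
        return None
    return math.prod(parts) ** (1 / len(parts))

# Toy rows in the order (B, C, D, E, F); None marks a missing value.
# The numbers are made up for illustration only.
rows = [
    (0.5, 0.8, 0.9, 0.5, 0.7),
    (None, 0.6, 1.0, 0.5, 0.6),
    (0.4, 0.7, 0.8, None, 0.9),
    (0.6, 0.9, 1.0, 0.6, None),
    (None, 0.5, 0.7, None, 0.8),
]

# Composite index per row; missing propagates from any sub-index.
A = [geometric_index(r) for r in rows]

# Rows with a usable composite index (complete cases).
n_complete = sum(a is not None for a in A)

# Missing count per sub-index column.
missing_per_column = [sum(r[j] is None for r in rows) for j in range(5)]

print("complete rows for A:", n_complete, "of", len(rows))
print("missing per sub-index (B, C, D, E, F):", missing_per_column)
```

In this toy data no sub-index is missing in more than two of five rows, yet A is available for only one row, because the missing values fall on different rows. The same mechanism would explain a composite index losing more observations than any of its components.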