Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Missing Observations. Do I need multiple Imputations?


From   A Loumiotis <[email protected]>
To   [email protected]
Subject   Re: st: Missing Observations. Do I need multiple Imputations?
Date   Wed, 22 Aug 2012 09:44:18 +0300

Hi Gordon,

Since your aggregate variable is missing whenever at least one component
is missing, I believe you would first need to multiply impute the
missing observations in your dataset and then compute your aggregate
variable.  I don't see a problem with multiply imputing variables such
as age or number of wives.  In addition, your results might change if
your data are missing (conditionally) at random, even if your
non-missing sample is large.
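
The order of operations (impute the components first, then build the
aggregate in each completed dataset, then pool) can be sketched as
follows. This is only an illustration in Python with made-up data, not
Stata code and not your actual model. Assumptions: two hypothetical
sub-indices `b` and `e`, an unweighted geometric mean as the aggregate,
and a crude bootstrap-then-draw hot deck as the imputation step. That
hot deck ignores covariates, so it is only reasonable if values are
missing completely at random; a real analysis would use an imputation
model that conditions on observed variables (in Stata, for example,
`mi impute chained`). The pooling step uses Rubin's rules.

```python
import math
import random
import statistics

def impute_once(values, rng):
    """Fill None entries by a bootstrap-then-draw hot deck from observed values."""
    observed = [v for v in values if v is not None]
    donors = rng.choices(observed, k=len(observed))  # bootstrap the donor pool
    return [v if v is not None else rng.choice(donors) for v in values]

def geometric_index(components):
    """Unweighted geometric mean (assumed aggregation; components must be > 0)."""
    return math.exp(sum(math.log(c) for c in components) / len(components))

def rubin_combine(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

rng = random.Random(12345)

# Hypothetical data: two sub-indices in (0, 1], roughly 30% of each missing.
n = 500
b = [rng.uniform(0.1, 1.0) if rng.random() > 0.3 else None for _ in range(n)]
e = [rng.uniform(0.1, 1.0) if rng.random() > 0.3 else None for _ in range(n)]

m = 20  # number of imputations (illustrative choice)
estimates, variances = [], []
for _ in range(m):
    b_full = impute_once(b, rng)
    e_full = impute_once(e, rng)
    # The aggregate is computed AFTER imputation, once per completed dataset.
    a = [geometric_index(pair) for pair in zip(b_full, e_full)]
    estimates.append(statistics.fmean(a))
    variances.append(statistics.variance(a) / n)  # squared SE of the mean

pooled_mean, pooled_se = rubin_combine(estimates, variances)
print(f"pooled mean of A = {pooled_mean:.4f}, pooled SE = {pooled_se:.4f}")
```

Computing the aggregate first and imputing it directly would throw away
the information in the components that are observed for a given row,
which is why the sketch imputes at the component level.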

Best,
Antonis



On Tue, Aug 21, 2012 at 7:18 PM, Abekah Nkrumah <[email protected]> wrote:
> Dear Statalist,
>
>
> I would like some advice on this rather long question. Variable A in
> the table below is a composite index derived by aggregating
> variables B, C, D, E and F, which are themselves sub-indices. A geometric
> aggregation method was used. From the table I realise that the number of
> observations on the composite index (A) drops significantly.
>
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>            A |     69623    .4898275     .1575975  .0498657    .8980919
>            B |    187524     .524507     .2669241  1.80e-08           1
>            C |    221089    .6625131     .3732415  2.18e-08           1
>            D |    234680    .7486263     .3494941 -1.29e-08           1
>            E |    108437    .5253285     .0648927 -2.61e-08           1
> -------------+--------------------------------------------------------
>            F |    119261    .6829314     .2270192 -1.62e-08           1
>
>
> I then decided to do a missing data check for all the indices, and the
> results are below.
>
> Variable |     Missing          Total   Percent Missing
> ---------+---------------------------------------------
>        A |     166,075        235,698             70.46
>        B |      48,174        235,698             20.44
>        C |      14,609        235,698              6.20
>        D |       1,018        235,698              0.43
>        E |     127,261        235,698             53.99
>        F |     116,437        235,698             49.40
> ---------+---------------------------------------------
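
A missing-data check of this kind is just a count of missing values per
variable divided by the total number of rows. A minimal sketch in Python
follows, using a tiny hypothetical dataset in which `None` marks a
missing value; the function name `missing_summary` and the toy values
are assumptions for illustration, not the poster's data or code.

```python
def missing_summary(data):
    """Return (name, n_missing, total, percent_missing) for each variable."""
    summary = []
    for name, values in data.items():
        total = len(values)
        n_missing = sum(v is None for v in values)
        summary.append((name, n_missing, total, 100.0 * n_missing / total))
    return summary

# Hypothetical toy data; None marks a missing observation.
data = {
    "A": [0.4, None, None, 0.6],
    "B": [0.5, 0.2, None, 0.9],
}

for name, n_missing, total, pct in missing_summary(data):
    print(f"{name:>8} | {n_missing:>10,} {total:>12,} {pct:>10.2f}")
```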
>
>
> I then checked the percentage missing for all the individual variables
> used in computing the sub-indices, especially B, C, E and F. The
> results are below.
>
>
> Variable |     Missing          Total   Percent Missing
> ---------+---------------------------------------------
>       B1 |      46,317        235,698             19.65
>       B2 |      46,967        235,698             19.93
>       B3 |      46,815        235,698             19.86
>       B4 |      47,005        235,698             19.94
>       C1 |       5,128        235,698              2.18
>       C2 |       5,164        235,698              2.19
>       C3 |       6,180        235,698              2.62
>       C4 |       9,730        235,698              4.13
>       C5 |       5,608        235,698              2.38
>       D1 |         444        235,698              0.19
>       D2 |         483        235,698              0.20
>       D3 |         657        235,698              0.28
>       E1 |      82,112        235,698             34.84
>       E2 |      58,504        235,698             24.82
>       E3 |      65,469        235,698             27.78
>       E4 |      81,349        235,698             34.51
>       F1 |         214        235,698              0.09
>       F2 |      63,503        235,698             26.94
>       F3 |      86,512        235,698             36.70
>       F4 |         674        235,698              0.29
> ---------+---------------------------------------------
>
> The results above suggest that the drop in the number of observations
> for the composite empowerment variable is due to the high level of
> missing values in the four sub-indices (B, C, E and F) as also
> supported by the high level of missing values in the variables used in
> computing those indices.
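
The mechanism described here, where the composite is missing whenever
any one component is missing, can be illustrated with a short Python
sketch. It assumes an unweighted geometric mean as the aggregation rule
(the exact formula is not given in the post) and uses hypothetical rows
of sub-indices with `None` for a missing value.

```python
import math

def geometric_index(components):
    """Geometric mean of the components, or None if any component is missing."""
    if any(c is None for c in components):
        return None
    # Components are assumed strictly positive; math.log fails otherwise.
    return math.exp(sum(math.log(c) for c in components) / len(components))

# Hypothetical rows of sub-indices (B, C, D, E, F); None marks a missing value.
rows = [
    (0.50, 0.70, 0.80, 0.50, 0.70),
    (None, 0.70, 0.80, 0.50, 0.70),
    (0.50, 0.70, 0.80, None, 0.70),
    (0.50, 0.70, 0.80, 0.50, None),
]

composite = [geometric_index(r) for r in rows]
n_missing = sum(a is None for a in composite)
print(f"{n_missing} of {len(rows)} composite values are missing")
```

In this toy example no single sub-index is missing in more than one row
out of four, yet three of the four composite values are missing, because
the missing rows do not overlap.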
>
> I was therefore wondering whether an explanation like this in the
> appendix of my work will be fine, or whether I will need to use multiple
> imputation to replace the missing data.
>
> I have thought this through, and the question I am asking myself is
> this: if I have to do multiple imputation, the variables for the imputation
> exercise will be the B variables (these are decision-making
> variables), then the E variables (these are number of wives, age at
> first marriage, woman's age, partner's age) and then F3 and F4 (which
> are partner's education and whether a woman earns cash).
>
> My worry is whether it will be sensible to impute variables such as
> age and number of wives. Secondly, considering that I still have a
> large sample size to work with, my guess is that the results from the
> remaining sample will not change that much. Thus I am wondering whether
> it will still be necessary to impute the missing data.
>
> I would appreciate hearing from you on this so I will know which way to
> go. Thank you very much.
>
> Regards
>
> Gordon
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
