From: Abekah Nkrumah <ankrumah@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Missing Observations. Do I need multiple Imputations?
Date: Tue, 21 Aug 2012 17:18:16 +0100
Dear Statalist,

I would like some advice on this rather long question. Variable A in the table below is a composite index derived by aggregating variables B, C, D, E and F, which are themselves sub-indices. A geometric aggregation method was used. From the table I see that the number of observations on the composite index (A) drops significantly:

    Variable |       Obs        Mean   Std. Dev.         Min        Max
-------------+---------------------------------------------------------
           A |     69623    .4898275    .1575975    .0498657   .8980919
           B |    187524     .524507    .2669241    1.80e-08          1
           C |    221089    .6625131    .3732415    2.18e-08          1
           D |    234680    .7486263    .3494941   -1.29e-08          1
           E |    108437    .5253285    .0648927   -2.61e-08          1
-------------+---------------------------------------------------------
           F |    119261    .6829314    .2270192   -1.62e-08          1

I then ran a missing-data check on all the indices. The results are below:

       Variable |   Missing     Total   Percent Missing
----------------+--------------------------------------
              A |   166,075   235,698             70.46
              B |    48,174   235,698             20.44
              C |    14,609   235,698              6.20
              D |     1,018   235,698              0.43
              E |   127,261   235,698             53.99
              F |   116,437   235,698             49.40
----------------+--------------------------------------

Next I checked the percentage missing for each of the individual variables used in computing the sub-indices, especially B, C, E and F. The results are below:

       Variable |   Missing     Total   Percent Missing
----------------+--------------------------------------
             B1 |    46,317   235,698             19.65
             B2 |    46,967   235,698             19.93
             B3 |    46,815   235,698             19.86
             B4 |    47,005   235,698             19.94
             C1 |     5,128   235,698              2.18
             C2 |     5,164   235,698              2.19
             C3 |     6,180   235,698              2.62
             C4 |     9,730   235,698              4.13
             C5 |     5,608   235,698              2.38
             D1 |       444   235,698              0.19
             D2 |       483   235,698              0.20
             D3 |       657   235,698              0.28
             E1 |    82,112   235,698             34.84
             E2 |    58,504   235,698             24.82
             E3 |    65,469   235,698             27.78
             E4 |    81,349   235,698             34.51
             F1 |       214   235,698              0.09
             F2 |    63,503   235,698             26.94
             F3 |    86,512   235,698             36.70
             F4 |       674   235,698              0.29
----------------+--------------------------------------

The results above suggest that the drop in the number of observations for the composite empowerment variable (A) is due to the high level of missing values in the four sub-indices (B, C, E and F), which in turn reflects the high level of missing values in the variables used to compute those indices. I was therefore wondering whether an explanation like this in the appendix of my work would suffice, or whether I need to use multiple imputation to replace the missing data.

I have thought this through, and if I do have to use multiple imputation, the variables to impute would be the B variables (these are decision-making variables), then the E variables (number of wives, age at first marriage, woman's age, partner's age), and then F3 and F4 (partner's education and whether a woman earns cash). My worry is whether it is sensible to impute variables such as age and number of wives.

Secondly, considering that I still have a large sample to work with, my guess is that the results from the remaining sample will not change that much. I am therefore wondering whether it is still necessary to impute the missing data.

I would appreciate hearing from you on this so that I know which way to go. Thank you very much.
Regards,
Gordon
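P.S. To make the pattern in the tables concrete, here is a minimal sketch of why a composite built from several sub-indices can have far fewer observations than any single sub-index. It is written in Python rather than Stata, the four rows are made-up data, and the unweighted geometric mean and the helper name geometric_index are assumptions for illustration only, not necessarily the aggregation actually used for A. The last lines simply re-check the counts reported for A above.

```python
import math

# Made-up rows for illustration: each row is (B, C, D, E, F),
# and None marks a missing sub-index.
rows = [
    (0.5, 0.8, 0.9, 0.5, 0.7),
    (0.4, None, 0.9, 0.5, 0.7),
    (0.6, 0.7, 1.0, None, None),
    (0.3, 0.9, 0.8, 0.6, 0.5),
]

def geometric_index(values):
    """Unweighted geometric mean (an assumption); missing if any input is missing."""
    if any(v is None for v in values):
        return None
    return math.prod(values) ** (1.0 / len(values))

# The composite exists only for rows that are complete on every sub-index.
composite = [geometric_index(r) for r in rows]
n_complete = sum(a is not None for a in composite)

# Observed counts per sub-index, for comparison with the composite count.
n_observed = [sum(r[j] is not None for r in rows) for j in range(5)]

# Consistency check on the counts reported for A in the tables.
total, missing_a = 235698, 166075
obs_a = total - missing_a
pct_missing_a = round(100 * missing_a / total, 2)

print(n_observed, n_complete, obs_a, pct_missing_a)
```

In the made-up rows every sub-index is observed in at least three of four rows, yet only two rows yield a composite, because a row is lost whenever any one component is missing.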