Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Missing Observations. Do I need multiple Imputations?

From	A Loumiotis <[email protected]>
To	[email protected]
Subject	Re: st: Missing Observations. Do I need multiple Imputations?
Date	Wed, 22 Aug 2012 12:08:03 +0300

I agree with you, and I think that is what I said as well.  Your composite
variable is missing if at least one of its component variables (B C D E
F) is missing.  When none of the component variables is missing, your
composite variable is not missing.
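
To make the rule concrete, here is a minimal sketch in Python rather than Stata. The values are made up, and the names rows, COMPONENTS and geometric_composite are purely illustrative, not anything taken from your dataset. It assumes the composite is a plain geometric mean of the five sub-indices, which may differ from your exact aggregation formula.

```python
import math

# Hypothetical toy rows for the sub-indices B..F; None marks a missing value.
rows = [
    {"B": 0.5, "C": 0.6, "D": 0.7, "E": 0.5, "F": 0.6},
    {"B": 0.4, "C": None, "D": 0.8, "E": 0.5, "F": 0.7},
    {"B": 0.9, "C": 0.7, "D": 0.6, "E": None, "F": None},
    {"B": 0.3, "C": 0.5, "D": 0.9, "E": 0.6, "F": 0.8},
]
COMPONENTS = ["B", "C", "D", "E", "F"]


def geometric_composite(row):
    """Return the geometric mean of the components, or None if any is missing."""
    values = [row[name] for name in COMPONENTS]
    if any(v is None for v in values):
        return None  # one missing component is enough to lose the composite
    return math.prod(values) ** (1.0 / len(values))


# Composite index A for every row (None where it cannot be computed).
composite = [geometric_composite(r) for r in rows]

# Count missing values in A and in each component separately.
n_missing_A = sum(a is None for a in composite)
n_missing_by_component = {
    name: sum(r[name] is None for r in rows) for name in COMPONENTS
}
```

With these toy rows, A is missing in two of the four rows even though no single component is missing in more than one row. The missing rows of A are the union of the missing rows of the components, which is why A can lose far more observations than any one sub-index.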

On Wed, Aug 22, 2012 at 10:32 AM, Abekah Nkrumah <[email protected]> wrote:
> Dear Antonis,
>
> Thank you very much for your reply. I want to understand your first
> line: were you saying my aggregate variable is missing entirely? In my
> statement I said the composite index (A), which you referred to as the
> aggregate variable, is there but drops a substantial number of
> observations. So it is not entirely missing.
>
> Thanks very much
>
> Regards
>
> On Wed, Aug 22, 2012 at 7:44 AM, A Loumiotis
> <[email protected]> wrote:
>> Hi Gordon,
>>
>> Since your aggregate variable is missing when at least one component
>> is missing, I believe you would first need to multiply impute the
>> missing observations in your dataset and then compute your aggregate
>> variable.  I don't see a problem with multiply imputing variables such
>> as age or number of wives.  In addition, your results might change if
>> your data are missing (conditionally) at random, even if your
>> nonmissing sample is large.
>>
>> Best,
>> Antonis
>>
>>
>>
>> On Tue, Aug 21, 2012 at 7:18 PM, Abekah Nkrumah <[email protected]> wrote:
>>> Dear Statalist,
>>>
>>>
>>> I would like some advice on this rather long question. Variable A in
>>> the table below is a composite index derived from the aggregation of
>>> variables B, C, D, E and F, which are themselves sub-indices. A geometric
>>> aggregation method was used. From the table I realise that the number
>>> of observations on the composite index (A) drops significantly.
>>>
>>>
>>>     Variable |       Obs        Mean   Std. Dev.        Min        Max
>>> -------------+--------------------------------------------------------
>>>            A |     69623    .4898275    .1575975   .0498657   .8980919
>>>            B |    187524     .524507    .2669241   1.80e-08          1
>>>            C |    221089    .6625131    .3732415   2.18e-08          1
>>>            D |    234680    .7486263    .3494941  -1.29e-08          1
>>>            E |    108437    .5253285    .0648927  -2.61e-08          1
>>> -------------+--------------------------------------------------------
>>>            F |    119261    .6829314    .2270192  -1.62e-08          1
>>>
>>>
>>> I then decided to do a missing data check for all the indices, and the
>>> results are below.
>>>
>>>     Variable |     Missing       Total   Percent Missing
>>> -------------+------------------------------------------
>>>            A |     166,075     235,698             70.46
>>>            B |      48,174     235,698             20.44
>>>            C |      14,609     235,698              6.20
>>>            D |       1,018     235,698              0.43
>>>            E |     127,261     235,698             53.99
>>>            F |     116,437     235,698             49.40
>>> -------------+------------------------------------------
>>>
>>>
>>> I then checked the percentage missing for all the individual variables
>>> used in computing the sub-indices, especially B, C, E and F. The
>>> results are below.
>>>
>>>
>>>     Variable |     Missing       Total   Percent Missing
>>> -------------+------------------------------------------
>>>           B1 |      46,317     235,698             19.65
>>>           B2 |      46,967     235,698             19.93
>>>           B3 |      46,815     235,698             19.86
>>>           B4 |      47,005     235,698             19.94
>>>           C1 |       5,128     235,698              2.18
>>>           C2 |       5,164     235,698              2.19
>>>           C3 |       6,180     235,698              2.62
>>>           C4 |       9,730     235,698              4.13
>>>           C5 |       5,608     235,698              2.38
>>>           D1 |         444     235,698              0.19
>>>           D2 |         483     235,698              0.20
>>>           D3 |         657     235,698              0.28
>>>           E1 |      82,112     235,698             34.84
>>>           E2 |      58,504     235,698             24.82
>>>           E3 |      65,469     235,698             27.78
>>>           E4 |      81,349     235,698             34.51
>>>           F1 |         214     235,698              0.09
>>>           F2 |      63,503     235,698             26.94
>>>           F3 |      86,512     235,698             36.70
>>>           F4 |         674     235,698              0.29
>>> -------------+------------------------------------------
>>>
>>> The results above suggest that the drop in the number of observations
>>> for the composite empowerment variable is due to the high level of
>>> missing values in the four sub-indices (B, C, E and F) as also
>>> supported by the high level of missing values in the variables used in
>>> computing those indices.
>>>
>>> I was therefore wondering whether an explanation like this in the
>>> appendix of my work will be fine, or whether I will need to do multiple
>>> imputation to replace the missing data.
>>>
>>> I have thought this through, and the question I am asking myself is
>>> this: if I have to do multiple imputation, the variables for the
>>> imputation exercise will be the B variables (these are decision-making
>>> variables), then the E variables (these are number of wives, age at
>>> first marriage, woman's age, partner's age) and then F3 and F4 (which
>>> are partner's education and whether a woman earns cash).
>>>
>>> My worry is whether it is sensible to impute variables such as
>>> age and number of wives. Secondly, considering that I still have a
>>> large sample size to work with, my guess is that the results from the
>>> remaining sample will not change that much. Thus I am wondering whether
>>> it will still be necessary to impute the missing data.
>>>
>>> I would appreciate hearing from you on this so I will know which way to
>>> go. Thank you very much.
>>>
>>> Regards
>>>
>>> Gordon
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
