Statalist



Re: st: Treatment for Missing Values - What Options ?


From   Chao Yawo <Yawo1964@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Treatment for Missing Values - What Options ?
Date   Tue, 14 Jul 2009 08:41:27 -0400

Thanks, both Svend and Rich,

I read the Demographic and Health survey website further, and the
"missing values" are truly missing - due to interviewer errors. So,
instead of imputing them, I decided on a different strategy.
Conceptually, what is important for HIV risk is not necessarily the
issue of unprotected sex, but unprotected sex with partners other than
one's primary partner.  So, I created a different variable that
indicates whether people engage in unprotected sex (i.e.,
without a condom) with a non-spousal or non-cohabiting partner (1),
or otherwise (0), and got the following distribution:


RiskySex
-----------------------------------------------------------
              |      Freq.    Percent      Valid       Cum.
--------------+--------------------------------------------
Valid   0     |       9377      87.59      87.59      87.59
        1     |       1329      12.41      12.41     100.00
        Total |      10706     100.00     100.00
-----------------------------------------------------------

Though the proportion at risk is small - 12% (and probably
underestimated due to perceived social aversion to such self-reports),
this variable avoids the problem with missing values.
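
For anyone wanting to build a similar indicator, here is a minimal sketch on toy data. The variable names nocondom and nonprimary are hypothetical placeholders, not actual DHS variable names, and coding missing responses as 0 ("otherwise") is an assumption about how such a variable might be constructed, not a statement of the exact coding used above.

```stata
* Toy data; nocondom and nonprimary are hypothetical names.
clear
input nocondom nonprimary
1 1
1 0
0 1
0 0
. 0
end

* 1 = unprotected sex with a partner other than the primary one; 0 = otherwise.
* A missing value is never equal to 1, so the last row is coded 0 as well.
generate byte RiskySex = (nocondom == 1 & nonprimary == 1)

tabulate RiskySex, missing
```

Because the logical expression evaluates to 0 or 1 for every row, RiskySex has no missing values by construction. The trade-off is that respondents with unknown condom use are folded into the "otherwise" group.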

Cheers, Cy
------------------------------------------------------------------


On Tue, Jul 14, 2009 at 7:51 AM, Richard Goldstein
<richgold@ix.netcom.com> wrote:
> It is not clear what Svend thinks is going on here, but for anyone thinking
> of using this strategy, I recommend reading Jones, MP (1996), "Indicator and
> Stratification Methods for missing explanatory variables in multiple linear
> regression," _Journal of the American Statistical Association_, 91: 222-230
>
> Rich
>
> Svend Juul wrote:
>>
>> Cy wrote:
>>  In a previous post, I indicated there was a drastic reduction in my
>> sub-population size. I traced the problem to a variable with a lot of
>> missing cases.
>>  As you can see from the table below, this variable elicits whether the
>> respondent engaged in unprotected sexual intercourse. About a third of
>> the cases (33.78%) are missing.
>>  V761 -- Last intercourse used condom
>> ------------------------------------------------------------
>>                |      Freq.    Percent      Valid       Cum.
>> ---------------+--------------------------------------------
>> Valid   0 No   |       6012      56.16      84.81      84.81
>>         1 Yes  |       1075      10.04      15.16      99.97
>>         9      |          2       0.02       0.03     100.00
>>         Total  |       7089      66.22     100.00
>> Missing .      |       3617      33.78
>> Total          |      10706     100.00
>> ------------------------------------------------------------
>>  Since the dependent variable in my model deals with HIV risk, I need to
>> include sexual risk variables such as V761 in the model.  How do I
>> deal with this missing data problem, so that it does not affect my
>> sample size? Would an imputation work?
>>  ==========================================================
>>  In this case, I would avoid imputation and instead generate two dummy
>> variables:
>>   V761_0 = 1 if no condom use, otherwise 0
>>   V761_miss = 1 if missing or 9, otherwise 0
>>    . generate V761_0 = V761==0
>>    . generate V761_miss = V761>1
>>    . groups V761* , missing
>>      +--------------------------------------------+
>>      | V761   V761_0   V761_m~s   Freq.   Percent |
>>      |--------------------------------------------|
>>      |    0        1          0    6012     56.16 |
>>      |    1        0          0    1075     10.04 |
>>      |    9        0          1       2      0.02 |
>>      |    .        0          1    3617     33.78 |
>>      +--------------------------------------------+
>>  -groups- is an unofficial command (ssc install groups).
>>  Both variables should be included in your regression. You will still
>> have a problem interpreting what missing means, but that problem
>> cannot be solved by imputation.
>>  Hope this helps
>> Svend
>> ________________________________________________________
>> Svend Juul
>> Institut for Folkesundhed, Afdeling for Epidemiologi
>> (School of Public Health, Department of Epidemiology)
>> Bartholins Allé 2
>> DK-8000 Aarhus C, Denmark
>> Phone:   +45 8693 7796
>> Mobile:  +45 2634 7796
>> E-mail:  sj@soci.au.dk
>> _________________________________________________________
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
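
Svend's missing-indicator suggestion quoted above can be sketched on toy data as follows. The outcome variable hivrisk and all data values are hypothetical and serve only to make the example runnable. Rich's reply points to Jones (1996) as reading before relying on this strategy, so treat this as an illustration of the mechanics rather than a recommendation.

```stata
* Toy data; hivrisk is a hypothetical binary outcome, values are made up.
clear
input V761 hivrisk
0 1
0 0
1 0
1 1
9 0
. 1
. 0
0 1
end

* Indicator for "no condom at last intercourse".
generate byte V761_0 = (V761 == 0)

* Indicator for "missing or 9". Stata treats missing values as larger than
* any number, so V761 > 1 is true for both 9 and system missing (.).
generate byte V761_miss = (V761 > 1)

tabulate V761 V761_miss, missing

* Both dummies enter the model; V761 == 1 (condom used) is the reference group.
logit hivrisk V761_0 V761_miss
```

The expression V761 > 1 relies on Stata's ordering of missing values. Writing (V761 == 9 | missing(V761)) instead would make the intent explicit and give the same result on this data.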



