
Re: st: Treatment for Missing Values - What Options ?

From	Richard Goldstein <[email protected]>
To	[email protected]
Subject	Re: st: Treatment for Missing Values - What Options ?
Date	Tue, 14 Jul 2009 07:51:51 -0400

It is not clear what Svend thinks is going on here, but for anyone
thinking of using this strategy, I recommend reading Jones, MP (1996),
"Indicator and Stratification Methods for missing explanatory variables
in multiple linear regression," _Journal of the American Statistical
Association_, 91: 222-230.


Rich

Svend Juul wrote:

Cy wrote:

In a previous post, I indicated there was a drastic reduction in my
sub-population size. I traced the problem to a variable with a lot of
missing cases.

As you can see from the table below, this variable elicits whether the
respondent engaged in unprotected sexual intercourse. About a third of
the cases (33.78%) are missing.

V761 -- Last intercourse used condom

-----------------------------------------------------------
               |      Freq.    Percent      Valid       Cum.
---------------+--------------------------------------------
Valid   0 No   |       6012      56.16      84.81      84.81
        1 Yes  |       1075      10.04      15.16      99.97
        9      |          2       0.02       0.03     100.00
        Total  |       7089      66.22     100.00
Missing .      |       3617      33.78
Total          |      10706     100.00
-----------------------------------------------------------

Since the dependent variable in my model deals with HIV risk, I need to
include sexual risk variables such as V761 in the model. How do I
deal with this missing data problem so that it does not affect my
sample size? Would an imputation work?

==========================================================

In this case, I would avoid imputation and instead generate two dummy
variables:

   V761_0 = 1 if no condom use, otherwise 0
   V761_miss = 1 if missing or 9, otherwise 0

    . generate V761_0 = V761==0
    . generate V761_miss = V761>1
    . groups V761* , missing
      +--------------------------------------------+
      | V761   V761_0   V761_m~s   Freq.   Percent |
      |--------------------------------------------|
      |    0        1          0    6012     56.16 |
      |    1        0          0    1075     10.04 |
      |    9        0          1       2      0.02 |
      |    .        0          1    3617     33.78 |
      +--------------------------------------------+

-groups- is an unofficial command (ssc install groups).

Both variables should be included in your regression. You will still
have a problem interpreting what missing means, but that problem
cannot be solved by imputation.
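
A minimal sketch of how the two indicators might enter a model is given
here. It runs on ten made-up observations, and the outcome name hivrisk
is hypothetical (the thread never names the dependent variable), so the
estimates themselves mean nothing. The sketch only shows that no
observation is lost once both dummies are in the model.

```stata
* Toy data for illustration only; not taken from the survey in the thread.
* hivrisk is a hypothetical 0/1 outcome.
clear
input V761 hivrisk
0 1
0 0
0 1
1 0
1 1
1 0
9 1
. 0
. 1
. 0
end

* Same logic as above. Stata treats missing (.) as larger than any
* number, so V761>1 is true for both the code 9 and missing values.
generate byte V761_0    = (V761 == 0)
generate byte V761_miss = (V761 > 1)

* Both indicators are defined for every observation.
assert !missing(V761_0, V761_miss)

* Condom use at last intercourse (V761==1) is the reference category.
logit hivrisk V761_0 V761_miss

* All ten observations are used in estimation.
assert e(N) == _N
```

With survey data that have been -svyset-, the same variable list could
presumably follow a -svy:- prefix instead.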

Hope this helps

Svend

________________________________________________________

Svend Juul
Institut for Folkesundhed, Afdeling for Epidemiologi
(School of Public Health, Department of Epidemiology)
Bartholins Allé 2
DK-8000 Aarhus C, Denmark
Phone: +45 8693 7796
Mobile: +45 2634 7796
E-mail: [email protected]
_________________________________________________________
