Statalist



Re: st: Treatment for Missing Values - What Options ?


From   Chao Yawo <[email protected]>
To   [email protected]
Subject   Re: st: Treatment for Missing Values - What Options ?
Date   Wed, 15 Jul 2009 08:59:20 -0400

Maarten, thanks very much for your advice.

It occurred to me to check one more thing: most of the people who
are missing on the condom use variable may be those who are not
sexually active or have not reached sexual debut.  So I created a new
variable for condom use, assigning a value of 2 to those who are
missing (V761_Miss), and cross-tabulated it with the variable for
recent sexual activity (V536_R), with the following results:

. tabulate V536_R V761_Miss, column

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

                  | RECODE of V761_R (RECODE of V761
   RECODE of V536 |  (Last intercourse used condom
   (Recent sexual |        (See also SMV761)
        activity) |  Not Used       Used    Missing |     Total
------------------+---------------------------------+----------
NotSexuallyActive |         0          0      2,146 |     2,146
                  |      0.00       0.00      59.40 |     20.06
------------------+---------------------------------+----------
   SexuallyActive |     6,012      1,075      1,467 |     8,554
                  |    100.00     100.00      40.60 |     79.94
------------------+---------------------------------+----------
            Total |     6,012      1,075      3,613 |    10,700
                  |    100.00     100.00     100.00 |    100.00
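
For readers following along, the recode described above might look roughly like the sketch below. It assumes V761_R is coded 0 = Not Used and 1 = Used with system-missing values otherwise; the actual codes are not shown in this thread, and the label name v761miss is made up for illustration.

```stata
* Copy the recoded condom-use variable, then give missing cases their own category
generate V761_Miss = V761_R
replace  V761_Miss = 2 if missing(V761_R)

* Value labels; the 0/1 coding of V761_R is an assumption, not confirmed in the thread
label define v761miss 0 "Not Used" 1 "Used" 2 "Missing"
label values V761_Miss v761miss

* Cross-tabulate against recent sexual activity, with column percentages
tabulate V536_R V761_Miss, column
```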


Given that close to 60% of those who are "Missing" on the condom use
variable are not sexually active, I restricted the check I did earlier
(the relationship between the dependent variable and the missingness
indicator) to those who are sexually active, and got a different
result from what I sent out yesterday:

. logit mis V781_R if V536_R==1 [pweight=weight], cluster(psu)

(sum of wgt is   8.8262e+03)
Iteration 0:   log pseudolikelihood = -3739.3157
Iteration 1:   log pseudolikelihood = -3729.0254
Iteration 2:   log pseudolikelihood = -3728.8988
Iteration 3:   log pseudolikelihood = -3728.8988

Logistic regression                               Number of obs   =       8436
                                                  Wald chi2(1)    =       0.55
                                                  Prob > chi2     =     0.4590
Log pseudolikelihood = -3728.8988                 Pseudo R2       =     0.0028

                                  (Std. Err. adjusted for 357 clusters in psu)
------------------------------------------------------------------------------
             |               Robust
         mis |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      V781_R |    .383671   .5181314     0.74   0.459     -.631848     1.39919
       _cons |  -1.695747   .1268113   -13.37   0.000    -1.944293   -1.447202
------------------------------------------------------------------------------


Thus, if we consider only the sexually active respondents, the
relationship is not significant.  Should I take this as evidence that
missingness on the condom use variable is not strongly related to my DV?
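
To put the coefficient on the same scale as the odds ratio quoted below, the V781_R row of the output can be exponentiated. This is only a sketch that reuses the numbers printed in the table above; the -or- option simply asks -logit- to report odds ratios directly.

```stata
* Odds ratio and 95% CI implied by the V781_R row of the logit output
display "OR     = " exp(.383671)
display "95% CI = " exp(-.631848) " to " exp(1.39919)

* Equivalent: refit with the -or- option so odds ratios are reported directly
logit mis V781_R if V536_R==1 [pweight=weight], cluster(psu) or
```

Because the interval endpoints on the logit scale have opposite signs, the odds-ratio interval runs from below 1 to above 1. It is compatible with no association, but it is also wide.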

thx - cY



On Wed, Jul 15, 2009 at 3:11 AM, Maarten Buis <[email protected]> wrote:
>
> --- On Tue, 14/7/09, Chao Yawo wrote:
>
>> Here is what I got when I ran the
>> suggested code, with a slight
>> modification taking the survey design into account:
> <snip>
>> the significance means, then, that the DV (V781_R) negatively
>> predicts the missing-value indicator (mis).  What does that mean? ...
>
> It means you are in trouble, or at least that the solution
> was not as easy as we thought. Since you have quite a large
> proportion of missing cases and the probability of missingness
> is quite strongly related to your dependent variable (an odds
> ratio of exp(-.6133271)= .54), just ignoring these missing
> values will influence your results.
>
> I would do a mixture of approaches and hope they lead to
> similar conclusions.
>
> 1) I would do -ice-
>
> 2) You could use your faithfulness variable
>
> 3) use -ice- as in 1) but now estimate a model with multiple
> risky behavior variables, and combine their effects in a
> sheaf coefficient using -sheafcoef-. This way you could
> diminish the influence of imputing that many values
> on one of the indicator variables.
>
> You can download -sheafcoef- from SSC by typing in Stata
> -ssc install sheafcoef-. There is a more extensive
> discussion of this type of model in the helpfile of
> -propcnsreg-, which can be downloaded by typing
> -ssc install propcnsreg-. In order to use -sheafcoef-
> you will need to specify the -storebv- option in -mim-.
>
> However, I would not do the dummy variable approach, for the reasons
> already mentioned by Rich Goldstein and in an earlier post
> by me:
> http://www.stata.com/statalist/archive/2007-12/msg00030.html
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://home.fsw.vu.nl/m.buis/
> -----------------------------------------
>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
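
Suggestions 1) and 3) in the quoted reply might be sketched roughly as follows. This is a sketch under several assumptions, not a tested recipe: risk2 and risk3 are hypothetical names for additional risky-behavior variables, the file name imputed, the number of imputations, and the seed are arbitrary, and the option syntax of the user-written commands should be checked against their help files.

```stata
* User-written commands from SSC
ssc install ice
ssc install mim
ssc install sheafcoef

* 1) Multiple imputation by chained equations; risk2 and risk3 are hypothetical
ice V781_R V761_R risk2 risk3, saving(imputed, replace) m(5) seed(12345)

* Fit the substantive model on the imputed data; -storebv- keeps the combined
* estimates in e(b) and e(V) so that post-estimation commands can use them
use imputed, clear
mim, storebv: logit V781_R V761_R risk2 risk3 [pweight=weight], cluster(psu)

* 3) Combine the risky-behavior effects into one sheaf coefficient
sheafcoef, latent(V761_R risk2 risk3)
```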


