
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: svylogitgof after logistic using subpop option


From   "Maria E. Montez Rath" <maria.rath@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: svylogitgof after logistic using subpop option
Date   Tue, 8 Mar 2011 13:45:13 -0800

Steve,

thanks for pointing me to -estat gof-.

I just found out that the -estat- entry in the Stata manual has been
updated and now includes the goodness-of-fit test for binary data. I
believe that -estat gof- reports the F-adjusted mean residual test of
Archer and Lemeshow (2006).

Reference
Archer, K. J., and S. Lemeshow. 2006. Goodness-of-fit test for a
logistic regression model fitted using survey sample data. Stata
Journal 6: 97–105.
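
For reference, here is a sketch of the test as I understand it from Archer and Lemeshow (2006). The symbols are my own notation, not Stata's output labels. Observations are sorted by fitted probability and split into g groups, the vector of group mean residuals M-hat and its design-based covariance estimate V-hat are formed, and the resulting Wald statistic is rescaled to an F statistic, where f is the design degrees of freedom (number of PSUs minus number of strata):

```latex
\hat{W} = \hat{M}^{\top}\, \hat{V}(\hat{M})^{-1}\, \hat{M},
\qquad
F = \frac{f - g + 2}{f\, g}\, \hat{W} \;\sim\; F(g - 1,\; f - g + 2)
```

With f = 1027 - 58 = 969 and g = 10 groups, this gives F(9, 961), which matches the degrees of freedom in the -estat gof- output further down.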

But I still have a problem. I have 10 years of data, so I created a
smaller dataset that contains my subpopulation augmented by one record
for each PSU that was dropped when selecting the subpopulation. In
theory this should work: the problem with selecting the subpopulation
directly and doing a conditional analysis is that the program has no
way to know how many PSUs were sampled. With the dropped PSUs added
back, Stata can still compute n (the total number of PSUs sampled). I
tested this by comparing the results from -svy: logistic- with the
-subpop()- option using 1) the complete data for one year and 2) my
augmented data for that same year.
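
The augmentation step can be sketched roughly as follows. This is only an illustration, not the exact code I ran: the PSU identifier name hospid is an assumption, pah is assumed to be a 0/1 subpopulation indicator, and the survey design is assumed to have been declared with -svyset- already.

```stata
* Sketch only: hospid (PSU id) is an assumed variable name,
* and pah is assumed to be a 0/1 subpopulation indicator.
use pah08, clear

* Flag PSUs that contain at least one subpopulation member
bysort hospid: egen has_sub = max(pah == 1)

* Mark one record per PSU to act as a placeholder
bysort hospid: generate byte first_rec = (_n == 1)

* Keep every subpopulation record, plus a single placeholder
* record for each PSU that has no subpopulation members
keep if pah == 1 | (has_sub == 0 & first_rec == 1)

save pahsubpop08, replace
```

The placeholder records have pah == 0, so -subpop(pah)- still excludes them from the estimates while the PSU and stratum counts are preserved.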

The results from -svy: logistic- are identical under both methods
(point estimates and SEs are equal), but the results from -estat gof-
are very different: with the entire data the test indicates lack of
fit, while with my augmented data it indicates good fit.

So I'm still wondering how -estat gof- uses the results from
-svy: logistic- with the subpopulation option.

Thank you,

Maria

Using ALL data:

. use pah08
. svy, subpop(pah): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
. estat gof if newpah==1

Logistic model for dead, goodness-of-fit test

                   F(9,961) =      3126.59
                   Prob > F =         0.0000

Using AUGMENTED data:

. use pahsubpop08, clear
. svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
. estat gof

Logistic model for died, goodness-of-fit test

                   F(9,961) =         0.66
                   Prob > F =         0.7500

On Sun, Mar 6, 2011 at 5:41 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>
> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them. -svylogit- is not "subpopulation-aware". To use it for a subpopulation, you will have to first run -svy: logistic- with an -if- clause, not the subpop() option.
>
>
> Steve
> sjsamuels@gmail.com
>
> On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:
>
> Hi!
>
> I'm using the NIS, which follows a complex survey design, to obtain
> the odds of dying for patients with acute kidney disease in a
> subpopulation. I'll be using 10 years of data, which will make the
> dataset too big. Since I'm interested in a subpopulation, I found out
> that, to obtain correct standard errors, my dataset only needs
> to include the subpopulation plus one record for each PSU that would
> be dropped when creating the subpopulation dataset. This way, I can
> still use the svy, subpop(): logistic command because Stata can still
> compute the total number of hospitals sampled.
>
> While testing this theory I found that Stata gives me the same
> model results whether I use the entire sample or my augmented
> subpopulation data, but the goodness-of-fit test from svylogitgof is
> very different. I also found that svylogitgof reports the number of
> observations in the total sample and not the number of observations
> in the subpopulation. Does this have any implication for the actual test?
>
> Below you can see the results from my test. First, is the output using
> the entire dataset and second using my augmented subpopulation
> dataset.
>
> The output from svy logistic is identical except for the reported
> population size, which is wrong for my augmented dataset, as
> expected. All the other results (ORs, SEs, t, ...) are equal.
>
> The output for the goodness-of-fit test is very different. As you can
> see, the number of observations reported is the total number of
> observations in the data even though I'm doing a subpopulation
> analysis. The number of groups used also differs. Using the entire
> dataset, the test rejects the hypothesis that the model is a good
> fit, but using my augmented dataset we do not reject that
> hypothesis. But they are the same model, so how can I get such
> different results?
>
> I have read the paper on the test and I don't see where the number of
> observations comes into play. Also, the paper assumed that the
> number of groups used was 10 (generating deciles of risk). In the
> new svylogitgof update, the number of groups can vary.
>
> Can anyone help me? I don't know what to make of these results, and I
> surely cannot use them, as I don't think the test applied to the
> entire dataset is correct either.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58                  Number of obs      =   8104197
> Number of PSUs     =      1027                  Population size    =  39615465
>                                                 Subpop. no. of obs =      1971
>                                                 Subpop. size       = 9686.4649
>                                                 Design df          =       969
>                                                 F(   4,    966)    =     27.18
>                                                 Prob > F           =    0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>        1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
>  Number of observations =                         8104197
>  F-adjusted test statistic = F(3,967) =       7865.271
>                       Prob > F =                             0.000
>
>
> Using AUGMENTED subpopulation data:
>
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58                  Number of obs      =      2565
> Number of PSUs     =      1027                  Population size    = 12682.585
>                                                 Subpop. no. of obs =      1971
>                                                 Subpop. size       = 9686.4649
>                                                 Design df          =       969
>                                                 F(   4,    966)    =     27.18
>                                                 Prob > F           =    0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>     1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
>  Number of observations =                            2565
>  F-adjusted test statistic  = F(5,965) =          1.096
>                       Prob > F =                           0.361
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

