
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: svylogitgof after logistic using subpop option


From   "Maria E. Montez Rath" <[email protected]>
To   [email protected]
Subject   Re: st: svylogitgof after logistic using subpop option
Date   Tue, 8 Mar 2011 16:05:49 -0800

Steve,

I got it now. I suppose that in the goodness of fit test all we need
are the predictions and so it doesn't matter that we are using a
conditional model.

Below are the results for the entire dataset. I don't know what to
make of the p-value=1.0 but that's another story I suppose. Also, a
lot of PSUs got dropped in the test part and so I don't know if we are
really testing the same model (although the estimates are the same).
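
The drop in PSUs can be read off the two headers. As a rough check, assuming the usual survey rules (design df = number of PSUs minus number of strata, and an adjusted F test whose denominator df is the design df minus the numerator df plus 1), the reported degrees of freedom can be reproduced with plain arithmetic. This is only a sketch to be run from a do-file; the numbers are copied from the output in this thread, not recomputed from the data:

```stata
* Design df = number of PSUs - number of strata
display 1027 - 58        // subpop() run: 969
display 433 - 58         // -if- run:     375

* Denominator df of the adjusted F = design df - numerator df + 1
display 969 - 4 + 1      // model test F(4, 966) in the subpop() run
display 375 - 9 + 1      // estat gof  F(9, 367) after the -if- run
display 969 - 9 + 1      // estat gof  F(9, 961) after the subpop() run
```

So the goodness-of-fit test after the -if- run is based on 433 PSUs rather than 1027, which is why its denominator df is 367 rather than 961.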

I get the same results using both data sets.
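
For reference, an augmented dataset of the kind described in this thread (the subpopulation plus one record for each PSU that would otherwise be dropped) could be built roughly as follows. This is a minimal sketch, not the code actually used: the PSU identifier hospid and the output file name pah08_aug are assumptions, while pah08 and newpah are the names that appear in the thread.

```stata
* Hypothetical names: hospid (PSU id) and pah08_aug (output file).
use pah08, clear

* Flag PSUs that contain at least one subpopulation member.
bysort hospid: egen byte has_sub = max(newpah == 1)

* Within each PSU, mark one arbitrary record as a placeholder.
bysort hospid: generate byte first_rec = (_n == 1)

* Keep every subpopulation record, plus one placeholder record for
* each PSU that has no subpopulation members.
keep if newpah == 1 | (has_sub == 0 & first_rec == 1)

* Sanity check: the PSU count should equal that of the full data.
egen byte psu_tag = tag(hospid)
count if psu_tag

save pah08_aug, replace
```

Each placeholder record keeps its own stratum and weight values, so the survey design settings can be declared on the reduced file in the same way as on the full one.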

Thanks for all your help.

Maria

. svy, subpop(if newpah==1): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =        58                  Number of obs      =   8104197
Number of PSUs     =      1027                  Population size    =  39615465
                                               Subpop. no. of obs =      1971
                                               Subpop. size       = 9686.4649
                                               Design df          =       969
                                               F(   4,    966)    =     27.18
                                               Prob > F           =    0.0000

------------------------------------------------------------------------------
            |             Linearized
       dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
 1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
 1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
      1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.

. // For the goodness of fit test, run :
. svy: logistic dead i.aki2 i.diabetes i.mec_vent i.fem if newpah==1
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =        58                  Number of obs      =      1971
Number of PSUs     =       433                  Population size    = 9686.4649
                                               Design df          =       375
                                               F(   0,    375)    =         .
                                               Prob > F           =         .

------------------------------------------------------------------------------
            |             Linearized
       dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     1.aki2 |   3.511044          .        .       .            .           .
 1.diabetes |   .4748568          .        .       .            .           .
 1.mec_vent |   9.576589          .        .       .            .           .
      1.fem |    1.88229          .        .       .            .           .
------------------------------------------------------------------------------
Note: missing standard errors because of stratum with single sampling unit.

. estat gof

Logistic model for dead, goodness-of-fit test

                    F(9,367) =         0.00
                    Prob > F =         1.0000



On Tue, Mar 8, 2011 at 3:27 PM, Steven Samuels <[email protected]> wrote:
>
>
> Maria-
>
>
>
> You must use the -if- clause in the -svy logistic- statement. -estat gof-, even with an -if- clause, takes its degrees of freedom from the original logistic regression with the -subpop- statement.
>
> ***************************
> // For standard errors and tests, run:
> svy, subpop(if newpah==1): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
> // For the goodness of fit test, run :
> svy: logistic dead i.aki2 i.diabetes i.mec_vent i.fem if newpah==1
> estat gof
> // estat gof if newpah==1 also works
> ***************************
>
> Note: you seem to have switched subpopulations in your code, with "subpop(pah)" in the -svy: logistic- statement and "if newpah==1" in the -estat gof- statement. This might have led to unexpected results.
>
>
> Steve
>
> Steven J. Samuels
> Consulting Statistician
> 18 Cantine's Island
> Saugerties, NY 12477 USA
> Voice: 845-246-0774
> Fax:   206-202-4783
> [email protected]
>
>
>
>
> On Mar 8, 2011, at 4:45 PM, Maria E. Montez Rath wrote:
>
> Steve,
>
> thanks for pointing me to -estat gof-.
>
> I just found out that the -estat- Stata manual had been updated and
> now includes the goodness of fit test for binary data. I believe that
> -estat gof- is reporting the F-adjusted mean residual test according
> to Archer and Lemeshow (2006).
>
> Reference
> Archer, K. J., and S. Lemeshow. 2006. Goodness-of-fit test for a
> logistic regression model fitted using survey sample data. Stata
> Journal 6: 97–105.
>
> But I still have a problem. I have 10 years of data and so I created a
> smaller dataset that includes my subpopulation augmented by one record
> for each PSU dropped when selecting the subpopulation. In theory this
> should work because the problem with selecting the subpopulation
> directly and doing a conditional analysis is that there is no way for
> the program to know how many PSUs were sampled. By augmenting my
> dataset with the PSUs dropped, Stata can still compute n (total number
> of PSUs sampled).  I tested that this would work by comparing the
> results from -svy: logistic- with -subpop()- option using 1) the
> complete one year of data and 2) my augmented data for that same year.
>
> The results from -svy: logistic- are identical using both methods
> (point estimates and SEs are equal), but the results from -estat gof-
> are very different: with the entire data the test indicates a lack of
> fit, while with my augmented data the test indicates good fit.
>
> So, I'm still wondering how -estat gof- uses the results from
> -svy: logistic- with the subpopulation option.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . use pah08
> . svy, subpop(pah): logistic dead i.aki2 i.diabetes i.mec_vent i.fem
> . estat gof if newpah==1
>
> Logistic model for dead, goodness-of-fit test
>
>                   F(9,961) =      3126.59
>                   Prob > F =         0.0000
>
> Using AUGMENTED data:
>
> . use pahsubpop08, clear
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> . estat gof
>
> Logistic model for died, goodness-of-fit test
>
>                   F(9,961) =         0.66
>                   Prob > F =         0.7500
>
> On Sun, Mar 6, 2011 at 5:41 PM, Steven Samuels <[email protected]> wrote:
>>
>> -
>>
>> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them.  -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will have to first run -svy logistic- with an -if- clause, not the subpop() option.
>>
>>
>> Steve
>> [email protected]
>>
>> On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:
>>
>> Hi!
>>
>> I'm using the NIS which follows a complex survey design to obtain the
>> odds of dying for patients with acute kidney disease in a
>> subpopulation. I'll be using 10 years of data which will make the
>> dataset too big. Since I'm interested in a subpopulation I found out
>> that in order to obtain correct standard errors, my dataset only needs
>> to include the subpopulation plus one record for each PSU that would
>> be dropped when creating the subpopulation dataset. This way, I can
>> still use the svy, subpop(): logistic command because Stata can still
>> compute the total number of hospitals sampled.
>>
>> While testing this theory I found that Stata will give me the same
>> results whether I use the entire sample or my augmented subpopulation
>> data but the goodness of fit test using svylogitgof is very different.
>> I also found that svylogitgof is reporting the number of observations
>> in the total sample and not the subpopulation number of observations.
>> Does this have any implication in the actual test?
>>
>> Below you can see the results from my test. First, is the output using
>> the entire dataset and second using my augmented subpopulation
>> dataset.
>>
>> The output from svy logistic is identical; the only difference is
>> the reported population size, which is wrong for my augmented
>> dataset, as expected. However, all the results (ORs, SE, t,...) are
>> equal.
>>
>> The output for the goodness of fit test is very different. As you can
>> see, the number of observations reported are the total number of
>> observations in the data even though I'm doing a subpopulation
>> analysis. We see that the number of groups used is different: using
>> the entire dataset the test rejects the hypothesis that the model is a
>> good fit, but using my augmented dataset we do not reject the hypothesis
>> that the model is a good fit. But they are the same model, so how can
>> I get such different results?
>>
>> I have read the paper on the test and I don't see where the number of
>> observations comes into play. Also, in the paper it was assumed that
>> the number of groups used was 10 (generating deciles of risk). In the
>> new svylogitgof update, this was changed to vary.
>>
>> Can anyone help me? I don't know what to make of these results and I
>> surely cannot use them, as I don't think the test applied to the entire
>> dataset is correct either.
>>
>> Thank you,
>>
>> Maria
>>
>> Using ALL data:
>>
>> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
>> Survey: Logistic regression
>>
>> Number of strata   =        58                  Number of obs      =   8104197
>> Number of PSUs     =      1027                  Population size    =  39615465
>>                                                Subpop. no. of obs =      1971
>>                                                Subpop. size       = 9686.4649
>>                                                Design df          =       969
>>                                                F(   4,    966)    =     27.18
>>                                                Prob > F           =    0.0000
>>
>> ------------------------------------------------------------------------------
>>              |             Linearized
>>         dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>>        1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
>> ------------------------------------------------------------------------------
>> Note: 2 strata omitted because they contain no subpopulation members.
>>
>> . svylogitgof
>>  Number of observations =                         8104197
>>  F-adjusted test statistic = F(3,967) =       7865.271
>>                       Prob > F =                             0.000
>>
>>
>> Using AUGMENTED subpopulation data:
>>
>> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
>> Survey: Logistic regression
>>
>> Number of strata   =        58                  Number of obs      =      2565
>> Number of PSUs     =      1027                  Population size    = 12682.585
>>                                                Subpop. no. of obs =      1971
>>                                                Subpop. size       = 9686.4649
>>                                                Design df          =       969
>>                                                F(   4,    966)    =     27.18
>>                                                Prob > F           =    0.0000
>>
>> ------------------------------------------------------------------------------
>>              |             Linearized
>>         died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>>     1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
>> ------------------------------------------------------------------------------
>> Note: 2 strata omitted because they contain no subpopulation members.
>>
>> . svylogitgof
>>  Number of observations =                            2565
>>  F-adjusted test statistic  = F(5,965) =          1.096
>>                       Prob > F =                           0.361
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC