
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: svylogitgof after logistic using subpop option


From   "Michael I. Lichter" <[email protected]>
To   [email protected]
Subject   Re: st: svylogitgof after logistic using subpop option
Date   Mon, 07 Mar 2011 16:05:03 -0500

I have done some work on svylogitgof and will take a look at this problem when I get a chance. I'm not certain what I need to do to make it "subpopulation aware", but I'll give it a shot. It obviously shouldn't fail when confronted with an "if", in any event.

-ml

On 3/7/2011 3:55 PM, Steven Samuels wrote:
Maria-

I have found that -estat gof- works after -svy: logit-  with an -if- clause, whereas -svylogitgof- sometimes fails.

*********************
sysuse auto, clear
svyset _n
svy, subpop(if price<7000): logistic foreign turn
svy: logistic foreign turn if price<7000
svylogitgof  // throws error
estat gof    // works with correct d.f.
***********************

Steve
[email protected]

-svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them. -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will first have to run -svy: logistic- with an -if- clause, not the subpop() option.
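
For the model in the original question, that workaround might look like the sketch below. The variable names are taken from the output quoted further down in this thread; that the data are already -svyset- and that pah is a 0/1 indicator are assumptions. Restricting with -if- removes the out-of-subpopulation records before variance estimation, so the standard errors can differ from those obtained with subpop().

```stata
* Sketch only: assumes the data are already -svyset- and that pah is a
* 0/1 subpopulation indicator (an assumption, not stated in the thread).
svy: logistic died i.aki2 i.diabetes i.mec_vent i.fem if pah == 1

* User-written goodness-of-fit command, run on the -if- restricted fit
svylogitgof
```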


Steve
[email protected]

On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:

Hi!

I'm using the NIS, which follows a complex survey design, to obtain the
odds of dying for patients with acute kidney disease in a
subpopulation. I'll be using 10 years of data, which will make the
dataset too big. Since I'm interested in a subpopulation, I found out
that, in order to obtain correct standard errors, my dataset only needs
to include the subpopulation plus one record for each PSU that would
otherwise be dropped when creating the subpopulation dataset. This way,
I can still use the svy, subpop(): logistic command because Stata can
still compute the total number of hospitals sampled.
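
A minimal sketch of how such an augmented dataset might be built is given here. The names hospid (PSU identifier), nis_stratum (stratum), and discwt (sampling weight) are hypothetical placeholders, and pah is assumed to be a 0/1 subpopulation indicator. The sketch also assumes that hospid uniquely identifies a PSU across the pooled years.

```stata
* Sketch only. Hypothetical names: hospid (PSU), nis_stratum (stratum),
* discwt (sampling weight). pah is assumed to be coded 0/1.

* 1 if the PSU contains at least one subpopulation member, else 0
bysort hospid: egen byte any_pah = max(pah == 1)

* Flag one record per PSU to serve as a placeholder
bysort hospid: generate byte first_rec = (_n == 1)

* Keep all subpopulation records, plus one placeholder record for each
* PSU that would otherwise disappear from the data
keep if pah == 1 | (any_pah == 0 & first_rec == 1)

* Declare the design and fit the model on the subpopulation
svyset hospid [pweight = discwt], strata(nis_stratum)
svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
```

Because most non-subpopulation records are gone, the reported number of observations and population size will no longer describe the full sample, while the counts of strata and PSUs are preserved.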

While testing this theory I found that Stata will give me the same
results whether I use the entire sample or my augmented subpopulation
data but the goodness of fit test using svylogitgof is very different.
I also found that svylogitgof is reporting the number of observations
in the total sample and not the subpopulation number of observations.
Does this have any implication in the actual test?

Below you can see the results from my test. First, is the output using
the entire dataset and second using my augmented subpopulation
dataset.

The output from svy logistic is identical, the only difference
being the reported population size, which is wrong for my augmented
dataset, as expected. However, all the results (ORs, SE, t,...) are
equal.

The output for the goodness of fit test is very different. As you can
see, the number of observations reported is the total number of
observations in the data even though I'm doing a subpopulation
analysis. We see that the number of groups used is different, and using
the entire dataset the test rejects the hypothesis that the model is a
good fit, but using my augmented dataset we do not reject the hypothesis
that the model is a good fit. But they are the same model, so how can
I get such different results?

I have read the paper on the test and I don't see where the number of
observations comes into play. Also, in the paper it was assumed that
the number of groups used was 10 (generating deciles of risk). In the
new svylogitgof update, this was changed so that the number of groups
can vary.

Can anyone help me? I don't know what to make of these results, and I
surely cannot use them, as I don't think the test applied to the entire
dataset is correct either.

Thank you,

Maria

Using ALL data:

. svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
Survey: Logistic regression

Number of strata   =        58                  Number of obs      =    8104197
Number of PSUs     =      1027                  Population size    =   39615465
                                                Subpop. no. of obs =       1971
                                                Subpop. size       =  9686.4649
                                                Design df          =        969
                                                F(   4,    966)    =      27.18
                                                Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
  1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
  1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
       1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.

. svylogitgof
  Number of observations =                         8104197
  F-adjusted test statistic = F(3,967) =       7865.271
                       Prob>  F =                             0.000


Using AUGMENTED subpopulation data:

. svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
Survey: Logistic regression

Number of strata   =        58                  Number of obs      =       2565
Number of PSUs     =      1027                  Population size    =  12682.585
                                                Subpop. no. of obs =       1971
                                                Subpop. size       =  9686.4649
                                                Design df          =        969
                                                F(   4,    966)    =      27.18
                                                Prob > F           =     0.0000

------------------------------------------------------------------------------
             |             Linearized
        died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
  1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
  1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
    1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
------------------------------------------------------------------------------
Note: 2 strata omitted because they contain no subpopulation members.

. svylogitgof
  Number of observations =                            2565
  F-adjusted test statistic = F(5,965) =          1.096
                       Prob>  F =                           0.361

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2018 StataCorp LLC