
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: svylogitgof after logistic using subpop option


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: svylogitgof after logistic using subpop option
Date   Mon, 7 Mar 2011 21:43:39 +0000

-svylogitgof- seems to expect an e(wexp), which -svy: logistic- never returns. When -svylogitgof- then calls -xtile- with pweights that are missing, the program stops.

(Just reading what it says in the code....) 

Nick 
[email protected] 
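
One way to check the diagnosis above is to look at what the estimation command actually leaves behind in e(). The following is a minimal sketch that reuses the auto-data example from later in this thread; it only lists the stored results and displays e(wtype) and e(wexp), and it makes no claim about how -svylogitgof- uses them internally.

```stata
* Minimal sketch: inspect the stored results after -svy: logistic-
sysuse auto, clear
svyset _n                                  // no weights are declared here
svy: logistic foreign turn if price<7000

ereturn list                               // everything the command stored in e()
display "e(wtype): [`e(wtype)']"           // empty brackets mean nothing was stored
display "e(wexp):  [`e(wexp)']"
```

If the brackets come back empty, any later command that builds a weight specification from those macros has nothing to work with.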

Michael I. Lichter

I have done some work on svylogitgof and will take a look at this 
problem when I get a chance. I'm not certain what I need to do to make 
it "subpopulation aware", but I'll give it a shot. It obviously 
shouldn't fail when confronted with an "if", in any event.


On 3/7/2011 3:55 PM, Steven Samuels wrote:

> I have found that -estat gof- works after -svy: logit-  with an -if- clause, whereas -svylogitgof- sometimes fails.
>
> *********************
> sysuse auto, clear
> svyset _n
> svy, subpop(if price<7000): logistic foreign turn
> svy: logistic foreign turn if price<7000
> svylogitgof  // throws error
> estat gof    // works with correct d.f.
> ***********************
>

> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them.  -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will first have to run -svy: logistic- with an -if- clause, not the subpop() option.
>
 On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:

> I'm using the NIS which follows a complex survey design to obtain the
> odds of dying for patients with acute kidney disease in a
> subpopulation. I'll be using 10 years of data which will make the
> dataset too big. Since I'm interested in a subpopulation I found out
> that in order to obtain correct standard errors, my dataset only needs
> to include the subpopulation plus one record for each PSU that would
> be dropped when creating the subpopulation dataset. This way, I can
> still use the svy, subpop(): logistic command because Stata can still
> compute the total number of hospitals sampled.
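
The augmented dataset described above could be built along the following lines. This is only a sketch under stated assumptions: hospid is a hypothetical name for the PSU (hospital) identifier, pah is assumed to be a 0/1 subpopulation indicator, and the helper variables has_sub and one_per_psu are made up for illustration. It does not verify that the resulting standard errors are correct; that should be checked against the full data, as done in this post.

```stata
* Assumed names: hospid = PSU identifier, pah = 0/1 subpopulation indicator

* Flag PSUs that contain at least one subpopulation member
bysort hospid: egen byte has_sub = max(pah == 1)

* Tag exactly one record within each PSU
egen byte one_per_psu = tag(hospid)

* Keep all subpopulation members, plus one placeholder record for
* every PSU that would otherwise disappear from the data
keep if pah == 1 | (has_sub == 0 & one_per_psu == 1)
```

The placeholder records have pah == 0, so they stay outside the subpopulation while keeping their PSUs and strata visible to -svy-.
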
>
> While testing this theory I found that Stata will give me the same
> results whether I use the entire sample or my augmented subpopulation
> data but the goodness of fit test using svylogitgof is very different.
> I also found that svylogitgof is reporting the number of observations
> in the total sample and not the subpopulation number of observations.
> Does this have any implication in the actual test?
>
> Below you can see the results from my test. First, is the output using
> the entire dataset and second using my augmented subpopulation
> dataset.
>
> The output from svy logistic is identical; the only difference is
> the population size reported, which is wrong for my augmented
> dataset, as expected. However, all the results (ORs, SE, t,...) are
> equal.
>
> The output for the goodness of fit test is very different. As you can
> see, the number of observations reported is the total number of
> observations in the data even though I'm doing a subpopulation
> analysis. We see that the number of groups used is different: using
> the entire dataset the test rejects the hypothesis that the model is a
> good fit, but using my augmented dataset we do not reject the hypothesis
> that the model is a good fit. But they are the same model, so how can
> I get such different results?
>
> I have read the paper on the test and I don't see where the number of
> observations comes into play. Also, in the paper it was assumed that
> the number of groups used was 10 (generating deciles of risk). In the
> new svylogitgof update, this was changed so that the number can vary.
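
To see how a "deciles of risk" grouping can end up with fewer than 10 groups, here is a small sketch on the auto data. It is not the -svylogitgof- code: heavy is a made-up binary covariate, the fit is an unweighted -logistic-, and -xtile- is called without weights. The point is only that when the fitted probabilities take few distinct values (a model with four binary covariates has at most 16), quantile cutpoints coincide and some groups are empty.

```stata
* Illustrative only: grouping fitted probabilities into "deciles"
sysuse auto, clear
generate byte heavy = weight > 3000        // hypothetical binary covariate
logistic foreign heavy

predict double phat                        // fitted probability of foreign
xtile grp = phat, nq(10)                   // ask for 10 quantile groups
tabulate grp                               // count how many groups actually appear
```
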
>
> Can anyone help me? I don't know what to make of these results and I
> surely cannot use them, as I don't think the test applied to the entire
> dataset is correct either.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58        Number of obs      =    8104197
> Number of PSUs     =      1027        Population size    =   39615465
>                                       Subpop. no. of obs =       1971
>                                       Subpop. size       =  9686.4649
>                                       Design df          =        969
>                                       F(   4,    966)    =      27.18
>                                       Prob > F           =     0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>        1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
>   Number of observations =                         8104197
>   F-adjusted test statistic = F(3,967) =       7865.271
>                        Prob>  F =                             0.000
>
>
> Using AUGMENTED subpopulation data:
>
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58        Number of obs      =       2565
> Number of PSUs     =      1027        Population size    =  12682.585
>                                       Subpop. no. of obs =       1971
>                                       Subpop. size       =  9686.4649
>                                       Design df          =        969
>                                       F(   4,    966)    =      27.18
>                                       Prob > F           =     0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>     1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
>   Number of observations =                            2565
> F-adjusted test statistic  = F(5,965) =          1.096
>                        Prob>  F =                           0.361
>