Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: svylogitgof after logistic using subpop option
From: "Maria E. Montez Rath" <[email protected]>
To: [email protected]
Subject: Re: st: svylogitgof after logistic using subpop option
Date: Sun, 6 Mar 2011 22:20:51 -0800

Thank you for your response. I've read the FAQ but missed that
important detail. Sorry.
Maria
On Sun, Mar 6, 2011 at 5:41 PM, Steven Samuels <[email protected]> wrote:
> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them. -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will have to first run -svy logistic- with an -if- clause, not the subpop() option.
>
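A minimal sketch of the approach Steve describes, assuming the data are already -svyset- and that pah is a 0/1 indicator for the subpopulation (the variable names are taken from the output quoted further down). The -if- restriction takes the place of subpop(). With -if-, standard errors can differ from the subpop() results when the restriction drops whole PSUs or strata, so treat this as an illustration of the syntax rather than a recommendation.

```stata
* Sketch only: assumes the data are already svyset and pah is coded 0/1.
* svylogitgof is user-written; -findit svylogitgof- locates it.
svy: logistic dead i.diabetes i.aki2 i.mec_vent i.fem if pah == 1
svylogitgof
```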
>
> Steve
> [email protected]
>
> On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:
>
> Hi!
>
> I'm using the NIS which follows a complex survey design to obtain the
> odds of dying for patients with acute kidney disease in a
> subpopulation. I'll be using 10 years of data which will make the
> dataset too big. Since I'm interested in a subpopulation I found out
> that in order to obtain correct standard errors, my dataset only needs
> to include the subpopulation plus one record for each PSU that would
> be dropped when creating the subpopulation dataset. This way, I can
> still use the svy, subpop(): logistic command because Stata can still
> compute the total number of hospitals sampled.
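One way such an augmented dataset might be built is sketched below. The name hospid (PSU identifier) is hypothetical; pah is the subpopulation indicator from the output further down, assumed here to be coded 0/1 with no missing values. The retained placeholder records keep their strata, PSU, and weight variables, so the -svyset- design information is unchanged.

```stata
* Sketch only; hospid is a hypothetical PSU identifier, pah is 0/1.
* Flag PSUs that contain at least one subpopulation member.
bysort hospid: egen byte anypah = max(pah)
* Mark one record per PSU (the first record within each PSU).
bysort hospid: generate byte firstrec = (_n == 1)
* Keep all subpopulation records, plus one placeholder record for
* every PSU that would otherwise vanish from the data.
keep if pah == 1 | (anypah == 0 & firstrec == 1)
drop anypah firstrec
```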
>
> While testing this theory I found that Stata will give me the same
> results whether I use the entire sample or my augmented subpopulation
> data but the goodness of fit test using svylogitgof is very different.
> I also found that svylogitgof is reporting the number of observations
> in the total sample and not the subpopulation number of observations.
> Does this have any implications for the actual test?
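The two counts can be compared directly from the stored results after estimation. The sketch below assumes the -svy, subpop(pah): logistic- model has just been fit. e(N) and e(N_sub) are results stored by -svy- estimation, and e(sample) should mark every observation used for variance estimation, not only subpopulation members. Whether -svylogitgof- forms its groups from e(sample) is an assumption here, not something confirmed in this thread.

```stata
* Run immediately after: svy, subpop(pah): logistic ...
display "obs in estimation sample = " e(N)
display "obs in subpopulation     = " e(N_sub)
count if e(sample)                // all observations used by svy
count if e(sample) & pah == 1     // subpopulation members only
```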
>
> Below you can see the results from my test. First is the output using
> the entire dataset, and second is the output using my augmented
> subpopulation dataset.
>
> The output from svy logistic is identical, the only difference
> being the population size reported, which is wrong for my augmented
> dataset, as expected. However, all the results (ORs, SE, t,...) are
> equal.
>
> The output for the goodness of fit test is very different. As you can
> see, the number of observations reported is the total number of
> observations in the data even though I'm doing a subpopulation
> analysis. We see that the number of groups used is different. Using
> the entire dataset, the test rejects the hypothesis that the model is a
> good fit, but using my augmented dataset we do not reject the hypothesis
> that the model is a good fit. But they are the same model, so how can
> I get such different results?
>
> I have read the paper on the test and I don't see where the number of
> observations comes into play. Also, in the paper it was assumed that
> the number of groups used was 10 (generating deciles of risk). In the
> new svylogitgof update, the number of groups is allowed to vary.
>
> Can anyone help me? I don't know what to make of these results, and I
> surely cannot use them, as I don't think the test applied to the entire
> dataset is correct either.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58                Number of obs      =    8104197
> Number of PSUs     =      1027                Population size    =   39615465
>                                               Subpop. no. of obs =       1971
>                                               Subpop. size       =  9686.4649
>                                               Design df          =        969
>                                               F(   4,    966)    =      27.18
>                                               Prob > F           =     0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         dead | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>        1.fem |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
> Number of observations = 8104197
> F-adjusted test statistic = F(3,967) = 7865.271
> Prob > F = 0.000
>
>
> Using AUGMENTED subpopulation data:
>
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58                Number of obs      =       2565
> Number of PSUs     =      1027                Population size    =  12682.585
>                                               Subpop. no. of obs =       1971
>                                               Subpop. size       =  9686.4649
>                                               Design df          =        969
>                                               F(   4,    966)    =      27.18
>                                               Prob > F           =     0.0000
>
> ------------------------------------------------------------------------------
>              |             Linearized
>         died | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.aki2 |   3.511044   .8979891     4.91   0.000     2.125493    5.799799
>   1.diabetes |   .4748568   .1459044    -2.42   0.016     .2598337    .8678202
>   1.mec_vent |   9.576589   2.515918     8.60   0.000     5.718832    16.03668
>     1.female |    1.88229   .5211665     2.28   0.023     1.093231    3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
> Number of observations = 2565
> F-adjusted test statistic = F(5,965) = 1.096
> Prob > F = 0.361
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/