RE: st: svylogitgof after logistic using subpop option
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: svylogitgof after logistic using subpop option
Date: Mon, 7 Mar 2011 21:43:39 +0000
-svylogitgof- seems to expect an e(wexp), which -svy: logistic- never emits. When -svylogitgof- then calls -xtile- with pweights that are missing, the program stops.
(Just reading what it says in the code....)
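
One way to check this directly is to list what the estimation command leaves behind in e(). The lines below are a minimal sketch that reuses the auto-data example quoted later in this thread; whether e(wtype) and e(wexp) come back empty may depend on how the data were -svyset- and on the Stata version, so treat the result as something to inspect rather than assume.

```stata
* Sketch: inspect the stored results that -svylogitgof- would read.
* Uses the toy design from the example quoted below (no weights are svyset).
sysuse auto, clear
svyset _n
svy: logistic foreign turn if price < 7000

* Show everything the estimation command stored in e()
ereturn list

* Print the weight type and weight expression; empty brackets mean "not stored"
display "e(wtype) = [`e(wtype)']"
display "e(wexp)  = [`e(wexp)']"
```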
Nick
[email protected]
Michael I. Lichter
I have done some work on svylogitgof and will take a look at this
problem when I get a chance. I'm not certain what I need to do to make
it "subpopulation aware", but I'll give it a shot. It obviously
shouldn't fail when confronted with an "if", in any event.
On 3/7/2011 3:55 PM, Steven Samuels wrote:
> I have found that -estat gof- works after -svy: logit- with an -if- clause, whereas -svylogitgof- sometimes fails.
>
> *********************
> sysuse auto, clear
> svyset _n
> svy, subpop(if price<7000): logistic foreign turn
> svy: logistic foreign turn if price<7000
> svylogitgof // throws error
> estat gof // works with correct d.f.
> ***********************
>
> -svylogitgof- is not an official command. The Statalist FAQ requests that you identify non-official commands as such and say where you got them. -svylogitgof- is not "subpopulation-aware". To use it for a subpopulation, you will have to first run -svy: logistic- with an -if- clause, not the subpop() option.
>
On Mar 6, 2011, at 7:21 PM, Maria E. Montez Rath wrote:
> I'm using the NIS which follows a complex survey design to obtain the
> odds of dying for patients with acute kidney disease in a
> subpopulation. I'll be using 10 years of data which will make the
> dataset too big. Since I'm interested in a subpopulation I found out
> that in order to obtain correct standard errors, my dataset only needs
> to include the subpopulation plus one record for each PSU that would
> be dropped when creating the subpopulation dataset. This way, I can
> still use the svy, subpop(): logistic command because Stata can still
> compute the total number of hospitals sampled.
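
The augmented dataset described above could be built along the following lines. This is only a sketch: pah is the subpopulation indicator used later in this post, but hospid (PSU), nis_stratum (stratum), and discwt (sampling weight) are assumed variable names that do not appear in the post, so substitute whatever the design variables are actually called.

```stata
* Sketch under assumed names: hospid = PSU, nis_stratum = stratum,
* discwt = sampling weight (none of these names come from the post).
* pah is the 0/1 subpopulation indicator used in the post.

* Flag PSUs that contain at least one subpopulation member
egen byte anypah = max(pah == 1), by(hospid)

* Mark one record per PSU to serve as a placeholder
bysort hospid: generate byte firstrec = (_n == 1)

* Keep the subpopulation plus one placeholder record for every PSU
* that would otherwise disappear from the data
keep if pah == 1 | (anypah == 0 & firstrec == 1)

* Declare the design and fit the model from the post
svyset hospid [pweight = discwt], strata(nis_stratum)
svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
```

Because -keep- is destructive, it is safer to run this on a copy of the data or inside -preserve-/-restore-.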
>
> While testing this theory I found that Stata will give me the same
> results whether I use the entire sample or my augmented subpopulation
> data but the goodness of fit test using svylogitgof is very different.
> I also found that svylogitgof is reporting the number of observations
> in the total sample and not the subpopulation number of observations.
> Does this have any implication in the actual test?
>
> Below you can see the results from my test. First, is the output using
> the entire dataset and second using my augmented subpopulation
> dataset.
>
> The output from svy logistic is identical; the only difference is the
> reported population size, which is wrong for my augmented dataset, as
> expected. However, all the results (ORs, SEs, t, ...) are equal.
>
> The output for the goodness of fit test is very different. As you can
> see, the number of observations reported is the total number of
> observations in the data, even though I'm doing a subpopulation
> analysis. We see that the number of groups used is different: using
> the entire dataset, the test rejects the hypothesis that the model is
> a good fit, but using my augmented dataset we do not reject the
> hypothesis that the model is a good fit. But they are the same model,
> so how can I get such different results?
>
> I have read the paper on the test and I don't see where the number of
> observations comes into play. Also, the paper assumed that the number
> of groups used was 10 (generating deciles of risk). In the new
> svylogitgof update, this was changed so that it can vary.
>
> Can anyone help me? I don't know what to make of these results, and I
> surely cannot use them, as I don't think the test applied to the entire
> dataset is correct either.
>
> Thank you,
>
> Maria
>
> Using ALL data:
>
> . svy, subpop(pah): logistic dead i.diabetes i.aki2 i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58          Number of obs      =   8104197
> Number of PSUs     =      1027          Population size    =  39615465
>                                         Subpop. no. of obs =      1971
>                                         Subpop. size       = 9686.4649
>                                         Design df          =       969
>                                         F(   4,    966)    =     27.18
>                                         Prob > F           =    0.0000
>
> -------------------------------------------------------------------------------
> | Linearized
> dead | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> 1.aki2 | 3.511044 .8979891 4.91 0.000 2.125493 5.799799
> 1.diabetes | .4748568 .1459044 -2.42 0.016 .2598337 .8678202
> 1.mec_vent | 9.576589 2.515918 8.60 0.000 5.718832 16.03668
> 1.fem | 1.88229 .5211665 2.28 0.023 1.093231 3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
> Number of observations = 8104197
> F-adjusted test statistic = F(3,967) = 7865.271
> Prob> F = 0.000
>
>
> Using AUGMENTED subpopulation data:
>
> . svy, subpop(pah): logistic died i.aki2 i.diabetes i.mec_vent i.fem
> Survey: Logistic regression
>
> Number of strata   =        58          Number of obs      =      2565
> Number of PSUs     =      1027          Population size    = 12682.585
>                                         Subpop. no. of obs =      1971
>                                         Subpop. size       = 9686.4649
>                                         Design df          =       969
>                                         F(   4,    966)    =     27.18
>                                         Prob > F           =    0.0000
>
> ------------------------------------------------------------------------------
> | Linearized
> died | Odds Ratio Std. Err. t P>|t| [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> 1.aki2 | 3.511044 .8979891 4.91 0.000 2.125493 5.799799
> 1.diabetes | .4748568 .1459044 -2.42 0.016 .2598337 .8678202
> 1.mec_vent | 9.576589 2.515918 8.60 0.000 5.718832 16.03668
> 1.female | 1.88229 .5211665 2.28 0.023 1.093231 3.240866
> ------------------------------------------------------------------------------
> Note: 2 strata omitted because they contain no subpopulation members.
>
> . svylogitgof
> Number of observations = 2565
> F-adjusted test statistic = F(5,965) = 1.096
> Prob> F = 0.361
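
The degrees of freedom in the two svylogitgof outputs can be used to back out how many groups each run formed. The arithmetic below assumes that -svylogitgof- reports the Archer-Lemeshow F-adjusted mean residual test, whose reference distribution with g groups and design degrees of freedom f is F(g-1, f-g+2); the post does not name the paper, so that identification is an assumption.

```latex
% Assumption: the statistic is referred to F(g-1, f-g+2),
% with g = number of groups and f = design degrees of freedom.
\[
  f = n_{\mathrm{PSU}} - n_{\mathrm{strata}} = 1027 - 58 = 969
\]
% Entire dataset: F(3, 967)
\[
  g - 1 = 3 \;\Rightarrow\; g = 4, \qquad f - g + 2 = 969 - 4 + 2 = 967
\]
% Augmented subpopulation dataset: F(5, 965)
\[
  g - 1 = 5 \;\Rightarrow\; g = 6, \qquad f - g + 2 = 969 - 6 + 2 = 965
\]
```

Under that assumption the full-data run grouped the observations into 4 groups and the augmented-data run into 6, which matches the remark above that the number of groups differs between the two runs.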
>