
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)


From   Ángel Rodríguez Laso <[email protected]>
To   [email protected]
Subject   Re: st: estat gof (Hosmer & Lemeshow) after svy:logistic (survey)
Date   Mon, 22 Jul 2013 11:21:06 +0200

Dear Steve & Tim,

Tim is right. The Stata 12 manual states that estat gof is not
appropriate after svy. I had been told that it was appropriate in this
version of Stata.

In this case I did what the archived thread Steve recommended says: I
used if instead of subpop. The problem is that the estat gof results
are very different with and without svy, even though the two models are
not that different.
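
Part of the difference may come from the fact that the two outputs are different tests. After logistic, estat gof without group() reports the Pearson chi-squared over covariate patterns, and the Hosmer-Lemeshow test is only produced with group(#). After svy: logistic, estat gof reports an F-adjusted mean residual test, based on 10 groups by default. A minimal sketch of a like-for-like check follows; depvar, x1, x2 and insample are hypothetical placeholders, not names from my data:

```stata
* Sketch only: depvar, x1, x2 and insample are hypothetical placeholders

* unweighted model
logistic depvar x1 x2 if insample == 1

* Pearson chi-squared over covariate patterns (what estat gof gives by default)
estat gof

* Hosmer-Lemeshow test on 10 quantile groups of the fitted probabilities
estat gof, group(10)

* survey model on the same observations
svy: logistic depvar x1 x2 if insample == 1

* F-adjusted mean residual test (10 groups by default)
estat gof
```

Even with group(10) the unweighted and survey tests are not the same statistic, so some disagreement between them would still be expected.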

Maybe the reason is that using estat gof after svy is not correct.
Would the following procedure, from Korn & Graubard, Analysis of Health
Surveys (John Wiley & Sons, New York, 1999), p. 106, be a correct
alternative for checking goodness of fit after svy?

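* fit the survey-weighted model (variable list omitted here)
svy: logistic

* predicted probabilities
predict p

* weighted deciles of the predicted probability (w is the sampling weight)
xtile decile = p [pweight=w], nq(10)

* sum of weights within each decile
bysort decile: egen sumw=total(w)

* weighted mean predicted probability per decile
gen pw=p*w
bysort decile: egen sumpw=total(pw)
gen meanpw=sumpw/sumw

* weighted mean observed outcome per decile (vardep is the dependent variable)
gen ow=vardep*w
bysort decile: egen sumow=total(ow)
gen meanow=sumow/sumw

* predicted minus observed, listed once per decile
gen difmean=meanpw-meanow
bysort decile: gen percentil=_n
list meanpw meanow difmean if percentil==1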
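
The same decile comparison can be sketched more compactly with collapse. This is only a sketch under assumptions: depvar, x1 and x2 are hypothetical variable names, w is assumed to be the sampling weight declared in svyset, and the restriction to e(sample) is an addition here so that observations outside the estimation sample do not enter the deciles.

```stata
* Sketch only: depvar, x1, x2 are hypothetical names; w is the sampling weight
svy: logistic depvar x1 x2

* predicted probabilities for the estimation sample only (assumed restriction)
predict double p if e(sample)

* weighted deciles of the predicted probability
xtile decile = p if e(sample) [pweight = w], nquantiles(10)

* weighted mean predicted and observed outcome per decile
preserve
keep if e(sample)
collapse (mean) meanp = p meano = depvar [pweight = w], by(decile)
generate difmean = meanp - meano
list decile meanp meano difmean, noobs
restore
```

Large values of difmean in some deciles would point to poor calibration there. The comparison is descriptive and does not by itself give a formal test.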



Thank you very much.

Angel Rodriguez-Laso

2013/7/18 Steve Samuels <[email protected]>:
> See: http://www.stata.com/statalist/archive/2011-03/msg00550.html
>
> Steve
> [email protected]
>
> On Jul 17, 2013, at 5:23 AM, Ángel Rodríguez Laso wrote:
>
> Dear Statalisters,
>
> Working with Stata 12.1.
>
>
> If I carry out the following logistic regression in a survey setting
> and then type estat gof I get:
>
>
> . svy, subpop(if disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 &
> proxy==2 & edad_c>=60): logistic discAVD edad_c i.sexo i. estud4
> i.difinmes3
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata   =        41                  Number of obs      =      1727
> Number of PSUs     =       234                  Population size    = 1347,0862
>                                                Subpop. no. of obs =       710
>                                                Subpop. size       =    563,75
>                                                Design df          =       193
>                                                F(   7,    187)    =      8,32
>                                                Prob > F           =    0,0000
>
> ------------------------------------------------------------------------------
>             |             Linearized
>     discAVD | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>      edad_c |       1,10       0,02     4,42   0,000         1,05        1,15
>             |
>        sexo |
>          1  |       1,00  (base)
>          2  |       2,60       0,82     3,02   0,003         1,39        4,84
>             |
>      estud4 |
>          0  |       1,00  (base)
>          1  |       0,87       0,32    -0,38   0,704         0,43        1,78
>          2  |       0,90       0,40    -0,24   0,807         0,37        2,16
>          3  |       0,60       0,27    -1,14   0,257         0,24        1,47
>             |
>   difinmes3 |
>          0  |       1,00  (base)
>          1  |       1,59       0,57     1,31   0,190         0,79        3,21
>          2  |       3,33       1,20     3,35   0,001         1,64        6,77
>             |
>       _cons |       0,00       0,00    -5,88   0,000         0,00        0,00
> ------------------------------------------------------------------------------
>
> .
> end of do-file
>
> . estat gof
> estat gof is not allowed after subpopulation estimations
> r(198);
>
>
>
> Then I replace the subpop() specification with if conditions:
>
>
> . svy: logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if
> disdesjub==1 & disdestr==1 & trab==1 & dismy50==1 & proxy==2 &
> edad_c>=60
> (running logistic on estimation sample)
>
> Survey: Logistic regression
>
> Number of strata   =        41                  Number of obs      =       710
> Number of PSUs     =       193                  Population size    =    563,75
>                                                Design df          =       152
>                                                F(   7,    146)    =      8,35
>                                                Prob > F           =    0,0000
>
> ------------------------------------------------------------------------------
>             |             Linearized
>     discAVD | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>      edad_c |       1,10       0,02     4,41   0,000         1,05        1,15
>             |
>        sexo |
>          1  |       1,00  (base)
>          2  |       2,60       0,82     3,02   0,003         1,39        4,85
>             |
>      estud4 |
>          0  |       1,00  (base)
>          1  |       0,87       0,32    -0,38   0,707         0,42        1,79
>          2  |       0,90       0,40    -0,25   0,807         0,37        2,16
>          3  |       0,60       0,27    -1,15   0,254         0,24        1,46
>             |
>   difinmes3 |
>          0  |       1,00  (base)
>          1  |       1,59       0,56     1,32   0,189         0,79        3,21
>          2  |       3,33       1,18     3,39   0,001         1,65        6,72
>             |
>       _cons |       0,00       0,00    -5,88   0,000         0,00        0,00
> ------------------------------------------------------------------------------
>
> . estat gof
>
> Logistic model for discAVD, goodness-of-fit test
>
>                     F(9,144) =       110,29
>                     Prob > F =         0,0000
>
>
>
> But if I drop the survey specification, I get:
>
> . logistic discAVD edad_c i.sexo i.estud4 i.difinmes3 if disdesjub==1
> & disdestr==1 & trab==1 & dismy50==1 & proxy==2 & edad_c>=60
>
> Logistic regression                               Number of obs   =        710
>                                                  LR chi2(7)      =      65,87
>                                                  Prob > chi2     =     0,0000
> Log likelihood = -210,78135                       Pseudo R2       =     0,1351
>
> ------------------------------------------------------------------------------
>     discAVD | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>      edad_c |       1,10       0,02     5,28   0,000         1,06        1,14
>             |
>        sexo |
>          1  |       1,00  (base)
>          2  |       1,96       0,56     2,36   0,018         1,12        3,44
>             |
>      estud4 |
>          0  |       1,00  (base)
>          1  |       0,87       0,29    -0,42   0,676         0,45        1,69
>          2  |       0,88       0,40    -0,28   0,781         0,36        2,14
>          3  |       0,52       0,25    -1,37   0,170         0,21        1,32
>             |
>   difinmes3 |
>          0  |       1,00  (base)
>          1  |       1,89       0,61     1,97   0,049         1,00        3,57
>          2  |       3,84       1,39     3,70   0,000         1,88        7,83
>             |
>       _cons |       0,00       0,00    -7,01   0,000         0,00        0,00
> ------------------------------------------------------------------------------
>
> . estat gof
>
> Logistic model for discAVD, goodness-of-fit test
>
>       number of observations =       710
> number of covariate patterns =       350
>            Pearson chi2(342) =       328,89
>                  Prob > chi2 =         0,6852
>
>
> The last two models don't look terribly different, so what is the
> reason for such a large change in the Hosmer & Lemeshow result? Which
> one should I trust?
>
> Thank you for your time and attention.
>
> Angel Rodriguez-Laso
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

