st: RE: svylogitgof: changes dramatically across models using the same pooled sample

From	Eileen Diaz McConnell <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: RE: svylogitgof: changes dramatically across models using the same pooled sample
Date	Thu, 14 Apr 2011 09:53:05 -0700

Thanks again, Rich and Steve:

Now that I have updated from Stata 11 to Stata 11.2, estat gof works fine after svy: logistic when I use an "if" qualifier instead of subpop().

Incidentally, the results from svylogitgof and estat gof were very similar for one of the reference groups. For the model whose fit statistic I was having a hard time understanding, the problem was clearly svylogitgof.
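
For readers following the thread, here is a minimal sketch of the workaround described above. It assumes the nhanes2d example data available through webuse rather than the poster's data, so highbp, height, weight, age and female are stand-ins for the variables in the original models.

```stata
* Assumption: nhanes2d stands in for the poster's data; it ships already svyset.
webuse nhanes2d, clear

* Restrict the estimation sample with an -if- qualifier instead of subpop().
svy: logistic highbp height weight age if female == 1

* Design-based goodness-of-fit test, run on the model just fitted.
estat gof
```

With the poster's data, the same pattern would use the original variable list and an -if- condition on finalp7.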

Thanks again!
Eileen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eileen Diaz McConnell
Sent: Tuesday, April 12, 2011 8:14 PM
To: [email protected]
Subject: RE: st: svylogitgof: changes dramatically across models using the same pooled sample

I'll look into estat gof as an option.  Appreciate this advice!
-Eileen

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Richard Williams
Sent: Tuesday, April 12, 2011 8:35 PM
To: [email protected]; [email protected]
Subject: Re: st: svylogitgof: changes dramatically across models using the same pooled sample

At 07:53 PM 4/12/2011, Steven Samuels wrote:
>Eileen, I assume that you're on Stata 11, since you don't say 
>otherwise.  -svylogitgof- doesn't work with subpop() options and has 
>other problems in Stata 11.
>
>See: http://www.stata.com/statalist/archive/2011-03/msg00442.html
>
>Steve
>[email protected]

Last I remember seeing, getting -estat gof- to work with the subpop option was on the Stata development list:

http://www.stata.com/statalist/archive/2011-03/msg00555.html

I wonder how terrible it would be if you used an -if- qualifier instead of the -subpop()- option? I know that is frowned upon, but if the alternative is to do nothing at all, which is better?
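
One way to gauge how much the choice matters is to fit the model both ways and compare. The sketch below assumes the nhanes2d example data, with female standing in for the subpopulation indicator. The point estimates should agree. The standard errors can differ when -if- removes whole PSUs or strata from the design, which may not happen in this example data.

```stata
* Assumption: nhanes2d and its variables are stand-ins, not the poster's data.
webuse nhanes2d, clear

* subpop(): every observation stays in the design for variance estimation.
svy, subpop(female): logistic highbp height weight age
estimates store m_subpop

* -if-: observations outside the group are dropped before estimation.
svy: logistic highbp height weight age if female == 1
estimates store m_if

* Coefficients (log odds) and standard errors side by side.
estimates table m_subpop m_if, b(%9.4f) se(%9.4f)
```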

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Eileen Diaz McConnell
Sent: Tuesday, April 12, 2011 10:11 AM
To: [email protected]
Subject: st: svylogitgof: changes dramatically across models using the same pooled sample

Hi Statalist Users:

Hoping that you can offer some advice about the following issue.

I am fitting a logistic regression with svy: logistic in Stata/SE 11.  I run the identical model several times, changing only the reference group.

Here are two versions of the same model that differ only in the reference group. The first omits race_whnb:

svy, subpop(finalp7): logistic hindpvt3 race_bknb ltcitizen ltauimm ltunauthmm x1 x2 x3
svylogitgof

The second omits ltunauthmm:

svy, subpop(finalp7): logistic hindpvt3 race_bknb race_whnb ltcitizen ltauimm x1 x2 x3
svylogitgof

As expected, the odds ratios and standard errors for the group indicators (race_bknb, etc.) change across the two models, while those for the remaining independent variables (x1 x2 x3) are the same in both.

What seems strange to me, however, is that the F-adjusted test statistic (svylogitgof) is radically different for these two models.

With the first model:

F  adjusted test statistic= 1.0887
		  Pvalue=.386323

With the second model:

F  adjusted test statistic= 17.4193
		  Pvalue=4.629e-13

As I understand the F-adjusted test statistic, these results suggest that the first model fits the data well and the second does not.

However, I'm concerned that the F statistic changes so dramatically when the pooled sample is the same and only the reference group differs.
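
The concern can be checked directly: if the group indicators are mutually exclusive and exhaustive, switching the omitted group only reparameterizes the model, so the fitted probabilities should not change. The sketch below assumes the nhanes2d example data, with the three-level race variable standing in for the poster's group indicators and factor-variable notation used to switch the base category.

```stata
* Assumption: nhanes2d and -race- are stand-ins for the poster's data and groups.
webuse nhanes2d, clear

* Base (reference) category 1.
svy: logistic highbp ib1.race age
predict double p_base1, pr

* Same model with base category 3.
svy: logistic highbp ib3.race age
predict double p_base3, pr

* Gap between the two sets of fitted probabilities; expected to be
* zero up to the optimizer's numerical tolerance.
generate double gap = abs(p_base1 - p_base3)
summarize gap
```

Identical fitted probabilities imply that a fit statistic computed from them should also be identical, so a large swing points to the software rather than the model.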

I wonder if this is somehow due to the sample size and different distributions of these groups on the dependent variable (hindpvt3).

The sample size is fairly small: the pooled sample is 1361, with n=350 for race_whnb and n=247 for ltunauthmm.
Descriptives of hindpvt3 for the 2 groups are:   race_whnb=33/350 have a value of 1 on hindpvt3;  ltunauthmm=117/247 have a value of 1 on hindpvt3.
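
Expressed as percentages, the counts quoted above come to about 9% for race_whnb and about 47% for ltunauthmm. A short check of that arithmetic, using only the figures in this message:

```stata
* Share with hindpvt3 == 1 in each group, from the counts quoted above.
display "race_whnb:  " %4.1f 100*33/350 "%"
display "ltunauthmm: " %4.1f 100*117/247 "%"
```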

I also just noticed that the number of missing values generated differs between the two models, but I am not sure whether that is normal.

For the first model:  
svylogitgof
(57 missing values generated)

And for the second model:
svylogitgof
(118 missing values generated).

Given this information, do the F statistic and my interpretation sound valid in this case?  Or could the wide swing in the F statistic be due to very small sample sizes or something about the missing values generated?

Any suggestions about what to investigate further would be very much appreciated. Thanks for your consideration.

Eileen Diaz McConnell, Ph.D.
School of Transborder Studies
Arizona State University
Tempe, AZ 85287-3502

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
