Statalist The Stata Listserver


Re: st: How can I access goodness-of-fit p-value in a program? (re-send with subject line)

From   Maarten Buis <>
Subject   Re: st: How can I access goodness-of-fit p-value in a program? (re-send with subject line)
Date   Sun, 29 Jan 2006 20:29:09 +0000 (GMT)

> I have written a program which generates a table of odds ratios from many
> bivariate logistic regression models.  I would like to add the p values
> associated with the Hosmer-Lemeshow chi2 statistic (as generated by
> . estat gof, group(10)
> ) to the table.  -estat- returns r(chi2) but not the p-value.  The Stata
> function chi2() returns a very different p-value than that returned by
> -estat gof-.  Any ideas?

You should have used -chi2tail()- instead of -chi2()-: chi2(df,x) returns the lower-tail
cumulative probability, while chi2tail(df,x) = 1 - chi2(df,x) is the upper-tail p-value you
are after.

*----------begin example-------
sysuse auto, clear
* collapse category 1 into 2 so the table is 4 x 2 (df = 3)
recode rep78 1=2
tab rep78 foreign, chi2
return list
di chi2(3,r(chi2))      // lower-tail probability: not the p-value
di 1-chi2(3,r(chi2))    // upper tail, computed by hand
di chi2tail(3,r(chi2))  // upper tail, computed directly
*-----------end example------------
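The same function answers your original question directly. A sketch (I believe -estat gof-
also leaves the degrees of freedom behind in r(df); check -return list- after running it to
be sure):

*----------begin example-------
sysuse auto, clear
quietly logistic foreign price mpg
estat gof, group(10)
* r(chi2) holds the statistic and r(df) the degrees of freedom,
* so the p-value your program needs is just:
di chi2tail(r(df), r(chi2))
*-----------end example------------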
> The bivariate logistic models consist of one continuous explanatory variable
> of interest which is included in every model (var1, below) together with a
> different (mostly dichotomous) variable (var2) for each model, i.e.:
> . logistic death var1 var2
> The point is to assess the change in the odds ratio associated with var1
> upon inclusion of var2.

It looks like you are doing a sensitivity analysis or an extreme bounds analysis in the spirit of
Sala-i-Martin (1997); is that right?
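As an aside, such a series of bivariate models is easy to loop over. A sketch with purely
hypothetical variable names (death, var1, and cand1-cand3 standing in for your various var2's):

*----------begin example-------
* fit one model per candidate variable and report var1's odds ratio
foreach v of varlist cand1-cand3 {
    quietly logistic death var1 `v'
    * _b[var1] is the log-odds coefficient, so exponentiate it
    di "`v': odds ratio for var1 = " exp(_b[var1])
}
*-----------end example------------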

> Depending on var2, -estat gof, group(10)- will generate fewer than 10
> quantiles due to ties.
> The question:  In reporting multiple p-values for multiple gof tests, should
> I accept whatever groupings Stata gives me and report the statistics with
> different df's for each one?  Or should I force the same number of groups for
> each model?

I don't see why you would use the Hosmer-Lemeshow chi2 statistic if it is inconvenient. You are
after all interested in changes in the parameter of interest, not in the goodness of fit. More
specifically, Agresti (2002, p. 177) warns that the Hosmer-Lemeshow statistic does not have good
power for detecting particular types of lack of fit, i.e. you may accept models as fitting well
(or, more precisely, fail to reject the hypothesis that the model fits) when they should not be
accepted. If you want to report a goodness-of-fit statistic I would report the BIC, since it is
comparable across non-nested models. See for instance Raftery (1995) (and the discussants in the
articles that follow it) for a discussion. As an added bonus, Adrian Raftery promotes the use of
BICs for approximate Bayesian Model Averaging, which is not unlike the later take of
Sala-i-Martin et al. (2003) on this model uncertainty problem.
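Getting the BICs in Stata is easy. A sketch (the two model specifications below are just
placeholders for your own):

*----------begin example-------
sysuse auto, clear
quietly logistic foreign price mpg
estimates store m1
quietly logistic foreign price weight
estimates store m2
* -estimates stats- reports the AIC and BIC of both models,
* which can be compared even though the models are non-nested
estimates stats m1 m2
*-----------end example------------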

Agresti, Alan (2002). Categorical Data Analysis, 2nd edition. New York: Wiley.

Raftery, Adrian E. (1995). "Bayesian model selection in social research (with discussion)."
Sociological Methodology, 25, 111-196.

Sala-i-Martin, Xavier (1997). "I Just Ran Two Million Regressions." American Economic Review,
87(2), 178-183.

Sala-i-Martin, X., G. Doppelhofer, and R. Miller (2003). "Determinants of Long-Term Growth: A
Bayesian Averaging of Classical Estimates (BACE) Approach." American Economic Review.


Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

