Statalist The Stata Listserver


st: RE: Hosmer-Lemeshow sticky issues


From   "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Hosmer-Lemeshow sticky issues
Date   Thu, 7 Sep 2006 12:24:44 -0500

Ashwin - Logistic regression assumes a particular functional form for
the probability of "success", say  P(S), as a function of the
explanatory variables. Like any statistical model, there is no guarantee
that the logistic regression model exactly applies to real-world data.
At best it may be an approximate representation. With large sample
sizes, a goodness-of-fit test has enough power to detect even a small
discrepancy between the model and the data, which results in small
p-values. However, you can investigate other models (such as probit) that use
different functional forms for P(S). You can even write your own link
function to try alternative custom-made models.
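
To make the sample-size point concrete, here is a rough sketch in Python (not Stata, and not anything run on the poster's data). It assumes ten equal-sized groups, as in the usual Hosmer-Lemeshow grouping, and a hypothetical model whose predicted probability is off by half a percentage point in every group. It computes only the systematic part of the Pearson statistic, ignoring sampling noise, and compares it with 15.507, the 5% critical value of a chi-square with 8 degrees of freedom. The function name, the group probabilities, and the size of the discrepancy are all assumptions chosen for illustration.

```python
# Hypothetical illustration: none of these numbers come from the poster's data.
CRITICAL_CHI2_DF8 = 15.507  # 5% critical value of chi-square with 8 df


def systematic_pearson(n, groups):
    """Systematic part of the Pearson statistic, ignoring sampling noise.

    groups is a list of (share, p_model, p_true) tuples; share is the
    fraction of the n observations that fall in that group.
    """
    total = 0.0
    for share, p_model, p_true in groups:
        n_g = n * share
        # (observed - expected)^2 / variance, with observed set to n_g * p_true
        total += n_g * (p_true - p_model) ** 2 / (p_model * (1.0 - p_model))
    return total


# Ten equal-sized groups; the model is off by 0.005 in each (an assumption).
groups = [(0.1, 0.10 + 0.05 * i, 0.10 + 0.05 * i + 0.005) for i in range(10)]

for n in (500, 5000, 500000):
    stat = systematic_pearson(n, groups)
    verdict = "exceeds" if stat > CRITICAL_CHI2_DF8 else "is below"
    print(f"n = {n:>6}: systematic chi-square = {stat:8.3f} "
          f"({verdict} {CRITICAL_CHI2_DF8})")
```

Under these assumptions the same half-point discrepancy contributes well under 1 to the statistic at n = 500 but pushes it past the critical value at n = 500000, because the systematic part grows in proportion to n.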

Al Feiveson 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ashwin
Ananthakrishnan
Sent: Thursday, September 07, 2006 7:29 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Hosmer-Lemeshow sticky issues

Hi,

I'm trying to confirm goodness of fit for a logistic regression model
I'm working on, but I keep ending up with very small p-values implying
poor fit.

My outcome is a dichotomous variable - screened vs. not screened. My
predictor variables are age category (in 5-year intervals), income
tertiles, race, and gender. My final model, constructed through
stepwise backward elimination, includes all the variables and some
interaction terms. However, when I try to run the goodness of fit test
(Pearson or Hosmer-Lemeshow), I keep getting extremely small p-values.

Can someone explain to me what this means? Does this mean that the model
is not valid, and the odds ratios are incorrect?

Can you get poor fit simply as a consequence of large sample size (my
sample size is 500,000)?

I don't understand why the model doesn't fit when it has been
constructed from the data by stepwise backward elimination, and all the
variables are univariately significant.


Thanks.
Ashwin


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/