Statalist The Stata Listserver



st: RE: question on goodness of fit tests


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: question on goodness of fit tests
Date   Thu, 13 Jul 2006 17:27:54 +0100

You need to take us both upstream and downstream of this. 

What is a beta geometric distribution, how are you fitting it, 
are there covariates, etc.? 

(Beta I know of, geometric I know of.) 

Assessing model fit has to be more subtle than you are implying here. 

Some pathologies that I have noticed: 

0. Analysts are often reluctant to use the best model assessment 
tool of all, graphics. (Not true here, naturally.) Graphics 
are looked down on because their assessment requires too much 
subjectivity (too much judgement, too much experience!). On 
the other hand, who can argue with an R-square or a P-value? 

1. Analysts often get quantitative evidence of goodness of fit, 
but then want to degrade that into "OK" or "not OK". 
Sometimes the tacit rationale is that "OK" means "it's publishable", 
"it's acceptable to my boss", or some such. These are important for personal 
satisfaction, but only tangentially connected to scientific progress. 

2. Analysts often want to narrow things down to a single factotum, 
omnibus or portmanteau statistic, unsurprisingly often whatever shows the model
in the best light. This is usually non-statistical, as reduction
from a set of many numbers to one typically throws away most of the 
information in that set. 

3. A goodness-of-fit statistic can look quite good -- but by 
comparison with a null hypothesis that no one takes seriously. 

4. A goodness-of-fit statistic answers the question it was 
designed for, while the answers to other nearby questions may be quite
different. 

sysuse auto, clear 

foreach v of var price-foreign { 
	swilk `v' 
}

indicates that the most nearly "normal" (meaning "Gaussian") of the auto 
variables is -rep78-. But a glance at the data shows
that this variable -- which is a set of ordinal codes 1,2,3,4,5 --
is in most other respects _not_ well characterised 
as a normal with mean 3.406 and sd .990. 
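
A quick way to look at this directly is sketched below. These commands 
are not part of the original post; they are one plausible way to glance 
at the data, assuming the same auto dataset. 

```stata
sysuse auto, clear

* rep78 takes only five distinct integer values
tabulate rep78

* quantile-normal plot: a discrete variable shows up as horizontal bands
qnorm rep78

* bars at each code, with a normal density overlaid for comparison
histogram rep78, discrete normal
```

The test statistic summarises one aspect of shape; the table and the 
plots show the discreteness that it does not register. 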

5. The most appropriate answer to "does a model fit?" may
be "yes, in so far as it does; no, in so far as it doesn't", 
banal though that may be. It can be more fruitful to focus on 
which model fits better than which other model(s). 

That's pontificating way beyond your case. Returning 
to that, 

a. Perhaps there is some other distribution you can 
compare it with. 

b. Kolmogorov-Smirnov is usually reckoned dubious when
the parameters are estimated from the data, which is
almost always. 
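
As an illustration of that situation (not from the original post), 
here is a minimal sketch of a one-sample test against a normal 
distribution whose mean and sd are taken from the same data. The 
choice of -price- and of the normal is purely an assumption for the 
example. 

```stata
sysuse auto, clear

* estimate the parameters from the data ...
summarize price

* ... then test against the distribution fitted to those same data
ksmirnov price = normal((price - r(mean)) / r(sd))
```

The P-value reported is calculated as if the reference distribution 
had been specified in advance, so when the parameters are fitted to 
the data it tends to be too large. 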

c. I don't know what the standard chi-square test means
here. Is this a continuous variable, in which case
there is no "standard chi-square test" at all, just 
a very large set of them depending precisely on how 
you bin the data? 
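
The dependence on binning can be sketched as follows. This is a 
hypothetical example, not the poster's analysis: it assumes -price- 
from the auto data, a fitted normal as the reference distribution, and 
bins of equal expected probability. The local macro names are made up 
for the illustration. 

```stata
sysuse auto, clear
quietly summarize price
local m = r(mean)
local s = r(sd)
local n = r(N)

* fitted normal cumulative probability for each observation
generate double p = normal((price - `m') / `s')

foreach k in 4 6 8 10 {
    * k bins, each with expected count n/k under the fitted normal
    generate bin = min(floor(p * `k') + 1, `k')
    local chi2 = 0
    forvalues j = 1/`k' {
        quietly count if bin == `j'
        local chi2 = `chi2' + (r(N) - `n'/`k')^2 / (`n'/`k')
    }
    display "bins = `k'   chi-square = " %8.2f `chi2'
    drop bin
}
```

Each choice of k gives a different statistic (and different degrees 
of freedom), which is the sense in which there is no single 
"standard" test for a continuous variable. 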
 
Nick 
n.j.cox@durham.ac.uk 

Narasimhan Sowmyanarayanan
 
> I am facing a small confusion that I have not been able to resolve. I
> am trying to fit a beta geometric distribution to my dataset. I tried
> the standard chi square goodness of fit test and this is rejected in
> my data. (chi square = 122 and df = 34).
> 
> However an examination of the correlation between the two series shows
> a correlation of .9995 and I actually tried superimposing a plot of
> the observed and the expected values. The values are extremely close.
> 
> I tried something that may not be entirely right in this context but
> just out of curiosity. I used the Kolmogorov-Smirnov test by treating
> the observed and the expected values as different groups and the test
> indicated that the distributions were not different between the
> observed and the expected values.
> 
> Two-sample Kolmogorov-Smirnov test for equality of 
> distribution functions:
> 
>  Smaller group       D       P-value      Exact
>  ----------------------------------------------
>  0:                  0.2059    0.237
>  1:                 -0.0882    0.767
>  Combined K-S:       0.2059    0.467      0.307
> 
> Can someone suggest a way forward? Is it correct to assume that the
> model fits the data?
