# Re: st: RE: question on goodness of fit tests

From: "Narasimhan Sowmyanarayanan"; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: RE: question on goodness of fit tests; Date: Thu, 13 Jul 2006 20:50:49 -0400

```
Thanks. I think this helps a lot.

Thanks

On 7/13/06, Nick Cox <n.j.cox@durham.ac.uk> wrote:
```
```
I don't follow all of this, but what I am gathering
is that your variable is continuous and that your
chi-square test is based on a term for each data
point. Something is wrong somewhere there. A
chi-square test for a continuous variable must be
based on a binning, classing or grouping of the
data into discrete cells. There is not a unique
way of doing that for a continuous variable.
That was my point.

Also, it is standard that to get adequate expected
frequencies it is often necessary to have wide
cells in the tails, and thus there is usually
some loss of sensitivity in crucial parts of the
support.

The point about Kolmogorov-Smirnov is standard in
the literature. See, for example, the late lamented
Leo Breiman's 1973 text "Statistics". I wouldn't
call it elementary at all, but it is discussed
in that and many other places.

Breiman's book is also excellent on the more general
topic. Naturally there are many other texts covering
this area.

Two main extra points in convincing a sceptic that
your result should be taken seriously could be

(a) a scientific rationale for your distribution
(physical, biological, economic, whatever is your
science)

and/or

(b) a lack of _systematic_ structure in the residuals.

I hope this helps, but I think I've donated my
tuppenceworth on this now.

Nick
n.j.cox@durham.ac.uk

Narasimhan Sowmyanarayanan

> Hello Nick:
>
> Thanks for your reply. By standard chi-square goodness of fit, I mean
> (expected - observed)^2 / expected. I computed this for each
> of the observations and summed it over all the values.
>
> I also examined graphs to look at the fit. To the naked eye the
> fit is very good. My worry was that a few chi-square terms are
> slightly high (but not extremely high), which makes me reject
> the hypothesis that expected and actual values are equal.
>
> As a very rough approximation I tried correlating the two series, which
> yields .999 (I of course cannot rely on that because scaling can also
> cause high correlation without giving me matching expected and observed
> values).
>
> I was trying out the Kolmogorov tests only after I did all this just
> to see how it fits. Now to your question about the beta distribution.
>
> I am trying to fit a hierarchical model in which the
> probability of an event occurring is 'p', but this is conditional on p
> being drawn from a beta distribution with parameters alpha and beta.
>
> I compared my model with a simple geometric model which was a much
> worse fit. But the question that needs to be asked is:
>
> I could compare two different models and say that one is a better fit
> than the other. But is the improvement good enough? Your point about
> classifying things as OK or not OK is taken, but how does one convince,
> let's say, a reviewer with the numbers?
>
> Finally, your observation that the Kolmogorov-Smirnov test is dubious
> when parameters are estimated from data: is there any rationale for
> what is wrong? (I hope this is not very elementary.)
>
> Thanks again.
>
>
>
> On 7/13/06, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> > You need to take us both upstream and downstream of this.
> >
> > What is a beta geometric distribution, how you are fitting it,
> > are there covariates, etc.?
> >
> > (Beta I know of, geometric I know of.)
> >
> > Assessing model fit has to be more subtle than you are
> > implying here.
> >
> > Some pathologies that I have noticed:
> >
> > 0. Analysts are often reluctant to use the best model assessment
> > tool of all, graphics. (Not true here, naturally.) Graphics
> > are looked down on because their assessment requires too much
> > subjectivity (too much judgement, too much experience!). On
> > the other hand, who can argue with an R-square or a P-value?
> >
> > 1. Analysts often get quantitative evidence of goodness of fit,
> > but then want to degrade that into "OK" or "not OK".
> > Sometimes the tacit rationale is that "OK" means "it's publishable",
> > "it's acceptable to my boss", or some such. These are important
> > for personal satisfaction, but only tangentially connected to
> > scientific progress.
> >
> > 2. Analysts often want to narrow things down to a single factotum,
> > omnibus or portmanteau statistic, unsurprisingly often whatever
> > shows the model in the best light. This is usually non-statistical,
> > as reduction from a set of many numbers to one typically throws
> > away most of the information in that set.
> >
> > 3. A goodness-of-fit statistic can look quite good -- but by
> > comparison with a null hypothesis that no one takes seriously.
> >
> > 4. A goodness-of-fit statistic answers the question it was
> > designed for, while the answers to other nearby questions
> > may be quite different.
> >
> > sysuse auto, clear
> >
> > foreach v of var price-foreign {
> >         swilk `v'
> > }
> >
> > indicates that the most nearly "normal" (meaning "Gaussian") of
> > the auto variables is -rep78-. But a glance at the data shows
> > that this variable -- which is a set of ordinal codes 1,2,3,4,5 --
> > is in most other respects _not_ well characterised
> > as a normal with mean 3.406 and sd .990.
> >
> > 5. The most appropriate answer to "does a model fit?" may
> > be "yes, in so far as it does; no, in so far as it doesn't",
> > banal though that may be. It can be more fruitful to focus on
> > which model fits better than which other model(s).
> >
> > That's pontificating way beyond your case. Returning
> > to that,
> >
> > a. Perhaps there is some other distribution you can
> > compare it with.
> >
> > b. Kolmogorov-Smirnov is usually reckoned dubious when
> > the parameters are estimated from the data, which is
> > almost always.
> >
> > c. I don't know what the standard chi-square test means
> > here. Is this a continuous variable, in which case
> > there is no "standard chi-square test" at all, just
> > a very large set of them depending precisely on how
> > you bin the data?
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > Narasimhan Sowmyanarayanan
> >
> > > I am facing a small confusion that I have not been able to resolve. I
> > > am trying to fit a beta geometric distribution to my dataset. I tried
> > > the standard chi square goodness of fit test and this is rejected in
> > > my data. (chi square = 122 and df = 34).
> > >
> > > However an examination of the correlation between the two series shows
> > > a correlation of .9995 and I actually tried superimposing a plot of
> > > the observed and the expected values. The values are extremely close.
> > >
> > > I tried something that may not be entirely right in this context but
> > > just out of curiosity. I used the Kolmogorov-Smirnov test by treating
> > > the observed and the expected values as different groups and the test
> > > indicated that the distributions were not different between the
> > > observed and the expected values.
> > >
> > > Two-sample Kolmogorov-Smirnov test for equality of
> > > distribution functions:
> > >
> > >  Smaller group       D       P-value      Exact
> > >  ----------------------------------------------
> > >  0:                  0.2059    0.237
> > >  1:                 -0.0882    0.767
> > >  Combined K-S:       0.2059    0.467      0.307
> > >
> > > Can someone suggest a way forward? Is it correct to assume that the
> > > model fits the data?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
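Two points from the thread can be tried out directly: that a chi-square test for a continuous variable depends on how the data are binned, and that Kolmogorov-Smirnov P-values are suspect when the parameters are estimated from the same data. The following is a minimal Stata sketch, not anything posted in the thread. It assumes the -auto- data and the variable -mpg-, fits a normal distribution (rather than the beta geometric discussed above) purely for illustration, and uses a hypothetical helper variable named -cell-. The choice of 4 and 8 equal-probability cells is arbitrary, and df = cells - 1 - 2 is the usual convention, which is only approximate when the two parameters are estimated from the ungrouped data.

```stata
sysuse auto, clear

* Fit a normal to mpg by its sample mean and sd (illustrative choice only)
quietly summarize mpg
local mu = r(mean)
local sd = r(sd)
local N  = r(N)

* Chi-square goodness of fit under two different binnings.
* Each binning uses k cells of equal probability under the fitted normal,
* so every cell has expected frequency N/k.
foreach k in 4 8 {
    capture drop cell
    generate cell = min(`k', 1 + floor(`k' * normal((mpg - `mu') / `sd')))
    local chi2 = 0
    forvalues j = 1/`k' {
        quietly count if cell == `j'
        local chi2 = `chi2' + (r(N) - `N'/`k')^2 / (`N'/`k')
    }
    * Conventional df: cells - 1 - number of estimated parameters (2 here)
    local df = `k' - 1 - 2
    display "cells = `k'  chi2 = " %6.2f `chi2' "  df = `df'  P = " %6.4f chi2tail(`df', `chi2')
}
drop cell

* One-sample Kolmogorov-Smirnov against the same fitted normal.
* The reported P-value treats mean and sd as known in advance, which
* they are not here, so it should not be taken at face value.
quietly summarize mpg
ksmirnov mpg = normal((mpg - r(mean)) / r(sd))
```

The two chi-square results will generally differ, which is the sense in which there is no single "standard" chi-square test for a continuous variable. Plotting observed against fitted quantiles remains the more informative check.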