You need to take us both upstream and downstream of this.
What is a beta geometric distribution, how are you fitting it,
are there covariates, etc.?
(Beta I know of, geometric I know of.)

Assessing model fit has to be more subtle than you are implying here.
Some pathologies that I have noticed:
0. Analysts are often reluctant to use the best model assessment
tool of all, graphics. (Not true here, naturally.) Graphics
are looked down on because their assessment requires too much
subjectivity (too much judgement, too much experience!). On
the other hand, who can argue with an R-square or a P-value?
1. Analysts often get quantitative evidence of goodness of fit,
but then want to degrade that into "OK" or "not OK".
Sometimes the tacit rationale is that "OK" means "it's publishable",
"it's acceptable to my boss", or some such. These are important for personal
satisfaction, but only tangentially connected to scientific progress.
2. Analysts often want to narrow things down to a single factotum,
omnibus or portmanteau statistic, which unsurprisingly is often whatever
shows the model in the best light. This is usually unstatistical, as
reduction from a set of many numbers to one typically throws away most
of the information in that set.
3. A goodness-of-fit statistic can look quite good -- but by
comparison with a null hypothesis that no one takes seriously.
4. A goodness-of-fit statistic answers the question it was
designed for, while the answers to other nearby questions may be quite
different.
For example,

    * Shapiro-Wilk test for each numeric variable in the auto data
    sysuse auto, clear
    foreach v of varlist price-foreign {
        swilk `v'
    }

indicates that the most nearly "normal" (meaning "Gaussian") of the auto
variables is -rep78-. But a glance at the data shows
that this variable -- which is a set of ordinal codes 1,2,3,4,5 --
is in most other respects _not_ well characterised
as a normal with mean 3.406 and sd .990.
5. The most appropriate answer to "does a model fit?" may
be "yes, in so far as it does; no, in so far as it doesn't",
banal though that may be. It can be more fruitful to focus on
which model fits better than which other model(s).
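
The "glance at the data" in point 4 could be taken along these lines.
This is only a sketch using standard official commands; which graphs
are most informative is a matter of judgement.

```stata
* Look at rep78 directly rather than relying on a single test statistic
sysuse auto, clear

* Only five distinct integer codes occur
tabulate rep78

* Mean and sd of the normal being used as a reference
summarize rep78

* Normal quantile plot: a discrete variable shows up as a staircase
qnorm rep78

* Bar for each code, with the fitted normal density superimposed
histogram rep78, discrete normal
```

The test, the table and the graphs answer different questions, which is
the point being made.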
That's pontificating way beyond your case. Returning
to that,
a. Perhaps there is some other distribution you can
compare it with.
b. Kolmogorov-Smirnov is usually reckoned dubious when
the parameters are estimated from the data, which is
almost always.
c. I don't know what the standard chi-square test means
here. Is this a continuous variable, in which case
there is no "standard chi-square test" at all, just
a very large set of them depending precisely on how
you bin the data?
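
If the data really are frequencies in cells, a rough sketch of how to
look inside the chi-square might go as follows. The first line uses the
figures quoted in the question (122 on 34 df). Everything after that is
an assumption made for illustration: the frequencies are invented toy
numbers, and the variable names t, observed, expected, pearson, contrib
and chisq are hypothetical.

```stata
* P-value implied by the reported chi-square of 122 on 34 df
display chi2tail(34, 122)

* Toy frequencies, invented purely for illustration
clear
input t observed expected
1 520 500
2 240 250
3 130 125
4  60  62.5
5  50  62.5
end

* Pearson residual and contribution to chi-square for each cell
generate double pearson = (observed - expected) / sqrt(expected)
generate double contrib = pearson^2
egen double chisq = total(contrib)

list t observed expected pearson contrib, noobs
display "chi-square = " chisq[1]

* Which cells are doing the damage?
twoway spike pearson t, yline(0) ytitle("Pearson residual")
```

Looking at the residuals cell by cell shows where the misfit lies, which
a single statistic or a correlation between observed and expected
cannot. With large frequencies, discrepancies that look tiny on a plot
can still produce a large chi-square.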
Nick
n.j.cox@durham.ac.uk
Narasimhan Sowmyanarayanan
> I am facing a small confusion that I have not been able to resolve. I
> am trying to fit a beta geometric distribution to my dataset. I tried
> the standard chi square goodness of fit test and this is rejected in
> my data. (chi square = 122 and df = 34).
>
> However an examination of the correlation between the two series shows
> a correlation of .9995 and I actually tried superimposing a plot of
> the observed and the expected values. The values are extremely close.
>
> I tried something that may not be entirely right in this context but
> just out of curiosity. I used the Kolmogorov-Smirnov test by treating
> the observed and the expected values as different groups and the test
> indicated that the distributions were not different between the
> observed and the expected values.
>
> Two-sample Kolmogorov-Smirnov test for equality of
> distribution functions:
>
> Smaller group       D       P-value  Exact
> ----------------------------------------------
> 0:                  0.2059    0.237
> 1:                 -0.0882    0.767
> Combined K-S:       0.2059    0.467      0.307
>
> Can someone suggest a way forward? Is it correct to assume that the
> model fits the data?