Re: st: Gof for ologit/oprobit
n j cox <firstname.lastname@example.org>
Wed, 31 Oct 2007 10:01:24 +0000
Clive Nicholas and Richard Williams (both, like me, formally
non-statisticians) made many good points. I guess Dan didn't mean
to imply that "non-statistician" necessarily means statistically uninformed, as that would put a question-mark on most of the members of this list. I'll take it as shorthand for "someone who does not know much statistics".
I'd add a few comments. First of all some marginal disagreement:
Richard wrote, and Clive agreed:
> The count measures can be pretty much useless when one outcome is
> rare, e.g. only 10% get a zero, because it will then often be the
> case that every case gets predicted as a 1.
I am not clear that this is an indictment of the measures concerned.
If your model can't predict rare outcomes, that is part of its
limitations, and you really should want to know that. Of course,
almost no model can predict rare outcomes, as most operate on
some kind of averaging, but that doesn't change the principle.
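A minimal sketch of such a count check for an ordered logit, using the
auto data. The variable names (p1 to p5, pmax, modal) are made up for
the illustration, and taking the most probable category as "the"
prediction is an assumption, one of several possible rules.

```stata
* Sketch only: count-type check after an ordered logit
sysuse auto, clear
ologit rep78 mpg
* one predicted probability per outcome (rep78 is coded 1 to 5)
predict double p1 p2 p3 p4 p5 if e(sample), pr
* largest predicted probability for each observation
generate double pmax = max(p1, p2, p3, p4, p5)
* assumed rule: predict the category with the highest probability
generate byte modal = .
forvalues k = 1/5 {
    replace modal = `k' if p`k' == pmax & missing(modal) & !missing(pmax)
}
* rows are observed, columns are predicted categories
tabulate rep78 modal if e(sample)
```

Any observed category that has no column in the table was never
predicted, which is exactly the limitation worth knowing about.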
"Goodness-of-fit" is in part a propaganda term. Less common
is "badness-of-fit", a term I believe is due to, or at least
was spread by, Joseph B. Kruskal. Google counts of GOF versus
BOF run about 10 to 1. That surprised me: I would have guessed
more than 100 to 1. A case of my uneven sampling of the literature.
In linear regression if forced to choose a single measure I would use
RMS error (~ SD of residuals), not R^2. It is on the scale of the
response, and it is less likely to impart false optimism. I'd
go for an RMS error in other models whenever it was computable
(and it is whenever predictions can be made on the scale of
the response variable).
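For linear regression, -regress- already reports this as Root MSE and
saves it in e(rmse). A small sketch, again with the auto data, that
also recomputes it by hand; the names resid and resid2 are arbitrary.

```stata
* Sketch only: RMS error after a linear regression
sysuse auto, clear
regress mpg weight
display "Root MSE from regress: " e(rmse)
* the same quantity by hand, from the residuals
predict double resid if e(sample), residuals
generate double resid2 = resid^2
summarize resid2, meanonly
* regress divides by the residual degrees of freedom, not by N
display "Root MSE by hand:      " sqrt(r(sum) / e(df_r))
```

The result is in miles per gallon, the units of the response.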
Nobody put in a word for graphical assessment. Scientists like
observed vs fitted plots (calibration plots). Statistically-minded
people usually start with residual vs fitted plots as a health check.
(No news is good news.) It's true, unfortunately, that many of these plots are more difficult to define, or to work with, for discrete response models. It's also true, unfortunately, that StataCorp provided various graphical add-ons for use after -regress- and -anova- but stopped about there. In the Stata Journal in 2004 I wrote up a -modeldiag- package, but it doesn't really extend to ordered logit or ordered probit because of the multiple outcomes. (-findit modeldiag- for locations.) So, that just added a task to my to-do list.
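For the plain -regress- case, the two plots just mentioned can be
sketched with official commands only. The variable name fitted and the
axis titles are arbitrary choices. The /// continuation means this
should be run from a do-file, not typed interactively.

```stata
* Sketch only: the two standard plots after a linear regression
sysuse auto, clear
regress mpg weight
* residual versus fitted, with a reference line at zero
rvfplot, yline(0)
* observed versus fitted, with the line of equality for reference
predict double fitted if e(sample), xb
twoway (scatter mpg fitted) (function y = x, range(fitted)), ///
    ytitle("Observed mpg") xtitle("Fitted mpg") legend(off)
```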
For ordered *it, I'd want first a cross-plot of observed and predicted.
Here is a dopey example with Stata 8.2.
. sysuse auto, clear
. ologit rep78 mpg
. predict predicted, xb
. scatter rep78 predicted
Naturally, you might also want to round the predictions. -tabplot-
from SSC then provides a way of keeping the comparison graphical.
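One possible reading of "round the predictions", sketched below, is to
form an expected category from the predicted probabilities and round
that. This is an assumption, not necessarily what was intended: it
treats the codes 1 to 5 as equally spaced scores, and the names escore
and rounded are invented for the example.

```stata
* Sketch only: rounded predictions compared graphically with -tabplot-
sysuse auto, clear
ologit rep78 mpg
predict double p1 p2 p3 p4 p5 if e(sample), pr
* assumption: treat the codes 1 to 5 as equally spaced scores
generate double escore = 1*p1 + 2*p2 + 3*p3 + 4*p4 + 5*p5
generate byte rounded = round(escore)
* -tabplot- is user-written; install from SSC if it is not found
capture which tabplot
if _rc ssc install tabplot
* rows are observed categories, columns are rounded predictions
tabplot rep78 rounded
```

Unlike a strict hit-or-miss count, this display shows how far off the
misses are.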
It's possible to add an R^2, naturally, but not necessarily useful.
In a loosely similar thread, Maarten Buis recently underlined
a simple but fundamental point he makes to social science
students. Paraphrasing, and he might dissent from this wording:
A perfect model could mean that I can predict your behaviour
or condition just from knowing a few things about you. Does that
tally with how you think you (and people (and society)) actually work?
What is the best way to communicate to non-statisticians the Goodness
of Fit (gof) of an ordered logit/ordered probit model?
For OLS, there is the trusty R2, letting you tell a non-statistician,
"I can explain X% of the variation in the dependent variable."
For logit/probit, I've used the probability of correct classification,
type I and type II error rates as my go-to metric for gof.
Is there a corresponding metric for ordered logit/ordered probit?
I've read about pseudo R2 and its faults. Probability of correct
classification doesn't seem fair given the multiple categories of the
dependent variable - if my model predicts you'll be a 2 but you're a
3, I get no credit for being close.