The following question and answer is based on an exchange
that started on Statalist.
How can I calculate the pseudo-R2 for xtprobit?
Title: Pseudo-R2 for probit
Author: William Gould, StataCorp
Date: October 2001
Question:
I estimated a random-effects probit model using
xtprobit. A
referee asks for a goodness-of-fit measure (some
pseudo-R2, or so). Although I do not see what we can
learn from reporting such a number [...], I consider the damage from
including it into the table of results to be minimal compared to the damage
from trying to convince the referee. Anyway, I cannot find a goodness-of-fit
measure in my output. [...] Where is the pseudo-R2 for
xtprobit, or how can I calculate the number from information given in
the output?
Answer:
xtprobit is one of those models for which the log likelihood would be
zero if the fit were perfect, so we can just scale the log-likelihood value
of your model so that 1 corresponds to a log likelihood of 0 and 0
corresponds to the log likelihood of the constant-only model.
We can get the log likelihood of the constant-only model by typing
. xtprobit outcome_variable
So let’s pretend that
LL_0 = −35.670226 (constant-only model)
LL_f = −25.767073 (full model)
LL_p = 0.0 (perfect model)
All we need to do is scale the above so LL_0 corresponds to 0 and LL_p
corresponds to 1.
Pseudo R2 = (LL_f − LL_0)/(LL_p − LL_0)
          = (35.670226 − 25.767073)/35.670226
          = .2776
You can see the Methods and Formulas for
[R] maximize
for a justification of the above formula.
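As a quick check of the arithmetic, the scaling can be typed directly into Stata. This is only a sketch using the pretend values from above; the scalar names are illustrative, and the result should agree with the value above to four decimal places.

```stata
* Pretend log-likelihood values from the example above
scalar LL_0 = -35.670226    // constant-only model
scalar LL_f = -25.767073    // full model
scalar LL_p = 0             // perfect model

* Rescale so that LL_0 maps to 0 and LL_p maps to 1
display "Pseudo R2 = " %6.4f (LL_f - LL_0)/(LL_p - LL_0)
```

Because LL_p is 0, the expression reduces to 1 − LL_f/LL_0.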
Not much strikes me as wrong with the above, and I recommend you use it.
If I were asked to criticize the above, I would point out that the perfect
model leaves no room for a random effect (the random effect must be zero),
and so perhaps the pseudo-R2 value calculated is too low
in some sense. This does not really bother me; you are just looking for a
value to reflect, in some vague sense, how well you have fit the data, and
the above calculation certainly does so in a reasonable way.
Be careful when obtaining the log likelihood for the constant-only model
that you fit the model on the same estimation subsample on which you fitted
the full model. Remember, Stata drops observations in which variables have
missing values and, in the constant-only model, you are not specifying those
variables. Probably the safest thing to do is to refit the full model and then
fit the constant-only model with the qualifier if e(sample), which restricts
estimation to the observations used in the full model.
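One way to carry out these steps is sketched below. The names id, y, x1, and x2 are hypothetical placeholders for your panel identifier, outcome, and covariates, and the sketch assumes a version of Stata in which the panel variable is declared with xtset.

```stata
* Hypothetical names: panel identifier id, outcome y, covariates x1 and x2
xtset id

* Fit the full model and save its log likelihood
xtprobit y x1 x2
scalar ll_f = e(ll)

* Fit the constant-only model on the full model's estimation sample
xtprobit y if e(sample)
scalar ll_0 = e(ll)

* Pseudo-R2, taking the log likelihood of the perfect model to be 0
display "Pseudo R2 = " %6.4f (1 - ll_f/ll_0)
```

Saving e(ll) in a scalar right after each fit matters because the second xtprobit overwrites the stored results of the first.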