Why does test sometimes produce chi-squared and other times F statistics?
How are the chi-squared and F distributions related?
Title:   Relationship between the chi-squared and F distributions
Author:  William Gould, StataCorp
Date:    July 1999; minor revisions July 2009
F and chi-squared statistics are really the same thing in that, after
a normalization, chi-squared is the limiting distribution of the F as
the denominator degrees of freedom goes to infinity. The normalization is
chi-squared = (numerator degrees of freedom) * F
For instance, if you tell me that you have an F(2,71) = 2.05, the
corresponding chi-squared is 2 * 2.05 = 4.1 and, by the way, the tail
probabilities are virtually the same:
F(2,71) = 2.05 p = .1363
chi2(2) = 4.1 p = .1287
As the denominator degrees of freedom (the 71 above) get larger and larger, the
p-value reported for F() approaches the p-value reported by chi2(), .1287:
F(2, 150) = 2.05 p = .1323
F(2, 250) = 2.05 p = .1309
F(2, 500) = 2.05 p = .1298
F(2, 1000) = 2.05 p = .1293
F(2, 5000) = 2.05 p = .1288
F(2, 10000) = 2.05 p = .1288
F(2,100000) = 2.05 p = .1287
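If you want to verify these numbers yourself, Stata's Ftail() and chi2tail()
functions return the upper-tail probabilities shown above; here is a minimal
sketch using the values from the example:

        . display Ftail(2, 71, 2.05)        // p-value of F(2,71) = 2.05, about .1363
        . display chi2tail(2, 2*2.05)       // p-value of chi2(2) = 4.1, about .1287
        . display Ftail(2, 100000, 2.05)    // huge denominator df: essentially the chi2 p-value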
Except for the conceptually irrelevant normalization of multiplying the
statistic's value by the numerator degrees of freedom, the relationship
between chi-squared and F is the same as the relationship between
t and the normal.
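As an aside, the same kind of convergence can be illustrated for t and the
normal with Stata's ttail() and normal() functions (the value 2.05 here is
arbitrary, chosen only to echo the example above):

        . display 2*ttail(71, 2.05)         // two-sided p-value from a t with 71 df
        . display 2*ttail(100000, 2.05)     // with a huge df ...
        . display 2*(1 - normal(2.05))      // ... it approaches the normal p-value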
Anyway, I can look at output and switch from one to the other whenever I think
it appropriate. If I see in the output that the F(5,100) statistic is
4.79, I can calculate that the corresponding chi2(5) statistic is 5 * 4.79 =
23.95. If I see in the computer output that the chi2(5) statistic is 7, and
I think that an F(5,50) would be more appropriate, I can calculate that the
F(5,50) statistic is 7/5 = 1.4.
I can switch from one to the other; I just have to remember to do the
multiplication or division by the (numerator) degrees of freedom. As a check,
the expected value of an F statistic is roughly 1 (exactly d2/(d2-2) for
denominator degrees of freedom d2, which goes to 1 as d2 grows), and the
expected value of a chi-squared statistic is its (numerator) degrees of freedom.
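In Stata, for instance, the two conversions above and their tail probabilities
can be computed directly; this is just a sketch using the statistic values from
the examples:

        . display 5*4.79                    // chi2(5) value corresponding to F(5,100) = 4.79
        . display chi2tail(5, 5*4.79)       // its tail probability
        . display 7/5                       // F(5,50) value corresponding to chi2(5) = 7
        . display Ftail(5, 50, 7/5)         // its tail probability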
Okay, that’s the mechanics. Now, let’s discuss what is really
going on and when one or the other statistic is the relevant one.
When I analyze data, I model the data as having a random component. That
random component results in the numbers I calculate from the data having a
random component. For instance, I model
yj = xj * b + uj,   uj ~ N(0, sigma^2)
In my model, the yj values I observe have the random component
uj, Gaussian noise. If it were not for the random component in
the yj process, I could estimate b by solving
b = yj / xj for any j
In my data, however, uj is not always equal to zero, so the formula would
have to be
b = (yj - uj) / xj for any j
I cannot make that calculation, because I do not observe uj; all
I know about uj is that it is normally distributed with mean 0.
So, I come up with another estimator of b, in particular
b_hat = (X'X)^(-1) X'y
and, concerning that calculated value, I am uncertain that I have it exactly
right. Understand that the true b itself is not random. The true b is just a
fixed number such as 3. My estimate of the true b, however, is random, by
which I mean only that b_hat has a distribution. The true b might be 3, but
I might calculate b_hat = 2.5.
Thus b_hat has a distribution that arises because of the random component,
uj, in the original model. This distribution is called the
sampling distribution of b_hat.
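You can see this sampling variation directly by simulating the model. Here is
a small sketch in Stata; the true b is set to 3, as in the example, and the
particular seed and sample size are arbitrary:

        . clear
        . set seed 12345
        . set obs 50
        . generate x = rnormal()
        . generate u = rnormal()            // the random component, N(0,1)
        . generate y = 3*x + u              // the true b is 3
        . regress y x                       // b_hat will be near 3 but not exactly 3

Rerunning with a different seed gives a different b_hat; the distribution of
those b_hat values across repetitions is the sampling distribution.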
Now, it turns out that the sampling distribution of b_hat is a function of
N, the number of observations. It also turns out that, as N→infinity,
the sampling distribution becomes (jointly) normal. That turns out to be
true for lots of estimators. In any case, let's call the sampling
distribution of b_hat f(b_hat, N), and let g(b_hat) represent the asymptotic
sampling distribution of b_hat,
g(b_hat) = lim_{N→infinity} f(b_hat, N)
g(b_hat) being normal leads to test statistics with the normal distribution
for single parameters and, correspondingly, tests with the chi-squared
distribution when testing multiple parameters simultaneously.
However, no one really estimates models on asymptotic samples. Your sample
has a finite number of observations. Thus, rather than using g(b_hat), you
would prefer to use f(b_hat, N) if only you knew it. For most estimators, we
do not know f(b_hat, N), so we use g(b_hat), and we do tests using
normals and chi-squareds.
In some cases—such as linear regression—we do know the sampling
distribution for finite samples and, in those cases, we can calculate a test
with better coverage probabilities.
Thus the Wald test is usually discussed as a chi-squared test, because it is
usually applied to problems where only the asymptotic sampling distribution
is known. But if we do know the sampling distribution for finite samples,
we certainly want to use that.
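This is exactly why Stata's test command sometimes reports an F statistic and
sometimes a chi-squared statistic. As an illustrative sketch using the auto
dataset: after regress, where the finite-sample distribution of the
coefficients is known, test reports an F statistic; after an estimator whose
distribution is known only asymptotically, such as logit, test reports a
chi-squared statistic.

        . sysuse auto, clear
        . regress mpg weight length
        . test weight length                // reported as F(2, 71)
        . logit foreign weight length
        . test weight length                // reported as chi2(2)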