|Title||Relationship between the chi-squared and F distributions|
|Author||William Gould, StataCorp|
|Date||July 1999; minor revisions June 2013|
F and chi-squared statistics are really the same thing in that, after a normalization, chi-squared is the limiting distribution of the F as the denominator degrees of freedom go to infinity. The normalization is

chi-squared = (numerator degrees of freedom) * F
For instance, if you tell me that you have an F(2,71) = 2.05, the corresponding chi-squared is 2 * 2.05 = 4.1 and, by the way, the tail probabilities are virtually the same:
F(2,71) = 2.05    p = .1363
chi2(2) = 4.1     p = .1287
As the denominator degrees of freedom (the 71) get larger and larger, the p-value of the F statistic approaches the p-value of .1287 reported for the chi2(2) statistic:
F(2,   150) = 2.05    p = .1323
F(2,   250) = 2.05    p = .1309
F(2,   500) = 2.05    p = .1298
F(2,  1000) = 2.05    p = .1293
F(2,  5000) = 2.05    p = .1288
F(2, 10000) = 2.05    p = .1288
F(2,100000) = 2.05    p = .1287
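The tail probabilities listed above can be checked without a statistics package, because with 2 numerator degrees of freedom both tails have closed forms: P(chi2(2) > x) = exp(-x/2) and P(F(2,d) > f) = (1 + 2f/d)^(-d/2). The following is a minimal Python sketch under that assumption. The function names are illustrative, and the shortcut does not carry over to other numerator degrees of freedom.

```python
import math

def chi2_2_tail(x):
    """Upper-tail probability of a chi-squared with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

def f_2_tail(f, denom_df):
    """Upper-tail probability of F(2, denom_df); valid only for 2 numerator df."""
    return (1.0 + 2.0 * f / denom_df) ** (-denom_df / 2.0)

f_stat = 2.05
for d in (71, 150, 250, 500, 1000, 5000, 10000, 100000):
    print(f"F(2,{d}) = {f_stat}  p = {f_2_tail(f_stat, d):.4f}")

# Normalization: chi-squared = numerator df * F = 2 * 2.05
print(f"chi2(2) = {2 * f_stat}  p = {chi2_2_tail(2 * f_stat):.4f}")
```

For other numerator degrees of freedom, a statistics library's F and chi-squared tail functions would be needed in place of these two formulas.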
Except for the conceptually irrelevant normalization of multiplying the statistic's value by the numerator degrees of freedom, the relationship between chi-squared and F is the same as the relationship between t and the normal.
Anyway, I can look at output and switch one to the other whenever I think that it is appropriate. If I see the output that the F(5,100) statistic is 4.79, I can calculate that the corresponding chi2(5) statistic is 5 * 4.79 = 23.95. If I see in the computer output that the chi2(5) statistic is 7, and I think that an F(5,50) would be more appropriate, I can calculate that the F(5,50) statistic is 7/5 = 1.4.
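The two conversions just described amount to one multiplication and one division. As a small Python sketch (the function names are hypothetical, chosen only for this illustration):

```python
def chi2_from_f(f_stat, numerator_df):
    """Convert an F statistic to the corresponding chi-squared statistic."""
    return numerator_df * f_stat

def f_from_chi2(chi2_stat, numerator_df):
    """Convert a chi-squared statistic to the corresponding F statistic."""
    return chi2_stat / numerator_df

# F(5,100) = 4.79 corresponds to chi2(5) = 23.95
print(round(chi2_from_f(4.79, 5), 2))

# chi2(5) = 7 corresponds to F(5,50) = 1.4
print(round(f_from_chi2(7, 5), 2))
```

The denominator degrees of freedom do not enter the conversion of the statistic itself. They matter only when the tail probability is looked up.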
I can switch from one to the other; I just have to remember to do the multiplication or division by the (numerator) degrees of freedom. Remember, the expected value of F is approximately 1 (exactly d/(d-2) when the denominator degrees of freedom d exceed 2), and the expected value of chi-squared is the (numerator) degrees of freedom.
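The statement about expected values can be checked by simulation. The sketch below is only an illustration: it builds chi-squared draws as sums of squared standard normals and F draws as ratios of two independent chi-squareds, each divided by its degrees of freedom. The degrees of freedom (5 and 50), the seed, and the number of replications are arbitrary choices. The exact mean of an F(5,50) is 50/48, which is close to 1.

```python
import random

def draw_chi2(df, rng):
    """One chi-squared draw: the sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def draw_f(num_df, denom_df, rng):
    """One F draw: ratio of two independent chi-squareds, each divided by its df."""
    numerator = draw_chi2(num_df, rng) / num_df
    denominator = draw_chi2(denom_df, rng) / denom_df
    return numerator / denominator

rng = random.Random(12345)  # arbitrary seed, for reproducibility only
reps = 5000

mean_chi2 = sum(draw_chi2(5, rng) for _ in range(reps)) / reps
mean_f = sum(draw_f(5, 50, rng) for _ in range(reps)) / reps

print(f"mean of chi2(5) draws: {mean_chi2:.3f}  (theory: 5)")
print(f"mean of F(5,50) draws: {mean_f:.3f}  (theory: 50/48)")
```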
Okay, that’s the mechanics. Now, let’s discuss what is really going on and when one or the other statistic is the relevant one.
When I analyze data, I model the data as having a random component. That random component results in the numbers I calculate from the data having a random component. For instance, I model
yj = xj * b + uj,  uj ~ N(0, sigma^2)
In my model, the yj values I observe have the random component uj, Gaussian noise. If it were not for the random component in the yj process, I could estimate b by solving
b = yj / xj for any j
In my data, however, uj is not always equal to zero, so the formula becomes
b = (yj - uj) / xj for any j
I cannot make that calculation, because I do not observe uj; all I know about uj is that it is normally distributed with mean 0. So, I come up with another estimator of b, in particular
b_hat = (X'X)^(-1) X'y
and, concerning that calculated value, I am uncertain that I have it exactly right. Understand that the true b itself is not random. The true b is just a fixed number such as 3. My estimate of the true b, however, is random, by which I mean only that b_hat has a distribution. The true b might be 3, but I might calculate b_hat = 2.5.
Thus b_hat has a distribution that arises because of the random component, uj, in the original model. This distribution is called the sampling distribution of b_hat.
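To make the idea of a sampling distribution concrete, the following Python sketch repeatedly generates data from the model yj = xj * b + uj with a fixed true b and recomputes b_hat each time. Everything specific here is an assumption made for illustration: the true b of 3, sigma of 1, 50 observations, regressors drawn once from a uniform distribution, and a single regressor with no constant, so that (X'X)^(-1) X'y reduces to a ratio of sums.

```python
import random
import statistics

def simulate_b_hat(true_b, x, sigma, rng):
    """Draw one sample from yj = xj*b + uj and return the OLS estimate of b."""
    y = [xj * true_b + rng.gauss(0.0, sigma) for xj in x]
    # With one regressor and no constant, (X'X)^(-1) X'y = sum(x*y) / sum(x*x).
    return sum(xj * yj for xj, yj in zip(x, y)) / sum(xj * xj for xj in x)

rng = random.Random(2013)  # arbitrary seed, for reproducibility only
true_b, sigma, n_obs = 3.0, 1.0, 50

# Regressors are drawn once and held fixed across the simulated samples.
x = [rng.uniform(1.0, 2.0) for _ in range(n_obs)]

b_hats = [simulate_b_hat(true_b, x, sigma, rng) for _ in range(2000)]

print(f"mean of b_hat:      {statistics.mean(b_hats):.3f}")
print(f"std. dev. of b_hat: {statistics.stdev(b_hats):.3f}")
```

The true b never changes across replications, yet the estimates scatter around it. That scatter is the sampling distribution.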
Now, it turns out that the sampling distribution of b_hat is a function of N, the number of observations. It also turns out that, as N→infinity, the sampling distribution becomes (jointly) normal. That turns out to be true for lots of estimators. In any case, let's call the sampling distribution of b_hat f(b_hat, N), and let g(b_hat) represent the asymptotic sampling distribution of b_hat,
g(b_hat) = lim f(b_hat, N) as N→inf
g(b_hat) being normal leads to test statistics with the normal distribution for single parameters and, correspondingly, to test statistics with the chi-squared distribution when testing multiple parameters simultaneously.
However, no one really estimates models on asymptotic samples. Your sample has a finite number of observations. Thus rather than using g(b_hat), you would rather use f(b_hat, N) if only you knew it. For most estimators, we do not know f(b_hat, N), so we use g(b_hat), and we do tests using normals and chi-squareds.
In some cases—such as linear regression—we do know the sampling distribution for finite samples and, in those cases, we can calculate a test with better coverage probabilities.
Thus the Wald test is usually discussed as a chi-squared test, because it is usually applied to problems where only the asymptotic sampling distribution is known. But if we do know the sampling distribution for finite samples, we certainly want to use that.