
Why does the test command sometimes produce chi-squared and other times F statistics?

How are the chi-squared and F distributions related?

Title   Relationship between the chi-squared and F distributions
Author William Gould, StataCorp
Date July 1999; minor revisions June 2013

F and chi-squared statistics are really the same thing in that, after a normalization, chi-squared is the limiting distribution of the F as the denominator degrees of freedom go to infinity. The normalization is

chi-squared = (numerator degrees of freedom) * F

For instance, if you tell me that you have an F(2,71) = 2.05, the corresponding chi-squared is 2 * 2.05 = 4.1 and, by the way, the tail probabilities are virtually the same:

        F(2,71) = 2.05      p = .1363
        chi2(2) = 4.1       p = .1287
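The two p-values above can be checked without special software when the numerator degrees of freedom are 2. In that case both upper tails have closed forms: P(F(2,d) > f) = (1 + 2f/d)^(-d/2), and P(chi2(2) > x) = exp(-x/2). The following is a minimal Python sketch under that assumption; the function names are hypothetical, and the formulas do not apply to other numerator degrees of freedom. In Stata itself, display Ftail(2,71,2.05) and display chi2tail(2,4.1) should give the same numbers.

```python
import math

def f2_upper_tail(f, den_df):
    """P(F > f) for F(2, den_df); this closed form holds only for numerator df = 2."""
    return (1.0 + 2.0 * f / den_df) ** (-den_df / 2.0)

def chi2_2_upper_tail(x):
    """P(chi2(2) > x); chi-squared with 2 df is exponential with mean 2."""
    return math.exp(-x / 2.0)

f_stat = 2.05
chi2_stat = 2 * f_stat  # normalization: numerator df times F
print(f"F(2,71) = {f_stat}   p = {f2_upper_tail(f_stat, 71):.4f}")
print(f"chi2(2) = {chi2_stat:.1f}    p = {chi2_2_upper_tail(chi2_stat):.4f}")
```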

As the denominator degrees of freedom (the 71) get larger and larger, the p-value for the F statistic approaches the p-value reported for chi2(), .1287:

        F(2,   150) = 2.05     p = .1323
        F(2,   250) = 2.05     p = .1309
        F(2,   500) = 2.05     p = .1298
        F(2,  1000) = 2.05     p = .1293
        F(2,  5000) = 2.05     p = .1288
        F(2, 10000) = 2.05     p = .1288
        F(2,100000) = 2.05     p = .1287   
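For numerator degrees of freedom 2, the convergence in this table can be seen directly from the closed-form tails. This is a standard textbook result rather than part of the original FAQ; write d for the denominator degrees of freedom and use (1 + a/n)^n → e^a with n = d/2 and a = f:

```latex
P\{F(2,d) > f\} = \left(1 + \frac{f}{d/2}\right)^{-d/2}
\;\longrightarrow\; e^{-f} = P\{\chi^2(2) > 2f\} \qquad \text{as } d \to \infty
```

With f = 2.05, the limit is e^(-2.05) ≈ .1287, the chi2(2) p-value in the last row.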

Except for the conceptually irrelevant normalization of multiplying the statistic's value by the numerator degrees of freedom, the relationship between chi-squared and F is the same as the relationship between t and the normal.
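The t/normal analogy can be made exact for one numerator degree of freedom. Here t_d denotes a t statistic with d degrees of freedom, and Z is a standard normal; these symbols are not in the original. Both facts below are standard, and the normalization in this case is multiplication by 1:

```latex
t_d^2 \sim F(1, d), \qquad Z^2 \sim \chi^2(1), \qquad
t_d \xrightarrow{d} N(0,1) \;\Longrightarrow\; F(1,d) \xrightarrow{d} \chi^2(1) \quad \text{as } d \to \infty
```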

Anyway, I can look at output and switch one to the other whenever I think that it is appropriate. If I see in the output that the F(5,100) statistic is 4.79, I can calculate that the corresponding chi2(5) statistic is 5 * 4.79 = 23.95. If I see in the output that the chi2(5) statistic is 7, and I think that an F(5,50) would be more appropriate, I can calculate that the F(5,50) statistic is 7/5 = 1.4.
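The conversions in that paragraph amount to two one-line functions. This is a minimal Python sketch with hypothetical function names. Only the numerator degrees of freedom enter the conversion; the denominator degrees of freedom (the 100 or the 50) matter only when you compute a p-value.

```python
def f_to_chi2(f_stat, num_df):
    """Chi-squared statistic corresponding to an F statistic: multiply by numerator df."""
    return num_df * f_stat

def chi2_to_f(chi2_stat, num_df):
    """F statistic corresponding to a chi-squared statistic: divide by numerator df."""
    return chi2_stat / num_df

print(round(f_to_chi2(4.79, 5), 2))  # F(5,100) = 4.79 converted to chi2(5)
print(round(chi2_to_f(7, 5), 2))     # chi2(5) = 7 converted to F(5,50)
```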

I can switch from one to the other; I just have to remember to multiply or divide by the (numerator) degrees of freedom. Remember, the expected value of chi-squared is its degrees of freedom, and the expected value of F is approximately 1 (exactly d2/(d2 - 2) for denominator degrees of freedom d2 > 2, which goes to 1 as d2 grows).
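The expected values can be checked by simulation using the textbook constructions. A chi-squared(k) variable is a sum of k squared independent standard normals. An F(d1, d2) variable is a ratio of independent chi-squareds, each divided by its degrees of freedom. The following is a rough Python sketch; the seed and replication count are arbitrary choices, and the exact mean of F(5,50) is 50/48, slightly above 1.

```python
import random

def chi2_draw(df, rng):
    """One chi-squared(df) draw: sum of df squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

def f_draw(num_df, den_df, rng):
    """One F(num_df, den_df) draw: ratio of independent chi-squareds, each over its df."""
    return (chi2_draw(num_df, rng) / num_df) / (chi2_draw(den_df, rng) / den_df)

rng = random.Random(12345)
reps = 20000
mean_chi2 = sum(chi2_draw(5, rng) for _ in range(reps)) / reps
mean_f = sum(f_draw(5, 50, rng) for _ in range(reps)) / reps
print(f"mean of chi2(5) draws:  {mean_chi2:.3f}   (theory: 5)")
print(f"mean of F(5,50) draws:  {mean_f:.3f}   (theory: 50/48 = {50/48:.3f})")
```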

Okay, that’s the mechanics. Now, let’s discuss what is really going on and when one or the other statistic is the relevant one.

When I analyze data, I model the data as having a random component. That random component results in the numbers I calculate from the data having a random component. For instance, I model

$$ y_j = x_j b + u_j, \qquad u_j \sim N(0, \sigma^2) $$

In my model, the $y_j$ values I observe have the random component $u_j$, Gaussian noise. If it were not for the random component in the $y_j$ process, I could obtain b exactly by solving

$$ b = y_j / x_j \qquad \text{for any } j $$

In my data, however, $u_j$ is not always equal to zero, so the formula is instead

$$ b = (y_j - u_j) / x_j \qquad \text{for any } j $$

I cannot make that calculation, because I do not observe $u_j$; all I know about $u_j$ is that it is normally distributed with mean 0. So I use another estimator of b, namely

$$ \hat b = (X'X)^{-1} X'y $$

and, concerning that calculated value, I am uncertain that I have it exactly right. Understand that the true b itself is not random. The true b is just a fixed number such as 3. My estimate of the true b, however, is random, by which I mean only that b_hat has a distribution. The true b might be 3, but I might calculate b_hat = 2.5.
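The point about a fixed true b and a random b_hat can be made concrete with simulated data. The following is a minimal Python sketch under hypothetical choices: true b = 3 as in the text, sigma = 1, N = 50, and x_j drawn uniformly on [1, 5]. It uses a single regressor with no constant, so (X'X)^-1 X'y reduces to sum(x_j y_j) / sum(x_j^2).

```python
import random

rng = random.Random(2013)
true_b, sigma, n = 3.0, 1.0, 50  # hypothetical values; true b = 3 as in the text

x = [rng.uniform(1.0, 5.0) for _ in range(n)]
u = [rng.gauss(0.0, sigma) for _ in range(n)]  # the noise; unobserved in real data
y = [xj * true_b + uj for xj, uj in zip(x, u)]

# Solving b = y_j / x_j gives a different answer for each j, because u_j is not zero.
ratios = [yj / xj for xj, yj in zip(x, y)]
print("y_j / x_j for the first three j:", [round(r, 3) for r in ratios[:3]])

# With one regressor and no constant, (X'X)^{-1} X'y is sum(x_j*y_j) / sum(x_j^2).
b_hat = sum(xj * yj for xj, yj in zip(x, y)) / sum(xj * xj for xj in x)
print("b_hat =", round(b_hat, 3), "  true b =", true_b)
```

Rerunning with a different seed gives a different b_hat, while true_b stays 3; that variation is the sampling distribution discussed next.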

Thus b_hat has a distribution that arises because of the random component, uj, in the original model. This distribution is called the sampling distribution of b_hat.

Now, it turns out that the sampling distribution of b_hat is a function of N, the number of observations. It also turns out that, as N→infinity, the sampling distribution becomes (jointly) normal, and the same is true for many other estimators. Let's call the sampling distribution of b_hat f(b_hat, N), and let g(b_hat) represent the asymptotic sampling distribution of b_hat,

$$ g(\hat b) = \lim_{N \to \infty} f(\hat b, N) $$
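The dependence of f(b_hat, N) on N can be seen by repeating the estimation on many simulated samples of each size. The following is a rough Python sketch using the same hypothetical design as a regression of y on x with no constant, true b = 3, and sigma = 1. In this particular model with normal errors, b_hat is already normal conditional on the x values, so the sketch shows only the other part of the story: the distribution tightens around the true b as N grows.

```python
import random
import statistics

def b_hat_draws(n, reps, rng, true_b=3.0, sigma=1.0):
    """Return `reps` values of b_hat, each from a fresh simulated sample of size n."""
    draws = []
    for _ in range(reps):
        x = [rng.uniform(1.0, 5.0) for _ in range(n)]
        y = [xi * true_b + rng.gauss(0.0, sigma) for xi in x]
        draws.append(sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x))
    return draws

rng = random.Random(1999)
spread = {}
for n in (10, 100, 1000):
    draws = b_hat_draws(n, 500, rng)
    spread[n] = statistics.stdev(draws)
    print(f"N = {n:5d}   mean(b_hat) = {statistics.mean(draws):.4f}   sd(b_hat) = {spread[n]:.4f}")
```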

That g(b_hat) is normal leads to test statistics with the normal distribution when testing single parameters and, correspondingly, to test statistics with the chi-squared distribution when testing multiple parameters simultaneously.

However, no one really estimates models on asymptotic samples. Your sample has a finite number of observations, so instead of g(b_hat) you would rather use f(b_hat, N), if only you knew it. For most estimators, we do not know f(b_hat, N), so we use g(b_hat) and do tests using normals and chi-squareds.

In some cases, such as linear regression, we do know the sampling distribution for finite samples, and in those cases we can calculate a test with better coverage probabilities, that is, with an actual significance level closer to the nominal one.
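The coverage difference can be seen in a small simulation of a single-coefficient test, where t squared is F(1, N-k). The following is a minimal Python sketch under hypothetical choices: a regression of y on x with no constant (k = 1), N = 10, true b = 3, and normal errors. The null hypothesis is true in every replication, so a correct 5% test should reject about 5% of the time. The sketch compares the asymptotic normal critical value 1.96 with the exact t(9) critical value 2.262.

```python
import math
import random

def t_stat_for_true_b(n, rng, true_b=3.0, sigma=1.0):
    """Simulate one regression of y on x (no constant); return (b_hat - true_b) / se(b_hat)."""
    x = [rng.uniform(1.0, 5.0) for _ in range(n)]
    y = [xj * true_b + rng.gauss(0.0, sigma) for xj in x]
    sxx = sum(xj * xj for xj in x)
    b_hat = sum(xj * yj for xj, yj in zip(x, y)) / sxx
    # Residual variance with N - k degrees of freedom, k = 1 coefficient.
    s2 = sum((yj - xj * b_hat) ** 2 for xj, yj in zip(x, y)) / (n - 1)
    return (b_hat - true_b) / math.sqrt(s2 / sxx)

rng = random.Random(71)
n, reps = 10, 20000
t_stats = [t_stat_for_true_b(n, rng) for _ in range(reps)]
reject_normal = sum(abs(t) > 1.96 for t in t_stats) / reps  # asymptotic N(0,1) critical value
reject_t = sum(abs(t) > 2.262 for t in t_stats) / reps      # exact t(9) critical value
print(f"nominal 5% test, N = {n}: normal rejects {reject_normal:.3f}, t(9) rejects {reject_t:.3f}")
```

With only 9 residual degrees of freedom, the normal critical value rejects a true null noticeably more often than 5%, while the t(9) value stays close to 5%.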

Thus the Wald test is usually discussed as a chi-squared test, because it is usually applied to problems where only the asymptotic sampling distribution is known. But if we do know the sampling distribution for finite samples, we certainly want to use that.
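For reference, one standard textbook form of the Wald statistic is shown below. The notation is not from the original: R and r define q linear restrictions Rb = r, V-hat is the estimated variance of b_hat, k is the number of coefficients, and s^2 is the residual variance. Under the null hypothesis, W is asymptotically chi-squared. In linear regression with normal errors, W divided by its numerator degrees of freedom q has an exact F distribution, which is the same normalization as at the top of this FAQ. Roughly speaking, this is why Stata's test command reports F after regress and chi2 after most maximum likelihood estimators.

```latex
W = (R\hat b - r)'\,\bigl[R\,\widehat{V}\,R'\bigr]^{-1}(R\hat b - r) \;\xrightarrow{d}\; \chi^2(q)

\text{linear regression, } u_j \sim N(0,\sigma^2),\ \widehat{V} = s^2 (X'X)^{-1}: \qquad
\frac{W}{q} \sim F(q,\, N-k) \ \text{exactly}
```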
