Why does the test command sometimes produce chi-squared and other times F statistics?

How are the chi-squared and F distributions related?

Title		Relationship between the chi-squared and F distributions
Author		William Gould, StataCorp

F and chi-squared statistics are really the same thing in that, after a normalization, chi-squared is the limiting distribution of F as the denominator degrees of freedom go to infinity. The normalization is

chi-squared = (numerator degrees of freedom) * F
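One way to see why this normalization works is to start from the textbook definition of F as a ratio of two independent chi-squared variables, each divided by its degrees of freedom. The symbols U, V, d_1, and d_2 are introduced here only for illustration; they do not appear elsewhere in this FAQ.

```latex
F = \frac{U/d_1}{V/d_2},
\qquad U \sim \chi^2_{d_1},\quad V \sim \chi^2_{d_2},\quad U \text{ and } V \text{ independent}

\frac{V}{d_2} \xrightarrow{\;p\;} 1 \quad \text{as } d_2 \to \infty
\qquad\Longrightarrow\qquad
d_1 F = \frac{U}{V/d_2} \xrightarrow{\;d\;} U \sim \chi^2_{d_1}
```

The denominator converges to 1 by the law of large numbers, so only the numerator chi-squared survives in the limit.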

For instance, if you tell me that you have an F(2,71) = 2.05, the corresponding chi-squared is 2 * 2.05 = 4.1 and, by the way, the tail probabilities are virtually the same:

        F(2,71) = 2.05      p = .1363
        chi2(2) = 4.1       p = .1287

As the denominator degrees of freedom (the 71) get larger and larger, the p-value of the F statistic approaches the chi-squared p-value of .1287:

        F(2,   150) = 2.05     p = .1323
        F(2,   250) = 2.05     p = .1309
        F(2,   500) = 2.05     p = .1298
        F(2,  1000) = 2.05     p = .1293
        F(2,  5000) = 2.05     p = .1288
        F(2, 10000) = 2.05     p = .1288
        F(2,100000) = 2.05     p = .1287
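
The table above can be checked with a short sketch in Python (not Stata). It relies on a closed form that holds only when the numerator degrees of freedom equal 2: the upper tail of F(2,d) at f is (1 + 2f/d)^(-d/2), and the upper tail of chi2(2) at x is exp(-x/2). The function names are made up for this illustration.

```python
import math


def f_tail_2df(f_stat, denom_df):
    """Upper-tail probability of F(2, denom_df) at f_stat.

    Closed form; valid only for numerator degrees of freedom equal to 2.
    """
    return (1.0 + 2.0 * f_stat / denom_df) ** (-denom_df / 2.0)


def chi2_tail_2df(x):
    """Upper-tail probability of chi-squared(2) at x."""
    return math.exp(-x / 2.0)


f_stat = 2.05
for denom_df in (71, 150, 250, 500, 1000, 5000, 10000, 100000):
    p = f_tail_2df(f_stat, denom_df)
    print(f"F(2,{denom_df:>6}) = {f_stat}   p = {p:.4f}")

# The limiting case: multiply F by the numerator degrees of freedom (2).
print(f"chi2(2)     = {2 * f_stat}    p = {chi2_tail_2df(2 * f_stat):.4f}")
```

For other numerator degrees of freedom there is no formula this simple, and a statistical package's tail functions are the practical route.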

Except for the conceptually irrelevant normalization of multiplying the statistic's value by the numerator degrees of freedom, the relationship between chi-squared and F is the same as the relationship between t and the normal.
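
The parallel can be written out for the one-restriction case, where the two pairs of distributions are linked by squaring. Here t_d denotes a t variable with d degrees of freedom and z a standard normal; both symbols are introduced only for this illustration.

```latex
t_d^{\,2} \sim F(1, d), \qquad z^2 \sim \chi^2_1

t_d \xrightarrow{\;d\;} z \quad \text{as } d \to \infty
\qquad\Longleftrightarrow\qquad
F(1, d) \xrightarrow{\;d\;} \chi^2_1 \quad \text{as } d \to \infty
```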

Anyway, I can look at the output and switch one to the other whenever I think that it is appropriate. If I see the output that the F(5,100) statistic is 4.79, I can calculate that the corresponding chi2(5) statistic is 5 * 4.79 = 23.95. If I see in the computer output that the chi2(5) statistic is 7, and I think that an F(5,50) would be more appropriate, I can calculate that the F(5,50) statistic is 7/5 = 1.4.

I can switch from one to the other; I just have to remember to multiply or divide by the (numerator) degrees of freedom. As a check, remember that the expected value of F is approximately 1 (exactly d/(d-2), where d is the denominator degrees of freedom and d > 2), and the expected value of chi-squared is the (numerator) degrees of freedom.
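
The arithmetic of switching can be sketched in a few lines of Python. The two helper names are hypothetical and exist only for this illustration; all they do is multiply or divide by the numerator degrees of freedom.

```python
def f_to_chi2(f_stat, numerator_df):
    """Convert an F statistic to the corresponding chi-squared statistic."""
    return numerator_df * f_stat


def chi2_to_f(chi2_stat, numerator_df):
    """Convert a chi-squared statistic to the corresponding F statistic."""
    return chi2_stat / numerator_df


# F(5,100) = 4.79 corresponds to chi2(5) = 5 * 4.79
print(round(f_to_chi2(4.79, 5), 2))

# chi2(5) = 7 corresponds to F(5,50) = 7 / 5
print(round(chi2_to_f(7, 5), 2))
```

Note that the denominator degrees of freedom never enter the conversion of the statistic itself; they matter only when the p-value is looked up.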

Okay, that’s the mechanics. Now, let’s discuss what is really going on and when one or the other statistic is the relevant one.

When I analyze data, I model the data as having a random component. That random component results in the numbers I calculate from the data having a random component. For instance, I model

        y_j   =  x_j * b  +  u_j   ,       u_j   ~  N(0, sigma²)

In my model, the y_j values I observe have the random component u_j, Gaussian noise. If it were not for the random component in the y_j process, I could estimate b by solving

        b = y_j  / x_j                 for any j

In my data, however, u_j is not always equal to zero, so the formula becomes

        b = (y_j  - u_j) / x_j           for any j

I cannot make that calculation, because I do not observe u_j; all I know about u_j is that it is normally distributed with mean 0. So, I come up with another estimator of b, in particular

        b_hat  =  (X'X)^-1  X'y

and, concerning that calculated value, I am uncertain that I have it exactly right. Understand that the true b itself is not random. The true b is just a fixed number such as 3. My estimate of the true b, however, is random, by which I mean only that b_hat has a distribution. The true b might be 3, but I might calculate b_hat = 2.5.

Thus b_hat has a distribution that arises because of the random component, u_j, in the original model. This distribution is called the sampling distribution of b_hat.
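
A small simulation makes the sampling distribution concrete. This is a sketch in Python under assumptions of my own choosing: a single regressor with no constant (so that (X'X)^-1 X'y reduces to sum(x*y)/sum(x*x)), a true b of 3, sigma of 1, 50 observations per sample, and x drawn uniformly between 1 and 5. None of these settings comes from a real dataset.

```python
import random
import statistics


def simulate_b_hat(rng, true_b=3.0, sigma=1.0, n_obs=50):
    """Draw one sample from y_j = x_j*b + u_j and return the estimate b_hat."""
    x = [rng.uniform(1.0, 5.0) for _ in range(n_obs)]
    y = [xj * true_b + rng.gauss(0.0, sigma) for xj in x]
    # (X'X)^-1 X'y for one regressor and no constant term
    sxy = sum(xj * yj for xj, yj in zip(x, y))
    sxx = sum(xj * xj for xj in x)
    return sxy / sxx


rng = random.Random(12345)  # fixed seed so the run is reproducible
estimates = [simulate_b_hat(rng) for _ in range(2000)]

# The true b never changes, yet every sample yields a different b_hat.
print(f"smallest b_hat: {min(estimates):.3f}")
print(f"largest  b_hat: {max(estimates):.3f}")
print(f"mean of b_hat:  {statistics.mean(estimates):.3f}")
print(f"std. dev.:      {statistics.stdev(estimates):.3f}")
```

The spread of the 2,000 estimates around 3 is the sampling distribution in miniature; increasing n_obs in the sketch narrows it.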

Now, it turns out that the sampling distribution of b_hat is a function of N, the number of observations. It also turns out that, as N→infinity, the sampling distribution becomes (jointly) normal. That turns out to be true for lots of estimators. In any case, let's call the sampling distribution of b_hat f(b_hat, N), and let g(b_hat) represent the asymptotic sampling distribution of b_hat,

        g(b_hat) =    lim    f(b_hat, N)
                     N→inf

g(b_hat) being normal leads to test statistics with the normal distribution for single parameters and, correspondingly, tests with the chi-squared distribution when testing multiple parameters simultaneously.
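
The step from a normal b_hat to a chi-squared test can be sketched with the usual Wald statistic. The notation is an assumption made for this illustration: the hypothesis is written as q linear restrictions Rb = r, and V_hat is the estimated variance matrix of b_hat.

```latex
H_0 : R b = r \qquad (q \text{ restrictions})

W = (R\hat{b} - r)' \,\bigl(R \hat{V} R'\bigr)^{-1} (R\hat{b} - r)
\;\xrightarrow{\;d\;}\; \chi^2_q \quad \text{under } H_0

F = \frac{W}{q}
```

With q = 1, W is the square of the familiar z statistic; dividing W by q gives the F form of the same test.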

However, no one really estimates models on infinitely large samples. Your sample has a finite number of observations. Thus rather than using g(b_hat), you would rather use f(b_hat, N) if only you knew it. For most estimators, we do not know f(b_hat, N), so we use g(b_hat), and we do tests using normals and chi-squareds.

In some cases—such as linear regression—we do know the sampling distribution for finite samples and, in those cases, we can calculate a test with better coverage probabilities.

Thus the Wald test is usually discussed as a chi-squared test, because it is usually applied to problems where only the asymptotic sampling distribution is known. But if we do know the sampling distribution for finite samples, we certainly want to use that. More details on how and when the test command reports chi-squared or F statistics can be found in the "Methods and formulas" section of [R] test.
