Stata | FAQ: Clarification on analytic weights with linear regression


Note: The following originally appeared in Stata Technical Bulletin, issue 20, July 1994.

What is the effect of specifying aweights with regress?

Title		Clarification on analytic weights with linear regression
Author		William Gould, StataCorp


A popular request on the help line is to describe the effect of specifying [aweight=exp] with regress in terms of a transformation of the dependent and independent variables. The mechanical answer is that typing

        . regress y x_1 x_2 [aweight=n]

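is equivalent to estimating the following model by unweighted least squares with no separate constant term (\( \sqrt{n_j} \) plays the role of the constant):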

\[ y_j \sqrt{n_j} = \beta_0 \sqrt{n_j} + \beta_1 x_{1j} \sqrt{n_j} + \beta_2 x_{2j} \sqrt{n_j} + u_j \sqrt{n_j} \]

This regression will reproduce the coefficients and covariance matrix produced by the aweighted regression. The mean square errors (estimates of the variance of the residuals) will, however, be different. The transformed regression reports \( s_t^2 \), an estimate of Var\( (u_j \sqrt{n_j}) \). The aweighted regression reports \( s_a^2 \), an estimate of Var\( (u_j \sqrt{n_j} \sqrt{N/\sum_k n_k}) \), where N is the number of observations. Thus,

\[ s_a^2 = \frac{N}{\sum_k n_k}\, s_t^2 = \frac{s_t^2}{\bar{n}} \tag{1} \]
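A quick numerical check of the equivalence and of equation (1) may help. The sketch below is plain Python, not Stata. The helper `wls` is hypothetical: it solves the weighted normal equations directly. It also assumes, as equation (1) implies, that an aweighted regression rescales the weights to sum to N and divides the weighted residual sum of squares by N - 3. The data are made up for illustration.

```python
import math


def wls(y, X, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy.

    Returns the coefficients and the residual mean square sum(w*e^2)/(N - p).
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'WX | X'Wy], solved by Gauss-Jordan elimination.
    A = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(w[i] * X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    e = [y[i] - sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]
    return beta, sum(w[i] * e[i] ** 2 for i in range(n)) / (n - p)


# Made-up data: y, x_1, x_2 and the group sizes n_j.
y = [3.1, 3.9, 7.2, 7.8, 11.5, 11.9, 15.2, 17.8]
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
n = [5, 12, 3, 8, 20, 7, 15, 10]
N = len(y)
nbar = sum(n) / N

# Transformed regression: every term multiplied by sqrt(n_j); the constant
# becomes the regressor sqrt(n_j), so unit weights and no separate intercept.
r = [math.sqrt(nj) for nj in n]
y_t = [yj * rj for yj, rj in zip(y, r)]
X_t = [[rj, a * rj, b * rj] for rj, a, b in zip(r, x1, x2)]
beta_t, mse_t = wls(y_t, X_t, [1.0] * N)

# aweighted regression: weights n_j rescaled to sum to N.
X = [[1.0, a, b] for a, b in zip(x1, x2)]
w = [nj * N / sum(n) for nj in n]
beta_a, mse_a = wls(y, X, w)

print("coefficients:", beta_t, beta_a)
print("s_t^2 / nbar =", mse_t / nbar, "  s_a^2 =", mse_a)
```

The two coefficient vectors agree. The aweighted mean square error equals the transformed one divided by \( \bar{n} \), which is 10 for these made-up sizes.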

The logic for this adjustment is as follows. Consider the model

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \]

Assume that, were this model estimated on individuals, Var(u)=\( \sigma_u^2 \), a constant. Suppose further that individual data are not available. Instead, we have averages \( (\bar{y}_j, \bar{x}_{1j}, \bar{x}_{2j}) \) for j = 1,...,N, each calculated over \( n_j \) individuals. Then it is still true that

\[ \bar{y}_j = \beta_0 + \beta_1 \bar{x}_{1j} + \beta_2 \bar{x}_{2j} + \bar{u}_j \]

where \( \bar{u}_j \) is the average of \( n_j \) independent deviates, each with mean 0 and variance \( \sigma_u^2 \), and so has variance Var\( (\bar{u}_j) = \sigma_u^2/n_j \). Thus, multiplying through by \( \sqrt{n_j} \) produces

\[ \bar{y}_j \sqrt{n_j} = \beta_0 \sqrt{n_j} + \beta_1 \bar{x}_{1j} \sqrt{n_j} + \beta_2 \bar{x}_{2j} \sqrt{n_j} + \bar{u}_j \sqrt{n_j} \]

and Var\( (\bar{u}_j \sqrt{n_j}) = \sigma_u^2 \). The mean square error \( s_t^2 \) reported by estimating this transformed regression is an estimate of \( \sigma_u^2 \). Alternatively, the coefficients and covariance matrix could be obtained by aweighted regress. The only difference would be in the reported mean square error, which per equation 1 is an estimate of \( \sigma_u^2 / \bar{n} \). On average, each observation in the data reflects averages calculated over \( \bar{n} = \sum_k n_k/N \) individuals. The reported mean square error is therefore the variance of a typical observation in the dataset, that is, of an average computed over \( \bar{n} \) individuals. One can retrieve the estimate of \( \sigma_u^2 \) by multiplying the reported mean square error by \( \bar{n} \).
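The recovery of \( \sigma_u^2 \) can be illustrated by simulation. The sketch below is plain Python and does not reproduce Stata output. Everything in it is an assumption for illustration:

- a true individual-level model with \( \beta_0 = 1 \), \( \beta_1 = 0.5 \), \( \beta_2 = -2 \) and \( \sigma_u = 2 \), so \( \sigma_u^2 = 4 \);
- N = 1,000 groups whose sizes are drawn between 1 and 30;
- a hypothetical `wls` helper that treats the aweighted mean square error as the weighted residual sum of squares, with weights rescaled to sum to N, divided by N - 3.

```python
import random


def wls(y, X, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy.

    Returns the coefficients and the residual mean square sum(w*e^2)/(N - p).
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'WX | X'Wy], solved by Gauss-Jordan elimination.
    A = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(w[i] * X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    e = [y[i] - sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]
    return beta, sum(w[i] * e[i] ** 2 for i in range(n)) / (n - p)


random.seed(2024)
B0, B1, B2, SIGMA_U = 1.0, 0.5, -2.0, 2.0  # assumed "true" individual model
N = 1000                                   # number of groups (observations)

ybar, X, n = [], [], []
for _ in range(N):
    nj = random.randint(1, 30)  # individuals behind this observation
    x1 = [random.uniform(0, 10) for _ in range(nj)]
    x2 = [random.uniform(0, 10) for _ in range(nj)]
    y = [B0 + B1 * u + B2 * v + random.gauss(0, SIGMA_U)
         for u, v in zip(x1, x2)]
    # Keep only what the analyst would see: the group averages and n_j.
    ybar.append(sum(y) / nj)
    X.append([1.0, sum(x1) / nj, sum(x2) / nj])
    n.append(nj)

total = sum(n)
nbar = total / N
w = [nj * N / total for nj in n]  # aweights rescaled to sum to N
beta, mse = wls(ybar, X, w)
sigma2_hat = mse * nbar           # should land near SIGMA_U**2 = 4
print("reported MSE:", mse, "  MSE * nbar:", sigma2_hat)
```

The reported mean square error itself is near \( 4/\bar{n} \). Multiplying it by \( \bar{n} \) brings it back near 4, up to sampling noise.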

More generally, aweights can be used to address general forms of heteroskedasticity. In these cases, one has the model

\[ y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + u_j \]

and the variance of \( u_j \) is thought to be proportional to \( a_j \). If the variance is proportional to \( a_j \), it is also proportional to \( \alpha a_j \), where \( \alpha \) is any positive constant. Not quite arbitrarily, but with no loss of generality, let us choose \( \alpha = \sum_k(1/a_k)/N \), the average value of the inverse of \( a_j \). We can then write Var\((u_j) = k\alpha a_j\sigma^2 \), where \( k \) is the constant of proportionality that is no longer a function of the scale of the weights.
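The claim that \( k \) does not depend on the scale of the weights can be checked in one line. Suppose every \( a_j \) is multiplied by some constant \( c > 0 \). Here \( c \) and \( \alpha' \) are introduced only for this illustration.

\[ \alpha' = \frac{1}{N}\sum_k \frac{1}{c\,a_k} = \frac{\alpha}{c}, \qquad \alpha' \,(c\,a_j) = \frac{\alpha}{c}\, c\, a_j = \alpha a_j \]

So \( \alpha a_j \) is unchanged by the rescaling. Hence \( k \) in Var\( (u_j) = k\alpha a_j\sigma^2 \) is unchanged as well.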

Dividing this regression through by \( \sqrt{a_j} \),

\[ y_j/\sqrt{a_j} = \beta_0/\sqrt{a_j} + \beta_1 x_{1j}/\sqrt{a_j} + \beta_2 x_{2j}/\sqrt{a_j} + u_j/\sqrt{a_j} \]

produces a model with Var\( (u_j/\sqrt{a_j}) = k\alpha \sigma^2 \), which is the constant part of Var\( (u_j )\). Notice in particular that this variance is a function of \( \alpha \), the average of the reciprocal weights. If the weights are scaled arbitrarily, then so is this variance.
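The sketch below shows that the transformed regression's mean square error inherits the arbitrary scale of the weights. It is plain Python on made-up data; the variance factors `a` and the `wls` helper are hypothetical. It fits the transformed model twice, once dividing by \( \sqrt{a_j} \) and once by \( \sqrt{10 a_j} \).

```python
import math


def wls(y, X, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy.

    Returns the coefficients and the residual mean square sum(w*e^2)/(N - p).
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'WX | X'Wy], solved by Gauss-Jordan elimination.
    A = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(w[i] * X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    e = [y[i] - sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]
    return beta, sum(w[i] * e[i] ** 2 for i in range(n)) / (n - p)


# Made-up data; a_j is a hypothetical factor with Var(u_j) proportional to a_j.
y = [3.1, 3.9, 7.2, 7.8, 11.5, 11.9, 15.2, 17.8]
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
a = [0.5, 2.0, 1.0, 4.0, 0.25, 3.0, 1.5, 2.5]


def transformed_fit(scale):
    """Divide every term by sqrt(scale * a_j); fit with unit weights, no constant."""
    d = [math.sqrt(scale * aj) for aj in a]
    y_t = [yj / dj for yj, dj in zip(y, d)]
    X_t = [[1 / dj, u / dj, v / dj] for dj, u, v in zip(d, x1, x2)]
    return wls(y_t, X_t, [1.0] * len(y))


beta_1, mse_1 = transformed_fit(1.0)     # divide by sqrt(a_j)
beta_10, mse_10 = transformed_fit(10.0)  # divide by sqrt(10 a_j)
print("MSE with a:", mse_1, "  MSE with 10a:", mse_10)
```

The coefficients are the same in both fits, but the second mean square error is one-tenth of the first. This matches Var\( (u_j/\sqrt{a_j}) = k\alpha\sigma^2 \) with \( \alpha \) divided by 10.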

We can also estimate this model by typing:

        . regress y x_1 x_2 [aweight=1/a]

This command will produce the same estimates of the coefficients and covariance matrix; the reported mean square error is, per equation 1, \( [ N/ \sum_k (1/a_k) ] k \alpha \sigma^2 = k \sigma^2 \). This variance is independent of the scale of \( a_j \).
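A numerical check of this last claim follows. It is a plain-Python sketch on made-up data with a hypothetical `wls` helper. It assumes aweighted regress rescales the weights 1/a_j to sum to N and divides the weighted residual sum of squares by N - 3. The sketch confirms two things:

- the aweighted mean square error equals \( [N/\sum_k (1/a_k)]\, s_t^2 \), where \( s_t^2 \) comes from the regression divided through by \( \sqrt{a_j} \);
- the aweighted mean square error does not change when every \( a_j \) is multiplied by 10.

The quantity \( k\sigma^2 \) itself is not observable here, so only these two relations are tested.

```python
import math


def wls(y, X, w):
    """Weighted least squares via the normal equations (X'WX) b = X'Wy.

    Returns the coefficients and the residual mean square sum(w*e^2)/(N - p).
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'WX | X'Wy], solved by Gauss-Jordan elimination.
    A = [[sum(w[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(w[i] * X[i][r] * y[i] for i in range(n))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    e = [y[i] - sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]
    return beta, sum(w[i] * e[i] ** 2 for i in range(n)) / (n - p)


# Made-up data; a_j is a hypothetical factor with Var(u_j) proportional to a_j.
y = [3.1, 3.9, 7.2, 7.8, 11.5, 11.9, 15.2, 17.8]
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
a = [0.5, 2.0, 1.0, 4.0, 0.25, 3.0, 1.5, 2.5]
N = len(y)


def transformed_fit(scale):
    """Divide every term by sqrt(scale * a_j); fit with unit weights, no constant."""
    d = [math.sqrt(scale * aj) for aj in a]
    y_t = [yj / dj for yj, dj in zip(y, d)]
    X_t = [[1 / dj, u / dj, v / dj] for dj, u, v in zip(d, x1, x2)]
    return wls(y_t, X_t, [1.0] * N)


def aweighted_fit(scale):
    """Fit y on (1, x1, x2) with aweight 1/(scale * a_j), rescaled to sum to N."""
    inv = [1 / (scale * aj) for aj in a]
    w = [iv * N / sum(inv) for iv in inv]
    X = [[1.0, u, v] for u, v in zip(x1, x2)]
    return wls(y, X, w)


_, mse_t = transformed_fit(1.0)
_, mse_a1 = aweighted_fit(1.0)
_, mse_a10 = aweighted_fit(10.0)
factor = N / sum(1 / aj for aj in a)  # N / sum_k (1/a_k)
print("factor * s_t^2:", factor * mse_t)
print("aweighted MSE:", mse_a1, "  after rescaling a:", mse_a10)
```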
