The following originally appeared in Stata Technical Bulletin, issue 20, July 1994.

What is the effect of specifying aweights with regress?

Title   Clarification on analytic weights with linear regression
Author William Gould, StataCorp
Date January 1999

Clarification on analytic weights with linear regression

A popular request on the help line is to describe the effect of specifying [aweight=exp] with regress in terms of transformation of the dependent and independent variables. The mechanical answer is that typing

        . regress y x_1 x_2 [aweight=n]

is equivalent to estimating the model:

$$y_j\sqrt{n_j} = \beta_0\sqrt{n_j} + \beta_1 x_{1j}\sqrt{n_j} + \beta_2 x_{2j}\sqrt{n_j} + u_j\sqrt{n_j}$$

This regression will reproduce the coefficients and covariance matrix produced by the aweighted regression. The mean square errors (estimates of the variance of the residuals) will, however, be different. The transformed regression reports $s_t^2$, an estimate of $\mathrm{Var}(u_j\sqrt{n_j})$. The aweighted regression reports $s_a^2$, an estimate of $\mathrm{Var}\bigl(u_j\sqrt{n_j}\sqrt{N/\sum_k n_k}\bigr)$, where N is the number of observations. Thus, writing $\bar n = \sum_k n_k/N$ for the mean weight,

$$s_a^2 = \frac{N}{\sum_k n_k}\, s_t^2 = \frac{s_t^2}{\bar n} \qquad (1)$$
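
The equivalence can be checked numerically. The following is a minimal do-file sketch on simulated data; the seed, sample size, coefficient values, and the variable names rootn, yt, x1t, and x2t are illustrative assumptions, not part of the original note. Note that the constant must be transformed as well, so the second regression includes sqrt(n) as a regressor and suppresses the intercept.

```stata
* Simulated data; all names and parameter values here are illustrative
clear
set seed 12345
set obs 200
* n plays the role of n_j: integer weights from 1 to 20
generate n   = 1 + floor(20*runiform())
generate x_1 = rnormal()
generate x_2 = rnormal()
generate y   = 1 + 2*x_1 - x_2 + rnormal()/sqrt(n)

* (a) the aweighted regression
regress y x_1 x_2 [aweight=n]
scalar s2_a = e(rmse)^2

* (b) the same model with every term, including the constant,
*     multiplied by sqrt(n)
generate rootn = sqrt(n)
generate yt    = y*rootn
generate x1t   = x_1*rootn
generate x2t   = x_2*rootn
regress yt x1t x2t rootn, noconstant
scalar s2_t = e(rmse)^2

* Equation 1: s2_a should match s2_t divided by the mean of n
summarize n, meanonly
display "s2_a = " s2_a "   s2_t/nbar = " s2_t/r(mean)
```

The coefficient tables of the two regressions should agree, with the coefficient on rootn in (b) corresponding to _cons in (a); only the reported root mean square error should differ.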

The logic for this adjustment is as follows: Consider the model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$

Assume that, were this model estimated on individuals, $\mathrm{Var}(u) = \sigma_u^2$, a constant. Assume that individual data are not available; what is available are averages $(\bar y_j, \bar x_{1j}, \bar x_{2j})$, for j = 1,...,N, and that each average is calculated over $n_j$ observations. Then it is still true that

$$\bar y_j = \beta_0 + \beta_1 \bar x_{1j} + \beta_2 \bar x_{2j} + \bar u_j$$

where $\bar u_j$ is the average of $n_j$ independent deviates, each with mean 0 and variance $\sigma_u^2$, and so has variance $\sigma_{\bar u}^2 = \sigma_u^2/n_j$. Thus, multiplying through by $\sqrt{n_j}$ produces

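$$\bar y_j\sqrt{n_j} = \beta_0\sqrt{n_j} + \beta_1 \bar x_{1j}\sqrt{n_j} + \beta_2 \bar x_{2j}\sqrt{n_j} + \bar u_j\sqrt{n_j}$$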

and $\mathrm{Var}(\bar u_j\sqrt{n_j}) = \sigma_u^2$. The mean square error $s_t^2$ reported by estimating this transformed regression is an estimate of $\sigma_u^2$. Alternatively, the coefficients and covariance matrix could be obtained by aweighted regress. The only difference would be in the reported mean square error, which per equation 1 is $\sigma_u^2/\bar n$. On average, each observation in the data reflects an average calculated over $\bar n = \sum_k n_k/N$ individuals, and thus this reported mean square error is the average variance of an observation in the dataset. One can retrieve the estimate of $\sigma_u^2$ by multiplying the reported mean square error by $\bar n$.
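
As an illustration of that last step, here is a hedged do-file sketch that simulates individual-level data, reduces it to group averages, and rescales the reported mean square error by the mean group size. The seed, the number of individuals and groups, the coefficients, and the true error variance of 4 are all assumptions chosen for the example.

```stata
* Individual-level data with Var(u) = 4 (an illustrative value)
clear
set seed 2024
set obs 5000
generate group = 1 + floor(300*runiform())
generate x_1 = rnormal()
generate x_2 = rnormal()
generate y   = 1 + 2*x_1 - x_2 + 2*rnormal()

* Keep only the group averages and the group sizes n_j
generate n = 1
collapse (mean) y x_1 x_2 (sum) n, by(group)

* aweighted regression on the averages
regress y x_1 x_2 [aweight=n]

* Multiply the reported mean square error by the mean of n
* to retrieve an estimate of the individual-level variance
summarize n, meanonly
display "estimate of sigma_u^2 = " e(rmse)^2*r(mean)
```

Because summarize stores its results in r() and leaves e() untouched, e(rmse) from the regression is still available on the last line. The displayed value is an estimate, so it should land near 4 rather than equal it.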

More generally, aweights are used to solve general heteroskedasticity problems. In these cases, one has the model

$$y_j = \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + u_j$$

and the variance of $u_j$ is thought to be proportional to $a_j$. If the variance is proportional to $a_j$, it is also proportional to $\alpha a_j$, where $\alpha$ is any positive constant. Not quite arbitrarily, but with no loss of generality, let us choose $\alpha = \sum_k (1/a_k)/N$, the average value of the inverse of $a_j$. We can then write $\mathrm{Var}(u_j) = k\alpha a_j\sigma^2$, where k is the constant of proportionality that is no longer a function of the scale of the weights.

Dividing this regression through by $\sqrt{a_j}$,

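$$\frac{y_j}{\sqrt{a_j}} = \beta_0\frac{1}{\sqrt{a_j}} + \beta_1\frac{x_{1j}}{\sqrt{a_j}} + \beta_2\frac{x_{2j}}{\sqrt{a_j}} + \frac{u_j}{\sqrt{a_j}}$$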

produces a model with $\mathrm{Var}(u_j/\sqrt{a_j}) = k\alpha\sigma^2$, which is the constant part of $\mathrm{Var}(u_j)$. Notice in particular that this variance is a function of $\alpha$, the average of the reciprocal weights. If the weights are scaled arbitrarily, then so is this variance.

We can also estimate this model by typing:

        . regress y x_1 x_2 [aweight=1/a]

This command will produce the same estimates of the coefficients and covariance matrix; the reported mean square error is, per equation 1, $\bigl[N/\sum_k (1/a_k)\bigr]\, k\alpha\sigma^2 = k\sigma^2$. This variance is independent of the scale of $a_j$.
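
The scale invariance can be seen directly by rescaling a and rerunning the aweighted regression. This is a minimal do-file sketch on simulated data; the seed, the distribution of a, the factor of 1000, and the names a1000, mse_1, and mse_2 are illustrative assumptions.

```stata
* Simulated heteroskedastic data: Var(u_j) proportional to a_j
clear
set seed 99
set obs 200
generate a   = 0.5 + 2*runiform()
generate x_1 = rnormal()
generate x_2 = rnormal()
generate y   = 1 + 2*x_1 - x_2 + sqrt(a)*rnormal()

* aweighted regression with weights 1/a
regress y x_1 x_2 [aweight=1/a]
scalar mse_1 = e(rmse)^2

* The same regression after rescaling a by an arbitrary factor
generate a1000 = 1000*a
regress y x_1 x_2 [aweight=1/a1000]
scalar mse_2 = e(rmse)^2

display "MSE, weights 1/a: " mse_1 "   MSE, weights 1/(1000a): " mse_2
```

The two aweighted regressions should report the same coefficients, standard errors, and mean square error, whereas dividing the variables through by sqrt(a1000) instead of sqrt(a) and running an unweighted regression would shrink the reported mean square error by a factor of 1000.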
