Note: This FAQ is for users of releases prior to Stata 6. It is not relevant for more recent versions.

## Why don’t the old huber results match the new robust versions?

Title: Estimating robust standard errors in Stata

Author: James Hardin, StataCorp

The new versions are better (less biased).

In the new implementation of the robust estimate of variance, Stata scales the estimated variance matrix to make it less biased.

### Unclustered data

Estimating robust standard errors in Stata 4.0 resulted in

 . hreg price weight displ

Regression with Huber standard errors               Number of obs    =      74
                                                    R-squared        =  0.2909
                                                    Root MSE         = 2518.38

------------------------------------------------------------------------------
   price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  weight |   1.823366   .7648832      2.384   0.020       .2982323      3.3485
   displ |   2.087054   7.284658      0.286   0.775      -12.43814    16.61225
   _cons |    247.907   1106.467      0.224   0.823      -1958.326     2454.14
------------------------------------------------------------------------------


and the same model in Stata 5.0 is

 . regress price weight displ, robust

Regression with robust standard errors                 Number of obs =      74
                                                       F(  2,    71) =   14.44
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.2909
                                                       Root MSE      =  2518.4

------------------------------------------------------------------------------
         |               Robust
   price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  weight |   1.823366   .7808755      2.335   0.022       .2663446    3.380387
   displ |   2.087054   7.436967      0.281   0.780      -12.74184    16.91595
   _cons |    247.907   1129.602      0.219   0.827      -2004.454    2500.269
------------------------------------------------------------------------------


Stata 5.0 scales the variance matrix using

$$\frac{n}{n-k}$$


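for the (unclustered) regression results, where n is the number of observations and k is the number of estimated coefficients, including the constant. Here n = 74 and k = 3, so to match the previous results, we can undo that scaling by multiplying each standard error by sqrt(71/74):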

 . di .7808755*sqrt(71/74)
.76488318

. di 7.436967*sqrt(71/74)
7.284658

. di 1129.602*sqrt(71/74)
1106.4678
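
The same adjustment can be written in general form. The following Python sketch is only an illustration outside Stata: the function name `unclustered_scale` and the dictionary layout are assumptions made here, and the standard errors are copied from the Stata 5.0 output above.

```python
import math

def unclustered_scale(n, k):
    """Variance scale factor n / (n - k) for unclustered data (hypothetical helper)."""
    return n / (n - k)

n, k = 74, 3  # observations; estimated coefficients (weight, displ, _cons)

# Robust standard errors as reported by "regress ..., robust" in Stata 5.0
robust_se = {"weight": 0.7808755, "displ": 7.436967, "_cons": 1129.602}

# The factor applies to the variance, so standard errors change by its square root
factor = unclustered_scale(n, k)
huber_se = {name: se / math.sqrt(factor) for name, se in robust_se.items()}

for name, se in huber_se.items():
    print(f"{name:>6}  {se:.8g}")
```

Dividing by the square root of n/(n - k) is the same operation as the multiplication by sqrt(71/74) in the display commands, so the printed values should agree with the Stata 4.0 standard errors up to rounding.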


### Clustered data

Running a robust regression in Stata 4.0 results in

 . hreg price weight displ, group(rep78)

Regression with Huber standard errors               Number of obs    =      69
                                                    R-squared        =  0.3108
                                                    Root MSE         = 2454.21
Grouping variable: rep78
------------------------------------------------------------------------------
   price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  weight |   1.039647   .8439705      1.232   0.222      -.6453948    2.724688
   displ |   8.887734   7.450619      1.193   0.237      -5.987907    23.76337
   _cons |   1234.034   1986.931      0.621   0.537      -2733.002    5201.069
------------------------------------------------------------------------------


The same model run in Stata 5.0 results in

 .  regress price weight displ, robust cluster(rep78)

Regression with robust standard errors                 Number of obs =      69
                                                       F(  2,     4) =    3.40
                                                       Prob > F      =  0.1372
                                                       R-squared     =  0.3108
Number of clusters (rep78) = 5                         Root MSE      =  2454.2

------------------------------------------------------------------------------
         |               Robust
   price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  weight |   1.039647   .9577778      1.085   0.339      -1.619571    3.698864
   displ |   8.887734   8.455317      1.051   0.353      -14.58799    32.36346
   _cons |   1234.034   2254.864      0.547   0.613      -5026.472    7494.539
------------------------------------------------------------------------------


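Stata 5.0 scales the variance matrix for clustered data using

$$\frac{n-1}{n-k} \times \frac{g}{g-1}$$

where g is the number of clusters. Here n = 69, k = 3, and g = 5, so to match the previous results, we can undo that scaling by multiplying each standard error by sqrt(4/5)*sqrt(66/68):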

 . di .9577778*sqrt(4/5)*sqrt(66/68)
.84397051

. di 8.455317*sqrt(4/5)*sqrt(66/68)
7.4506198

. di 2254.864*sqrt(4/5)*sqrt(66/68)
1986.9313
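
The clustered adjustment follows the same pattern with a two-part factor. This Python sketch is again only an illustration outside Stata: `clustered_scale` is a hypothetical helper name, and the inputs are copied from the Stata 5.0 clustered output above.

```python
import math

def clustered_scale(n, k, g):
    """Variance scale factor (n - 1)/(n - k) * g/(g - 1) for clustered data (hypothetical helper)."""
    return (n - 1) / (n - k) * g / (g - 1)

n, k, g = 69, 3, 5  # observations; estimated coefficients; clusters (rep78)

# Robust standard errors as reported by "regress ..., robust cluster(rep78)" in Stata 5.0
robust_se = {"weight": 0.9577778, "displ": 8.455317, "_cons": 2254.864}

# Undo the scaling on the standard-error (square-root) scale
factor = clustered_scale(n, k, g)
huber_se = {name: se / math.sqrt(factor) for name, se in robust_se.items()}

print(f"variance scale factor: {factor:.5f}")
for name, se in huber_se.items():
    print(f"{name:>6}  {se:.8g}")
```

With only five clusters, the g/(g - 1) term (5/4) accounts for most of the factor, while (n - 1)/(n - k) is 68/66.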


Note also that the F test Stata 5.0 reports in the header of the output is the Wald test based on the robust variance estimate.

There is one final important difference. The hreg command used n-1 as the degrees of freedom for the t tests of the coefficients, which is anticonservative. Stata 5.0 now uses g-1, where g is the number of clusters. This more conservative definition of the degrees of freedom provides much more accurate confidence intervals. So for a dataset with a small number of groups (clusters) and a large number of observations, the difference between regress, robust cluster() and the old hreg will show up in the p-values of the t statistics: the scale factor becomes much less important, but the difference in degrees of freedom remains important.
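
To see how much the degrees of freedom alone matter, the sketch below computes the two-sided p-value for the same t statistic under both choices. It is a rough pure-Python illustration, not Stata's own routine: the function names are hypothetical, the tail area is approximated with Simpson's rule, and the n - 1 value is the degrees of freedom that the text above attributes to hreg.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area within [-|t|, |t|], via Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    half_area = total * h / 3  # area on [0, |t|]
    return 1.0 - 2.0 * half_area

n, g = 69, 5
t_weight = 1.039647 / 0.9577778  # coefficient / robust SE for weight (clustered run)

p_clusters = two_sided_p(t_weight, g - 1)  # Stata 5.0: g - 1 degrees of freedom
p_obs = two_sided_p(t_weight, n - 1)       # n - 1 degrees of freedom

print(f"df = {g - 1}:  p = {p_clusters:.3f}")
print(f"df = {n - 1}: p = {p_obs:.3f}")
```

The first value should reproduce the 0.339 shown for weight in the Stata 5.0 clustered output; the second is smaller, which is the anticonservative behavior described above.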