Note: This FAQ is for users of releases prior to Stata 6. It is not relevant for more recent versions.

Why don’t the old huber results match the new robust versions?

Title   Estimating robust standard errors in Stata
Author James Hardin, StataCorp

The new versions are better (less biased).

In the new implementation of the robust estimate of variance, Stata scales the estimated variance matrix to make it less biased.

Unclustered data

Estimating robust standard errors in Stata 4.0 resulted in

 . hreg price weight displ
 
 Regression with Huber standard errors               Number of obs    =      74
                                                     R-squared        =  0.2909
                                                     Adj R-squared    =  0.2710
                                                     Root MSE         = 2518.38
 
 ------------------------------------------------------------------------------
    price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
   weight |   1.823366   .7648832      2.384   0.020       .2982323      3.3485
    displ |   2.087054   7.284658      0.286   0.775      -12.43814    16.61225
    _cons |    247.907   1106.467      0.224   0.823      -1958.326     2454.14
 ------------------------------------------------------------------------------

and the same model in Stata 5.0 is

 . regress price weight displ, robust
    
 Regression with robust standard errors                 Number of obs =      74
                                                        F(  2,    71) =   14.44
                                                        Prob > F      =  0.0000
                                                        R-squared     =  0.2909
                                                        Root MSE      =  2518.4
 
 ------------------------------------------------------------------------------
          |               Robust
    price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
   weight |   1.823366   .7808755      2.335   0.022       .2663446    3.380387
    displ |   2.087054   7.436967      0.281   0.780      -12.74184    16.91595
    _cons |    247.907   1129.602      0.219   0.827      -2004.454    2500.269
 ------------------------------------------------------------------------------

Stata 5.0 scales the variance matrix using

         n  
       -----
       n - k

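for the (unclustered) regression results, where n is the number of observations and k is the number of estimated coefficients (here 3, including the constant). To match the previous results, we can undo that scaling by multiplying each standard error by sqrt((n-k)/n):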

 . di .7808755*sqrt(71/74)
 .76488318
 
 . di 7.436967*sqrt(71/74)
 7.284658
 
 . di 1129.602*sqrt(71/74)
 1106.4678

Clustered data

Estimating the same model with clustering on rep78 in Stata 4.0 results in

 . hreg price weight displ, group(rep78)
 
 Regression with Huber standard errors               Number of obs    =      69
                                                     R-squared        =  0.3108
                                                     Adj R-squared    =  0.2899
                                                     Root MSE         = 2454.21
 Grouping variable: rep78
 ------------------------------------------------------------------------------
    price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
   weight |   1.039647   .8439705      1.232   0.222      -.6453948    2.724688
    displ |   8.887734   7.450619      1.193   0.237      -5.987907    23.76337
    _cons |   1234.034   1986.931      0.621   0.537      -2733.002    5201.069
 ------------------------------------------------------------------------------

The same model run in Stata 5.0 results in

 . regress price weight displ, robust cluster(rep78)
 
 Regression with robust standard errors                 Number of obs =      69
                                                        F(  2,     4) =    3.40
                                                        Prob > F      =  0.1372
                                                        R-squared     =  0.3108
 Number of clusters (rep78) = 5                         Root MSE      =  2454.2
 
 ------------------------------------------------------------------------------
          |               Robust
    price |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
   weight |   1.039647   .9577778      1.085   0.339      -1.619571    3.698864
    displ |   8.887734   8.455317      1.051   0.353      -14.58799    32.36346
    _cons |   1234.034   2254.864      0.547   0.613      -5026.472    7494.539
 ------------------------------------------------------------------------------

For clustered data, Stata 5.0 scales the variance matrix using

        n - 1         g  
        -----   x   -----
        n - k       g - 1

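where g is the number of clusters. Here n = 69, k = 3, and g = 5, so to match the previous results we multiply each standard error by sqrt((g-1)/g) and by sqrt((n-k)/(n-1)):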

 . di .9577778*sqrt(4/5)*sqrt(66/68)
 .84397051
 
 . di 8.455317*sqrt(4/5)*sqrt(66/68)
 7.4506198
 
 . di 2254.864*sqrt(4/5)*sqrt(66/68)
 1986.9313

Note also that Stata 5.0 includes an F test in the header of the output that is the Wald test based on the robust variance estimate.

There is one final important difference. The hreg command based the degrees of freedom for the t tests of the coefficients on the number of observations (the hreg confidence intervals above correspond to n-k = 66 degrees of freedom), which is anticonservative for clustered data. Stata 5.0 instead uses g-1, and this more conservative definition provides much more accurate confidence intervals. The difference matters most for a dataset with a small number of groups (clusters) and a large number of observations: there the scale factor becomes much less important, but the difference in degrees of freedom remains important, so regress, robust cluster() and the old hreg will differ mainly in the p-values of the t statistics.
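
To see how much the degrees of freedom matter on their own, one can compare the critical t values directly. This is a minimal sketch, assuming a later Stata release that provides the invttail() function (it is not available in the releases this FAQ covers). The scalar names tc_obs and tc_clu are illustrative. The value 66 is n-k for the clustered example, which is the figure that reproduces the hreg interval printed above, and 4 is g-1.

```stata
* Two-sided 95% critical values of the t distribution.
* tc_obs: degrees of freedom based on observations (n - k = 66)
* tc_clu: degrees of freedom based on clusters     (g - 1 = 4)
scalar tc_obs = invttail(66, .025)
scalar tc_clu = invttail(4, .025)
display tc_obs
display tc_clu

* Rebuild the Stata 5.0 interval for weight from its coefficient and
* robust standard error, using the cluster-based critical value.
display 1.039647 - tc_clu*.9577778
display 1.039647 + tc_clu*.9577778
```

The critical value is roughly 2.78 with 4 degrees of freedom against roughly 2.00 with 66, so the cluster-based interval is nearly 40% wider from this source alone, before any change in the standard error. The last two lines should agree with the interval reported for weight in the Stata 5.0 output up to rounding.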