StataNow spotlight: Teaching old commands new tricks
Linear regression has been a core part of Stata since the beginning. Stata's workhorse regression commands, including regress, xtreg, and areg, remain staples of applied work today.
In many ways, these commands have not changed. Users today can run the same regressions they ran in early versions of Stata, using the same syntax, and get the same results. Even so, as with every part of Stata, these commands are actively maintained by Stata's developers. We frequently add new functionality to keep up with advances in applied practice or to meet user demand.
Over the last few years, we have introduced dozens of enhancements to variance–covariance estimation in linear regression commands. Some of these changes were made in Stata 18 and Stata 19. Some are new to StataNow 19.5. In case you missed them, here are some of the recent highlights.
Hansen standard errors and inference adjustment
The state-of-the-art jackknife method for standard errors and confidence intervals outlined in Hansen (2025) has been added in StataNow 19.5 to five commands: regress; xtreg, fe; areg; didregress; and xtdidregress.
Hansen's method gives standard errors equivalent to vce(hc3 [clustvar]) when cluster matrices are invertible and has better coverage than HC3 standard errors when they are not. Common causes of cluster noninvertibility are singleton clusters, observations with extreme leverage, and the presence of cluster-level fixed effects. Apart from standard errors, Hansen's method adjusts reported p-values and confidence intervals, which improves finite-sample inference.
To show how it works, we follow Hansen (2025) in applying the method to data from Card and Krueger (1994), as retrieved from https://users.ssc.wisc.edu/~behansen/:
. use https://www.stata.com/exdata/CK.dta, clear
. list store state time treatment in 63/66
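     +-------------------------------------------+
     | store          state     time   treatment |
     |-------------------------------------------|
 63. |    36     New Jersey   Before     Control |
 64. |    36     New Jersey    After   Treatment |
 65. |    37   Pennsylvania   Before     Control |
 66. |    37   Pennsylvania    After     Control |
     +-------------------------------------------+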
This dataset is a panel of restaurants in Pennsylvania and New Jersey covering two time periods: before and after a wage change in New Jersey. The treatment variable (treatment) is the interaction between an indicator for the state of New Jersey (state) and an indicator for the time period after the wage change (time), so it marks the New Jersey observations that experienced the wage change. We regress worker hours (fte) on treatment, state, and time, clustering the standard errors by region (region):
. regress fte treatment state time, vce(cluster region)
Linear regression                               Number of obs     =        768
                                                F(2, 4)           =          .
                                                Prob > F          =          .
                                                R-squared         =     0.0076
                                                Root MSE          =     9.5113

                                 (Std. err. adjusted for 5 clusters in region)
------------------------------------------------------------------------------
             |               Robust
         fte | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   treatment |       2.75    1.17263     2.35   0.079    -.5057439    6.005744
       state |  -2.949417   1.891643    -1.56   0.194    -8.201459    2.302624
        time |  -2.283333   1.137836    -2.01   0.115    -5.442474    .8758071
       _cons |      23.38   1.047288    22.32   0.000     20.47226    26.28774
------------------------------------------------------------------------------
Here the lower bound of the 95% confidence interval on treatment is only slightly negative, at −0.51. If we use Hansen standard errors and inference adjustment, however, we get substantially different results. To use them, we add the hansen suboption to vce(hc3 [clustvar], dfadjust):
. regress fte treatment state time, vce(hc3 region, dfadjust hansen)
Computing degrees of freedom ...

Linear regression                               Number of obs     =        768
                                                F(3, 4)           =       1.06
                                                Prob > F          =     0.4602
                                                R-squared         =     0.0076
                                                Adj R-squared     =     0.0037
                                                Root MSE          =     9.5113

                                 (Std. err. adjusted for 5 clusters in region)
------------------------------------------------------------------------------
             |               Robust
         fte | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   treatment |       2.75   2.094625     1.31   0.255    -6.980507    12.48051
       state |  -2.949417   3.014157    -0.98   0.346    -16.95157    11.05274
        time |  -2.283333   2.058197    -1.11   0.359    -20.61511    16.04844
       _cons |      23.38   1.894408    12.34   0.036     6.507051    40.25295
------------------------------------------------------------------------------
Note that the confidence interval on the treatment effect has widened considerably to [−6.98, 12.48], so a large negative effect is plausible.
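To view the two sets of standard errors side by side, one option is to store each fit and tabulate them. The following is a minimal sketch that reuses only the two regress commands shown above; the stored-result names cluster_se and hansen_se are arbitrary labels chosen for illustration.

```stata
* Sketch: compare cluster-robust and Hansen standard errors side by side.
* cluster_se and hansen_se are arbitrary labels, not names used by Stata.
use https://www.stata.com/exdata/CK.dta, clear

regress fte treatment state time, vce(cluster region)
estimates store cluster_se

regress fte treatment state time, vce(hc3 region, dfadjust hansen)
estimates store hansen_se

* The coefficients are the same in both fits; the standard errors differ.
estimates table cluster_se hansen_se, b(%9.3f) se(%9.3f)
```

Because Hansen's method also adjusts p-values and confidence intervals, those are best read from the regress output itself rather than from the comparison table.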
Heteroskedasticity- and autocorrelation-consistent standard errors
Stata has long been able to estimate Newey–West heteroskedasticity- and autocorrelation-consistent (HAC) standard errors for linear regression with the newey command. In StataNow 19.5, you can simply use the vce(hac) option in regress or areg.
The following two commands will return the same results:
. newey y x1 x2, lag(10)
. regress y x1 x2, vce(hac nwest 10)
The syntax of vce(hac) is the same as in ivregress. You can choose between the Newey–West (Bartlett), Gallant (Parzen), or quadratic spectral kernels. You can also choose the number of lags or ask for an optimal number of lags by using the Newey–West criterion:
. regress y x1 x2, vce(hac quadraticspectral opt)
The integration of HAC standard errors into regress also means that a broader set of stored results, postestimation commands, and other extended functionality is available when estimating Newey–West standard errors than with newey. For example, newey does not report the regression \(R^2\).
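For readers who want to see what the Newey–West (Bartlett) kernel computes, the estimator can be written as follows. This is a sketch based on the formula documented for newey, with \(n\) observations, \(k\) regressors, OLS residuals \(\hat{e}_t\), regressor row vectors \(x_t\), and maximum lag \(m\). The \(n/(n-k)\) factor is the small-sample scaling used by newey; that vce(hac nwest) applies the same scaling is an assumption, though one consistent with the two commands returning the same results.

\[
\widehat{\mathrm{Var}}(\hat{\beta}) = (X'X)^{-1}\,\hat{S}\,(X'X)^{-1},
\]
\[
\hat{S} = \frac{n}{n-k}\left[\,\sum_{t=1}^{n} \hat{e}_t^{2}\, x_t' x_t
 \;+\; \sum_{l=1}^{m}\left(1-\frac{l}{m+1}\right)\sum_{t=l+1}^{n} \hat{e}_t\,\hat{e}_{t-l}\left(x_t' x_{t-l} + x_{t-l}' x_t\right)\right].
\]

With \(m = 0\), the second sum vanishes and the expression reduces to the usual heteroskedasticity-robust estimator. The other kernels replace the weight \(1 - l/(m+1)\) with a different function of \(l\) and \(m\).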
Driscoll–Kraay standard errors
In StataNow 19.5, the covariance matrix estimation method of Driscoll and Kraay (1998) is available in xtreg, fe with the vce(dkraay) option. Driscoll–Kraay standard errors have become popular because they are robust not only to heteroskedasticity and autocorrelation but also to many forms of dependence between panels (see Driscoll and Kraay [1998] for details). They can be thought of as a generalization of HAC standard errors for panel data.
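One way to see the sense in which this generalizes HAC is to write the estimator out. The sketch below uses notation that is not in the original article: \(\tilde{x}_{it}\) is the within-transformed regressor row vector for panel \(i\) in period \(t\), \(\hat{e}_{it}\) is the residual, \(N\) and \(T\) are the numbers of panels and time periods, and \(w(l,m)\) is the kernel weight at lag \(l\) with maximum lag \(m\). Finite-sample scaling factors are omitted, and the exact scaling Stata applies is not stated in the article.

\[
h_t = \sum_{i=1}^{N} \tilde{x}_{it}'\,\hat{e}_{it}, \qquad
\hat{S} = \sum_{t=1}^{T} h_t h_t' \;+\; \sum_{l=1}^{m} w(l,m) \sum_{t=l+1}^{T}\left(h_t h_{t-l}' + h_{t-l} h_t'\right),
\]
\[
\widehat{\mathrm{Var}}(\hat{\beta}) = (\tilde{X}'\tilde{X})^{-1}\,\hat{S}\,(\tilde{X}'\tilde{X})^{-1}.
\]

For the Newey–West kernel, \(w(l,m) = 1 - l/(m+1)\). Summing the moment contributions across panels within each period before forming the HAC sums is what makes the estimator robust to dependence between panels; with a single panel (\(N = 1\)), the expression has the same form as the time-series HAC estimator.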
We have a simple dataset on the growth in average rents in Canadian metropolitan areas from 2001 to 2025, using data retrieved from Statistics Canada (see the Note at the end of this article):
. use https://www.stata.com/exdata/canhousing.dta, clear
(Cdn housing, unemployment, and pop. data, 2001-2025, from Statistics Canada)
. describe
Contains data from canhousing.dta
 Observations:           825                  Cdn housing, unemployment, and
                                                pop. data, 2001-2025, from
                                                Statistics Canada
    Variables:            11                  27 May 2026 15:32
                                              (_dta has notes)
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
region          byte    %16.0g     region     Region
cma             byte    %37.0g     cma        Census metropolitan area
year            int     %8.0g                 Year
rent            float   %8.0g                 Average two-bedroom apartment rent
                                                (thousands)
vacancy         float   %9.0g                 Rental vacancy rate
population      long    %12.0g                Population
unemp           float   %9.0g               * Unemployment rate
rrent           float   %9.0g                 Average two-bedroom apartment rent
                                                (thousands of 2002 dollars)
lpop            float   %9.0g                 Log population
popgrowth       float   %9.0g                 Population growth rate
rentgrowth      float   %9.0g                 Growth rate of real average rent
                                            * indicated variables have notes
-------------------------------------------------------------------------------
We perform a fixed-effects panel regression of rent growth (rentgrowth) on population growth (popgrowth), vacancy rates (vacancy), and unemployment rates (unemp). We may suspect that shocks to rents in Canadian cities are autocorrelated from year to year, at least in the short run. We can use Driscoll–Kraay standard errors with a Newey–West kernel, where autocorrelation is corrected for up to two lags, to produce standard errors that are robust to this autocorrelation:
. xtreg rentgrowth vacancy popgrowth unemp, fe vce(dkraay nwest 2)
Fixed-effects (within) regression               Number of obs     =        792
Group variable: cma                             Number of groups  =         33

R-squared:                                      Obs per group:
     Within  = 0.1771                                         min =         24
     Between = 0.0974                                         avg =       24.0
     Overall = 0.1496                                         max =         24

                                                F(3, 23)          =      30.87
corr(u_i, Xb) = -0.3860                         Prob > F          =     0.0000

------------------------------------------------------------------------------
             |             DK robust
  rentgrowth | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     vacancy |   -.004705   .0010709    -4.39   0.000    -.0069203   -.0024896
   popgrowth |   .8572707   .1774483     4.83   0.000      .490191     1.22435
       unemp |  -.0005674   .0025341    -0.22   0.825    -.0058095    .0046748
       _cons |   .0206695   .0157583     1.31   0.203    -.0119291    .0532681
-------------+----------------------------------------------------------------
     sigma_u |  .00724556
     sigma_e |  .02863563
         rho |   .0601699   (fraction of variance due to u_i)
------------------------------------------------------------------------------
Looking at the p-values and confidence intervals, we have evidence that both vacancy rates and population growth are related to rent growth in Canadian cities over this period, even when accounting for autocorrelation across years and broad forms of correlation across metropolitan areas.
Note that Driscoll–Kraay standard errors, like HAC standard errors, rely on asymptotics that take effect as the number of time periods grows large and thus may not be appropriate in panel datasets with few time periods.
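Given this caveat, it can help to check the time dimension of the panel and to compare the Driscoll–Kraay results with a more familiar alternative. The sketch below is one way to do that, assuming the panel is declared with cma as the panel variable and year as the time variable (the output above confirms cma as the group variable; year is an assumption). The stored-result names clustered and dk are arbitrary labels.

```stata
* Sketch: inspect the panel's time dimension, then compare
* cluster-robust and Driscoll-Kraay standard errors.
use https://www.stata.com/exdata/canhousing.dta, clear

* Assumed panel declaration; harmless if the data are already xtset this way
xtset cma year

* Summarize the panel structure, including the number of time periods
xtdescribe

xtreg rentgrowth vacancy popgrowth unemp, fe vce(cluster cma)
estimates store clustered

xtreg rentgrowth vacancy popgrowth unemp, fe vce(dkraay nwest 2)
estimates store dk

* Same coefficients in both columns; the standard errors differ
estimates table clustered dk, b(%9.4f) se(%9.4f)
```

Clustering by cma allows arbitrary correlation within each metropolitan area but assumes independence across areas, so the comparison gives a rough sense of how much cross-panel dependence matters for these data.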
And more!
In addition to the enhancements highlighted above, we have made the functionality of the vce(hc2) and vce(hc3) options more consistent across our linear regression commands in StataNow 19.5.
- vce(hc3) in regress now allows clustering, as vce(hc2) already did.
- areg and xtreg, fe now allow for HC3 standard errors using vce(hc3) or vce(hc3 clustvar), in addition to allowing for HC2.
- didregress and xtdidregress now allow user-specified clustering with vce(hc2 clustvar) and vce(hc3 clustvar) provided that the group variable is nested in the specified clusters.
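The options in this list could be used along the following lines. This is a syntax sketch only: every variable name (y, x1, x2, id, year, clustvar, treated, county, state) is a hypothetical placeholder, and the didregress line assumes that county is nested within state, as the nesting requirement above demands.

```stata
* Syntax sketch with hypothetical variable names throughout.

* HC3 standard errors with clustering in regress
regress y x1 x2, vce(hc3 clustvar)

* HC3 standard errors in areg, with and without clustering
areg y x1 x2, absorb(id) vce(hc3)
areg y x1 x2, absorb(id) vce(hc3 clustvar)

* HC3 standard errors with clustering in xtreg, fe
xtset id year
xtreg y x1 x2, fe vce(hc3 clustvar)

* User-specified clustering in didregress; the group variable (county)
* is assumed to be nested within the cluster variable (state)
didregress (y x1 x2) (treated), group(county) time(year) vce(hc2 state)
```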
We have also added new functionality to two other commands:
- ivregress now allows for multiway clustering using vce(cluster clustvarlist). This follows the addition of multiway clustering to regress, areg, and xtreg, fe in Stata 18.
- xtgls now allows for arbitrary correlation within panels via the new corr(unstructured) option.
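As a rough illustration of these two additions, the commands might look like the following. All variable names (y, x1, x2, z1, z2, firm, year, id) are hypothetical placeholders, and the option spellings are taken directly from the list above rather than from the manual entries.

```stata
* Syntax sketch with hypothetical variable names throughout.

* Two-way clustering in ivregress: x2 is endogenous, z1 and z2 are instruments
ivregress 2sls y x1 (x2 = z1 z2), vce(cluster firm year)

* Arbitrary within-panel correlation in xtgls
xtset id year
xtgls y x1 x2, corr(unstructured)
```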
Read more
Read more in the Stata Base Reference Manual; see [R] regress, [R] areg, and [R] ivregress.
Read more in the Stata Longitudinal-Data/Panel-Data Reference Manual; see [XT] xtreg, fe and [XT] xtgls.
Read more in the Stata Causal Inference and Treatment-Effects Estimation Reference Manual; see [CAUSAL] didregress and [CAUSAL] xtdidregress.
Note
Data on Canadian metropolitan areas are adapted from Statistics Canada tables 34-10-0133-01, 34-10-0127-01, 17-10-0148-01, 14-10-0461-01, 14-10-0096-01, and 18-10-0005-01, as they appeared on May 26, 2026. These data are used under the Statistics Canada Open Licence; this does not constitute an endorsement by Statistics Canada.
References
Card, D., and A. B. Krueger. 1994. Minimum wages and employment: A case study of the fast-food industry in New Jersey and Pennsylvania. American Economic Review 84: 772–793.
Driscoll, J. C., and A. C. Kraay. 1998. Consistent covariance matrix estimation with spatially dependent panel data. Review of Economics and Statistics 80: 549–560.
Hansen, B. E. 2025. Standard errors for difference-in-difference regression. Journal of Applied Econometrics 40: 291–309.
— Tom Stringham
Senior Econometrician and Software Developer