In the spotlight: So many fixed effects, so little time
Do you work with panel data or cross-sectional data and need to control for high-dimensional categorical variables? Stata 19 introduced powerful enhancements to its regression capabilities that make it easier and much faster to fit models with high-dimensional fixed effects (HDFE).
The absorb() option is now available in xtreg, fe and ivregress 2sls, letting you efficiently absorb multiple categorical variables at once. This greatly expands what's possible with fixed-effects modeling. In areg, absorb() has also been enhanced to handle multiple categorical variables. Because absorbed effects are not reported in the output, you are spared hundreds or thousands of coefficients you don't need while still getting fast, efficient estimation.
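Conceptually, absorbing one categorical variable amounts to subtracting group means from the outcome and from each regressor before running the regression. This is the within transformation, which is also what xtreg, fe applies to the panel variable. The Python sketch below is our own illustration on a hypothetical toy panel and is not Stata's implementation. In the toy data, the individual effects alpha_i rise with x, so a pooled regression is biased, while the demeaned regression recovers the true slope of 2.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from its observations (within transformation)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def ols_slope(x, y):
    """Slope of a simple regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy panel: three individuals observed three times each.
# True model: y = 2*x + alpha_i, with alpha_i larger for individuals with larger x.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
alpha = {1: 0.0, 2: 10.0, 3: 20.0}
y = [2.0 * xi + alpha[i] for xi, i in zip(x, ids)]

pooled = ols_slope(x, y)                                  # ignores alpha_i
within = ols_slope(demean_by(x, ids), demean_by(y, ids))  # sweeps out alpha_i
print(f"pooled slope = {pooled:.3f}, within slope = {within:.3f}")
# pooled slope = 5.000, within slope = 2.000
```

Absorbing several categorical variables generalizes this idea: the effects are swept out of the data rather than estimated as indicator coefficients.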
An example using individual-level panel data
Let’s walk through a concrete example using individual-level panel data from the IPUMS Current Population Survey (CPS), available at https://cps.ipums.org/cps/. The dataset contains repeated observations of individuals over time, with information on each individual's occupation and industry at each time period.
We begin by loading and describing the dataset.
. use ipums
(IPUMS CPS extract, 2015 through 2024)

. describe

Contains data from ipums.dta
 Observations:       170,434                  IPUMS CPS extract, 2015 through 2024
    Variables:            21                  4 Sep 2025 12:57
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
id              float   %9.0g                 Individual identification
year            int     %8.0g                 Survey year
cpsid           double  %12.0g                Household record
asecflag        byte    %8.0g      ASECFLAG   Annual Social and Economic Supplement
region          byte    %8.0g      REGION     Region and division
statecensus     byte    %8.0g      STATECENSUS State census code
cpsidp          double  %12.0g                Person record
age             byte    %8.0g                 Age
sex             byte    %8.0g      SEX        Sex
race            int     %8.0g      RACE       Race
marst           byte    %23.0g     MARST      Marital status
vetstat         byte    %8.0g      VETSTAT    Veteran status
empstat         byte    %30.0g     EMPSTAT    Employment status
occ             int     %8.0g                 Occupation
ind             int     %8.0g                 Industry
uhrsworkt       int     %8.0g                 Hours usually worked per week at all jobs
educ            int     %8.0g      EDUC       Educational attainment recode
wksunem2        byte    %8.0g                 Weeks unemployed last year, intervaled
incwage         long    %12.0g                Wage and salary income
health          byte    %9.0g      HEALTH     Health status
lnwage          float   %9.0g                 Natural log of wage and salary income
-------------------------------------------------------------------------------
We aim to model the log of wages (lnwage) as a function of age, employment status, marital status, hours worked, weeks unemployed, and health status. We also want to control for unobserved effects specific to the individual, the time period, the occupation, and the industry.
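In equation form, the most complete (four-way) specification can be sketched as follows. The notation is ours and is not taken from the Stata documentation. Here i indexes individuals, t indexes survey years, o(i,t) and k(i,t) are the occupation and industry of individual i in year t, z_it collects the indicators for marital status, employment status, and health status, and beta_0 is the intercept reported as _cons. Step 1 omits all four fixed effects, Step 2 adds alpha_i, Step 3 adds lambda_t, and Step 4 adds mu and kappa.

```latex
\begin{aligned}
\ln w_{it} ={}& \beta_0 + \beta_1\,\mathit{age}_{it} + \beta_2\,\mathit{age}_{it}^2
               + \beta_3\,\mathit{uhrsworkt}_{it} + \beta_4\,\mathit{wksunem2}_{it}
               + \mathbf{z}_{it}'\boldsymbol{\gamma} \\
             &+ \alpha_i + \lambda_t + \mu_{o(i,t)} + \kappa_{k(i,t)} + \varepsilon_{it}
\end{aligned}
```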
Step 1: Start simple—no fixed effects
First, we establish a baseline by fitting a basic ordinary least-squares regression without any fixed effects. This model ignores any unobserved characteristics that are specific to particular individuals, years, occupations, or industries.
. regress lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health
  (output omitted)
We use estimates store to save these estimation results so that we can compare them with later models.
. estimates store OLS
Step 2: One-way fixed effects with xtreg, fe
Next, we improve our model by including individual fixed effects using xtreg, fe. These individual fixed effects control for all time-invariant unobserved factors, such as innate ability or motivation. Before we can use xt commands like xtreg, we must first declare the panel structure with xtset.
. xtset id year

Panel variable: id (unbalanced)
 Time variable: year, 2015 to 2024, but with gaps
         Delta: 1 unit

. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe nolog

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.3634                                         min =          1
     Between = 0.3598                                         avg =        4.2
     Overall = 0.3623                                         max =         10

                                                F(20, 61639)      =    1759.30
corr(u_i, Xb) = -0.0069                         Prob > F          =     0.0000

------------------------------------------------------------------------------------------------
                        lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
                           age |   .1157152   .0016753    69.07   0.000     .1124316    .1189989
                   c.age#c.age |  -.0011639   .0000182   -63.88   0.000    -.0011996   -.0011282
                     uhrsworkt |  -.0000866   .0000179    -4.83   0.000    -.0001216   -.0000515
                      wksunem2 |   .1025737   .0012164    84.33   0.000     .1001895    .1049578
                               |
                         marst |
        Married, spouse absent |    -.19975    .031977    -6.25   0.000    -.2624251    -.137075
                     Separated |  -.3952554   .0276942   -14.27   0.000    -.4495361   -.3409746
                      Divorced |  -.2076364   .0131721   -15.76   0.000    -.2334538    -.181819
                       Widowed |  -.3332016   .0287973   -11.57   0.000    -.3896444   -.2767588
          Never married/single |  -.2313562   .0104761   -22.08   0.000    -.2518894   -.2108231
                               |
                       empstat |
                       At work |  -.3255529   .0495945    -6.56   0.000    -.4227582   -.2283477
Has job, not at work last week |  -.3600129   .0541405    -6.65   0.000    -.4661284   -.2538974
Unemployed, experienced worker |  -.6734508   .0517662   -13.01   0.000    -.7749126    -.571989
        Unemployed, new worker |  -1.190159   .2537126    -4.69   0.000    -1.687436   -.6928814
          NILF, unable to work |  -.9811457    .074068   -13.25   0.000    -1.126319   -.8359722
                   NILF, other |  -1.125918   .0505666   -22.27   0.000    -1.225029   -1.026807
                 NILF, retired |  -.6796395   .0570466   -11.91   0.000    -.7914509   -.5678281
                               |
                        health |
                     Very good |  -.0600062   .0092826    -6.46   0.000    -.0782002   -.0418122
                          Good |  -.1977527    .010219   -19.35   0.000    -.2177819   -.1777235
                          Fair |  -.3452917   .0176644   -19.55   0.000    -.3799139   -.3106695
                          Poor |  -.3686544   .0400975    -9.19   0.000    -.4472456   -.2900632
                               |
                         _cons |   7.719378   .0613174   125.89   0.000     7.599196     7.83956
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .52868441
                       sigma_e |  .93476059
                           rho |  .24235753   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------
. estimates store Oneway
Based on the F test that all u_i = 0, which xtreg, fe reports at the bottom of its output, we do not find evidence that we need to control for individual fixed effects.
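Each xtreg, fe output also reports sigma_u, sigma_e, and rho, which is the fraction of the error variance due to the individual effects u_i. It is defined as rho = sigma_u^2 / (sigma_u^2 + sigma_e^2). As a quick arithmetic check, the Python sketch below applies that formula to the rounded sigma values printed in this output and in the two-way and four-way outputs of Steps 3 and 4. The formula reproduces the reported rho to about six decimal places.

```python
def rho(sigma_u, sigma_e):
    """Fraction of the error variance due to the individual effects u_i."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


# (sigma_u, sigma_e, rho) exactly as printed in the xtreg, fe outputs
reported = {
    "Oneway": (0.52868441, 0.93476059, 0.24235753),
    "Twoway": (0.52355994, 0.92547327, 0.24244753),
    "Fourway": (0.47258634, 0.82718913, 0.2460807),
}

for model, (su, se, rho_stata) in reported.items():
    print(f"{model:8s} computed rho = {rho(su, se):.6f}   reported rho = {rho_stata}")
```

In all three models, roughly a quarter of the unexplained variance is attributed to the individual effects.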
Step 3: Two-way fixed effects with absorb()
We extend the model by adding year fixed effects through the absorb() option, capturing time-specific unobservables, such as macroeconomic conditions and policy changes. This yields a two-way fixed-effects specification.
. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe absorb(year) nolog

Alternating projection maximum absolute difference = 8.707e-09

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.3761                                         min =          1
     Between = 0.3595                                         avg =        4.2
     Overall = 0.3622                                         max =         10

                                                F(20, 61630)      =    1792.10
corr(u_i, Xb) = -0.0077                         Prob > F          =     0.0000

------------------------------
Absorbed variable |     Levels
------------------+-----------
             year |         10
------------------------------

------------------------------------------------------------------------------------------------
                        lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
                           age |   .1155189   .0016591    69.63   0.000     .1122671    .1187707
                   c.age#c.age |  -.0011632    .000018   -64.47   0.000    -.0011985   -.0011278
                     uhrsworkt |    -.00008   .0000177    -4.51   0.000    -.0001148   -.0000453
                      wksunem2 |   .1025185   .0012062    85.00   0.000     .1001545    .1048826
                               |
                         marst |
        Married, spouse absent |  -.2096018   .0316645    -6.62   0.000    -.2716643   -.1475393
                     Separated |  -.3875132    .027422   -14.13   0.000    -.4412603    -.333766
                      Divorced |  -.2020962   .0130424   -15.50   0.000    -.2276593   -.1765331
                       Widowed |  -.3300978   .0285125   -11.58   0.000    -.3859824   -.2742133
          Never married/single |  -.2390184   .0103748   -23.04   0.000    -.2593532   -.2186837
                               |
                       empstat |
                       At work |  -.3119542   .0491056    -6.35   0.000    -.4082012   -.2157072
Has job, not at work last week |  -.3477729   .0536078    -6.49   0.000    -.4528442   -.2427016
Unemployed, experienced worker |  -.6544863   .0512603   -12.77   0.000    -.7549566   -.5540161
        Unemployed, new worker |  -1.141233   .2512077    -4.54   0.000      -1.6336   -.6488649
          NILF, unable to work |  -.9626481    .073338   -13.13   0.000    -1.106391   -.8189054
                   NILF, other |  -1.114173   .0500685   -22.25   0.000    -1.212307   -1.016039
                 NILF, retired |  -.6796395   .0570466   -11.91   0.000    -.7914509   -.5678281
                               |
                        health |
                     Very good |  -.0698841   .0091966    -7.60   0.000    -.0879094   -.0518587
                          Good |  -.2106014    .010126   -20.80   0.000    -.2304485   -.1907544
                          Fair |  -.3599402   .0174966   -20.57   0.000    -.3942335   -.3256469
                          Poor |  -.3807171   .0397028    -9.59   0.000    -.4585347   -.3028994
                               |
                         _cons |   7.721496   .0607112   127.18   0.000     7.602502    7.840491
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .52355994
                       sigma_e |  .92547327
                           rho |  .24244753   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------
. estimates store Twoway
Here individual-level fixed effects are captured automatically by the fe option (because xtset declared id as the panel dimension), while the absorb(year) option efficiently controls for all time effects. After including year fixed effects, the coefficients on the variables of interest do not change much, but we now have evidence that we should control for the individual fixed effects based on the joint F test at the bottom of the output.
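With two or more sets of fixed effects in an unbalanced panel, a single pass of demeaning is generally not enough. Removing the year means disturbs the individual means, and removing the individual means in turn disturbs the year means. The "Alternating projection maximum absolute difference" line in the output suggests that the effects are swept out iteratively by alternating projections; see Correia (2016) for the general approach.

The Python sketch below is a simplified, hypothetical illustration of that idea and is not Stata's actual algorithm or stopping rule. The function absorb() cycles through the factors, demeaning by each in turn, until the largest change between two sweeps falls below a tolerance. It then regresses the transformed outcome on the transformed regressor.

```python
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean from its observations (one projection)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb(values, factors, tol=1e-12, max_iter=10_000):
    """Partial out every factor by demeaning by each one in turn until convergence.

    Returns the transformed values and the maximum absolute difference between
    the last two sweeps (a hypothetical stopping rule chosen for illustration).
    """
    current = list(values)
    for _ in range(max_iter):
        previous = current
        for groups in factors:
            current = demean_by(current, groups)
        max_diff = max(abs(a - b) for a, b in zip(current, previous))
        if max_diff < tol:
            return current, max_diff
    raise RuntimeError("alternating projections did not converge")


# Hypothetical unbalanced toy panel: y = 1.5*x + individual effect + year effect
ids = [1, 1, 1, 2, 2, 3, 3, 3, 4]
years = [2015, 2016, 2017, 2015, 2017, 2016, 2017, 2018, 2018]
x = [0.5, 1.7, 2.2, 3.1, 2.4, 0.9, 4.0, 3.3, 1.2]
a = {1: 1.0, 2: -2.0, 3: 0.5, 4: 3.0}
g = {2015: 0.0, 2016: 0.4, 2017: -0.3, 2018: 1.1}
y = [1.5 * xi + a[i] + g[t] for xi, i, t in zip(x, ids, years)]

x_t, x_diff = absorb(x, [ids, years])
y_t, y_diff = absorb(y, [ids, years])

# The absorbed variables have mean zero, so by the Frisch-Waugh-Lovell theorem
# the slope is a regression through the origin on the transformed data.
slope = sum(u * v for u, v in zip(x_t, y_t)) / sum(u * u for u in x_t)
print(f"slope = {slope:.6f}, max abs difference = {max(x_diff, y_diff):.1e}")
```

Passing a longer list such as [ids, years, occ, ind] would extend the same loop to more dimensions. Only per-group sums and counts are needed, never an indicator column for each level.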
Step 4: Go full HDFE
We also want to control for occupation and industry effects, so we add these variables to the absorb() option. If we were to add them as regressors instead, we would have more than 900 additional parameters to estimate. We are not interested in these parameter estimates themselves. However, we do need to control for factors such as industry-wide differences in technology or the unique skill demands of a specific occupation, which would bias our estimates if they were omitted.
. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe absorb(year occ ind) nolog

Alternating projection maximum absolute difference = 3.233e-09

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.5089                                         min =          1
     Between = 0.3542                                         avg =        4.2
     Overall = 0.3564                                         max =         10

                                                F(20, 60721)      =     901.00
corr(u_i, Xb) = -0.0038                         Prob > F          =     0.0000

------------------------------
Absorbed variable |     Levels
------------------+-----------
             year |         10
              occ |        631
              ind |        280
------------------------------

------------------------------------------------------------------------------------------------
                        lnwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
                           age |   .0868007   .0015374    56.46   0.000     .0837873    .0898141
                   c.age#c.age |  -.0008721   .0000167   -52.36   0.000    -.0009047   -.0008394
                     uhrsworkt |  -.0000613   .0000161    -3.82   0.000    -.0000928   -.0000298
                      wksunem2 |   .0861902   .0011058    77.94   0.000     .0840229    .0883576
                               |
                         marst |
        Married, spouse absent |  -.0656064    .028564    -2.30   0.022    -.1215919   -.0096209
                     Separated |  -.1792339   .0247726    -7.24   0.000    -.2277884   -.1306795
                      Divorced |  -.0986945   .0118113    -8.36   0.000    -.1218447   -.0755443
                       Widowed |  -.1634434   .0257243    -6.35   0.000    -.2138631   -.1130237
          Never married/single |   -.108238   .0094786   -11.42   0.000    -.1268161   -.0896598
                               |
                       empstat |
                       At work |  -.8150478   .0594023   -13.72   0.000    -.9314766   -.6986191
Has job, not at work last week |  -.8254109   .0624995   -13.21   0.000      -.94791   -.7029117
Unemployed, experienced worker |  -1.079656   .0608426   -17.75   0.000    -1.198907   -.9604038
        Unemployed, new worker |  -1.555114   .2248653    -6.92   0.000     -1.99585   -1.114377
          NILF, unable to work |  -1.200687   .0659515   -18.21   0.000    -1.329952   -1.071422
                   NILF, other |  -1.373513   .0451674   -30.41   0.000    -1.462042   -1.284985
                 NILF, retired |  -.8826988   .0509092   -17.34   0.000     -.982481   -.7829167
                               |
                        health |
                     Very good |   -.025719    .008297    -3.10   0.002    -.0419811   -.0094569
                          Good |  -.0958429   .0092073   -10.41   0.000    -.1138892   -.0777967
                          Fair |  -.1874372   .0158488   -11.83   0.000    -.2185008   -.1563736
                          Poor |  -.2139877   .0357415    -5.99   0.000    -.2840412   -.1439343
                               |
                         _cons |   8.831137   .0669693   131.87   0.000     8.699876    8.962397
-------------------------------+----------------------------------------------------------------
                       sigma_u |  .47258634
                       sigma_e |  .82718913
                           rho |   .2460807   (fraction of variance due to u_i)
------------------------------------------------------------------------------------------------
. estimates store Fourway
This model now controls for fixed effects across four dimensions: individual, year, occupation, and industry.
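How many extra parameters would entering occupation and industry as indicator variables cost? The quick Python sketch below works through the arithmetic. It uses the Levels table from the four-way output and assumes one base level is dropped per factor. The drop in residual degrees of freedom between the two-way F(20, 61630) and four-way F(20, 60721) statistics gives the same count.

```python
# Levels reported in the "Absorbed variable" table of the four-way model
levels = {"year": 10, "occ": 631, "ind": 280}

# Indicators needed if occ and ind were entered as regressors (i.occ i.ind),
# assuming one base level is dropped per factor and no further collinearity
extra_params = (levels["occ"] - 1) + (levels["ind"] - 1)

# Residual degrees of freedom from the F statistics of the two- and four-way fits
df_twoway, df_fourway = 61630, 60721

print(extra_params)             # 909
print(df_twoway - df_fourway)   # 909
```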
Compare the results
We can compare the results across these models to see how our estimates change with additional fixed effects. We begin by creating a table to compare the coefficients.
. etable, estimates(OLS Oneway Twoway Fourway) column(estimates)

------------------------------------------------------------------------------
                                                OLS   Oneway   Twoway  Fourway
------------------------------------------------------------------------------
Age                                           0.115    0.116    0.116    0.087
                                            (0.001)  (0.002)  (0.002)  (0.002)
Age # Age                                    -0.001   -0.001   -0.001   -0.001
                                            (0.000)  (0.000)  (0.000)  (0.000)
Hours usually worked per week at all jobs    -0.000   -0.000   -0.000   -0.000
                                            (0.000)  (0.000)  (0.000)  (0.000)
Weeks unemployed last year, intervaled        0.102    0.103    0.103    0.086
                                            (0.001)  (0.001)  (0.001)  (0.001)
Marital status
  Married, spouse absent                     -0.168   -0.200   -0.210   -0.066
                                            (0.028)  (0.032)  (0.032)  (0.029)
  Separated                                  -0.413   -0.395   -0.388   -0.179
                                            (0.024)  (0.028)  (0.027)  (0.025)
  Divorced                                   -0.202   -0.208   -0.202   -0.099
                                            (0.012)  (0.013)  (0.013)  (0.012)
  Widowed                                    -0.317   -0.333   -0.330   -0.163
                                            (0.025)  (0.029)  (0.029)  (0.026)
  Never married/single                       -0.222   -0.231   -0.239   -0.108
                                            (0.009)  (0.010)  (0.010)  (0.009)
Employment status
  At work                                    -0.297   -0.326   -0.312   -0.815
                                            (0.043)  (0.050)  (0.049)  (0.059)
  Has job, not at work last week             -0.361   -0.360   -0.348   -0.825
                                            (0.047)  (0.054)  (0.054)  (0.062)
  Unemployed, experienced worker             -0.670   -0.673   -0.654   -1.080
                                            (0.045)  (0.052)  (0.051)  (0.061)
  Unemployed, new worker                     -1.494   -1.190   -1.141   -1.555
                                            (0.219)  (0.254)  (0.251)  (0.225)
  NILF, unable to work                       -0.935   -0.981   -0.963   -1.201
                                            (0.065)  (0.074)  (0.073)  (0.066)
  NILF, other                                -1.086   -1.126   -1.114   -1.374
                                            (0.044)  (0.051)  (0.050)  (0.045)
  NILF, retired                              -0.626   -0.680   -0.662   -0.883
                                            (0.050)  (0.057)  (0.056)  (0.051)
Health status
  Very good                                  -0.059   -0.060   -0.070   -0.026
                                            (0.008)  (0.009)  (0.009)  (0.008)
  Good                                       -0.190   -0.198   -0.211   -0.096
                                            (0.009)  (0.010)  (0.010)  (0.009)
  Fair                                       -0.354   -0.345   -0.360   -0.187
                                            (0.015)  (0.018)  (0.017)  (0.016)
  Poor                                       -0.377   -0.369   -0.381   -0.214
                                            (0.035)  (0.040)  (0.040)  (0.036)
Intercept                                     7.692    7.719    7.721    8.831
                                            (0.053)  (0.061)  (0.061)  (0.067)
Number of observations                        80680    80680    80680    80680
------------------------------------------------------------------------------
Notably, the two-way fixed-effects estimates differ considerably from the four-way fixed-effects results. A coefficient plot, drawn with the community-contributed coefplot command available from SSC, shows these differences more clearly:
. ssc install coefplot, replace

. coefplot Twoway Fourway, drop(_cons) xline(0) msymbol(D) mfcolor(white) msize(tiny) legend(order(2 "Two-way" 4 "Four-way"))
Many of the estimated coefficients vary considerably across models. This indicates that unobserved heterogeneity at the occupation and industry levels, beyond individual and time effects, can still introduce significant bias if not properly accounted for.
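To put numbers on "considerably," the small Python sketch below computes the relative change from the Twoway to the Fourway point estimates for a few rows of the etable output. The values are copied from that table, so they carry its three-decimal rounding. The selection of rows is our own.

```python
# Point estimates copied from the etable output (rounded to three decimals)
twoway = {"Age": 0.116, "Separated": -0.388, "At work": -0.312, "Poor health": -0.381}
fourway = {"Age": 0.087, "Separated": -0.179, "At work": -0.815, "Poor health": -0.214}

# Relative change, scaled by the magnitude of the two-way estimate
changes = {k: (fourway[k] - twoway[k]) / abs(twoway[k]) for k in twoway}

for name, change in changes.items():
    print(f"{name:12s} {twoway[name]:7.3f} -> {fourway[name]:7.3f}  ({change:+.0%})")
```

By this measure, the age coefficient falls by about a quarter and the separated and poor-health penalties shrink by roughly half. The "At work" gap relative to the base employment category, by contrast, becomes more than twice as large.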
But how long do these models take to fit? On my machine, the two-way fixed-effects model ran in 0.3 seconds, and the full four-way specification ran in just 0.7 seconds. This substantial gain in computational efficiency is a key advantage of absorbing fixed effects. The model involves nearly 20,000 indicator variables: 19,021 individual effects, 10 year effects, 631 occupation effects, and 280 industry effects. Instead of including them directly as regressors, xtreg, fe with absorb() partials them out internally, so their coefficients never need to be estimated or displayed.
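For a rough sense of scale, suppose every fixed effect were entered as a column of a dense double-precision design matrix. This naive layout is an assumption made purely for illustration; Stata does not necessarily store indicators this way. The Python arithmetic below shows that the indicator block alone would then take on the order of 13 GB, whereas demeaning needs only group sums and counts for each factor.

```python
n_obs = 80_680
levels = {"id": 19_021, "year": 10, "occ": 631, "ind": 280}

n_dummies = sum(levels.values())   # one indicator column per level
n_cells = n_obs * n_dummies        # entries in a dense design matrix
gigabytes = n_cells * 8 / 1e9      # 8 bytes per double-precision entry

print(f"{n_dummies:,} columns, {n_cells:,} cells, about {gigabytes:.1f} GB")
# 19,942 columns, 1,608,920,560 cells, about 12.9 GB
```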
Final thoughts
With the enhanced absorb() option, fitting HDFE models has never been easier. You get cleaner output, faster estimation, and the flexibility to control for multiple categorical variables, all with a single option in xtreg, areg, and ivregress 2sls.
References
Correia, S. 2016. A feasible estimator for linear models with multi-way fixed effects. Unpublished manuscript, Duke University, https://scorreia.com/research/hdfe.pdf.
IPUMS CPS, University of Minnesota, www.ipums.org.
— Chris Cheng
Senior Econometrician
— Bingsheng Zhang
Senior Mathematician and Statistician