The Stata News, Vol 40 No 4

In the spotlight: So many fixed effects, so little time

Do you work with panel data or cross-sectional data and need to control for high-dimensional categorical variables? Stata 19 introduced powerful enhancements to its regression capabilities that make it easier and much faster to fit models with high-dimensional fixed effects (HDFE).

The absorb() option is now available in xtreg, fe and ivregress 2sls, letting you efficiently absorb multiple categorical variables at the same time. This greatly expands what’s possible with fixed-effects modeling. In areg, absorb() has also been enhanced to handle multiple categorical variables. And because absorbed effects are omitted from the output, you avoid results you don’t need while still getting fast and efficient estimation.

An example using individual-level panel data

Let’s walk through a concrete example using individual-level panel data from the IPUMS Current Population Survey (CPS), available at https://cps.ipums.org/cps/. The dataset contains repeated observations of individuals over time, with information on each individual's occupation and industry at each time period.

We begin by loading and describing the dataset.

. use ipums
(IPUMS CPS extract, 2015 through 2024)

. describe

Contains data from ipums.dta
 Observations:       170,434                  IPUMS CPS extract, 2015 through 2024
    Variables:            21                  4 Sep 2025 12:57

Variable        Storage Display    Value
name            type    format     label        Variable label
id              float   %9.0g                   Individual identification
year            int     %8.0g                   Survey year
cpsid           double  %12.0g                  Household record
asecflag        byte    %8.0g      ASECFLAG     Annual Social and Economic Supplement
region          byte    %8.0g      REGION       Region and division
statecensus     byte    %8.0g      STATECENSUS  State census code
cpsidp          double  %12.0g                  Person record
age             byte    %8.0g                   Age
sex             byte    %8.0g      SEX          Sex
race            int     %8.0g      RACE         Race
marst           byte    %23.0g     MARST        Marital status
vetstat         byte    %8.0g      VETSTAT      Veteran status
empstat         byte    %30.0g     EMPSTAT      Employment status
occ             int     %8.0g                   Occupation
ind             int     %8.0g                   Industry
uhrsworkt       int     %8.0g                   Hours usually worked per week at all jobs
educ            int     %8.0g      EDUC         Educational attainment recode
wksunem2        byte    %8.0g                   Weeks unemployed last year, intervaled
incwage         long    %12.0g                  Wage and salary income
health          byte    %9.0g      HEALTH       Health status
lnwage          float   %9.0g                   Natural log of wage and salary income

Sorted by: id year

We aim to model the log of wages (lnwage) as a function of age, employment status, marital status, hours worked, weeks unemployed, and health condition. We also want to control for the individual, time period, occupation, and industry.
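In equation form, the specification we are building toward could be written as below. The notation is our own shorthand rather than anything printed by Stata: x_it collects the regressors just listed, alpha_i is the individual effect, gamma_t is the year effect, and delta and theta are the effects of the occupation o(i,t) and industry k(i,t) that individual i reports in year t.

```latex
\ln(\text{wage}_{it}) = \mathbf{x}_{it}\boldsymbol{\beta} + \alpha_i + \gamma_t + \delta_{o(i,t)} + \theta_{k(i,t)} + \varepsilon_{it}
```

Step 1 omits all four effect terms, and each later step adds one or more of them.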

Step 1: Start simple—no fixed effects

First, we establish a baseline by fitting a basic ordinary least-squares regression without any fixed effects. This model ignores any unobserved characteristics that are specific to an individual, year, occupation, or industry.

. regress lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health
(Output omitted)

We use estimates store to store our estimation results so that we can use them later.

. estimates store OLS

Step 2: One-way fixed effects with xtreg, fe

Next, we improve our model by including individual fixed effects using xtreg, fe. These fixed effects control for all time-invariant characteristics of an individual, observed or unobserved, such as innate ability or motivation. Before using xt commands like xtreg, we must declare the panel structure with xtset.

. xtset id year


Panel variable: id (unbalanced)
 Time variable: year, 2015 to 2024, but with gaps
         Delta: 1 unit

. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe nolog

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.3634                                         min =          1
     Between = 0.3598                                         avg =        4.2
     Overall = 0.3623                                         max =         10

                                                F(20, 61639)      =    1759.30
corr(u_i, Xb) = -0.0069                         Prob > F          =     0.0000

lnwage                            Coefficient   Std. err.        t   P>|t|    [95% conf. interval]
age                                  .1157152    .0016753    69.07   0.000    .1124316    .1189989
c.age#c.age                         -.0011639    .0000182   -63.88   0.000   -.0011996   -.0011282
uhrsworkt                           -.0000866    .0000179    -4.83   0.000   -.0001216   -.0000515
wksunem2                             .1025737    .0012164    84.33   0.000    .1001895    .1049578
marst
  Married, spouse absent              -.19975     .031977    -6.25   0.000   -.2624251    -.137075
  Separated                         -.3952554    .0276942   -14.27   0.000   -.4495361   -.3409746
  Divorced                          -.2076364    .0131721   -15.76   0.000   -.2334538    -.181819
  Widowed                           -.3332016    .0287973   -11.57   0.000   -.3896444   -.2767588
  Never married/single              -.2313562    .0104761   -22.08   0.000   -.2518894   -.2108231
empstat
  At work                           -.3255529    .0495945    -6.56   0.000   -.4227582   -.2283477
  Has job, not at work last week    -.3600129    .0541405    -6.65   0.000   -.4661284   -.2538974
  Unemployed, experienced worker    -.6734508    .0517662   -13.01   0.000   -.7749126    -.571989
  Unemployed, new worker            -1.190159    .2537126    -4.69   0.000   -1.687436   -.6928814
  NILF, unable to work              -.9811457     .074068   -13.25   0.000   -1.126319   -.8359722
  NILF, other                       -1.125918    .0505666   -22.27   0.000   -1.225029   -1.026807
  NILF, retired                     -.6796395    .0570466   -11.91   0.000   -.7914509   -.5678281
health
  Very good                         -.0600062    .0092826    -6.46   0.000   -.0782002   -.0418122
  Good                              -.1977527     .010219   -19.35   0.000   -.2177819   -.1777235
  Fair                              -.3452917    .0176644   -19.55   0.000   -.3799139   -.3106695
  Poor                              -.3686544    .0400975    -9.19   0.000   -.4472456   -.2900632
_cons                                7.719378    .0613174   125.89   0.000    7.599196     7.83956

sigma_u                             .52868441
sigma_e                             .93476059
rho                                 .24235753   (fraction of variance due to u_i)

F test that all u_i=0: F(19020, 61639) = 1.01                Prob > F = 0.2222

. estimates store Oneway

The joint F test at the bottom of the output (Prob > F = 0.2222) gives no evidence that we need to control for individual fixed effects.

Step 3: Two-way fixed effects with absorb()

We extend the model by adding year fixed effects through the absorb() option, capturing time-specific unobservables, such as macroeconomic conditions and policy changes. This yields a two-way fixed-effects specification.

. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe absorb(year) nolog
Alternating projection maximum absolute difference = 8.707e-09 

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.3761                                         min =          1
     Between = 0.3595                                         avg =        4.2
     Overall = 0.3622                                         max =         10

                                                F(20, 61630)      =    1792.10
corr(u_i, Xb) = -0.0077                         Prob > F          =     0.0000

Absorbed variable   Levels
year                    10

lnwage                            Coefficient   Std. err.        t   P>|t|    [95% conf. interval]
age                                  .1155189    .0016591    69.63   0.000    .1122671    .1187707
c.age#c.age                         -.0011632     .000018   -64.47   0.000   -.0011985   -.0011278
uhrsworkt                             -.00008    .0000177    -4.51   0.000   -.0001148   -.0000453
wksunem2                             .1025185    .0012062    85.00   0.000    .1001545    .1048826
marst
  Married, spouse absent            -.2096018    .0316645    -6.62   0.000   -.2716643   -.1475393
  Separated                         -.3875132     .027422   -14.13   0.000   -.4412603    -.333766
  Divorced                          -.2020962    .0130424   -15.50   0.000   -.2276593   -.1765331
  Widowed                           -.3300978    .0285125   -11.58   0.000   -.3859824   -.2742133
  Never married/single              -.2390184    .0103748   -23.04   0.000   -.2593532   -.2186837
empstat
  At work                           -.3119542    .0491056    -6.35   0.000   -.4082012   -.2157072
  Has job, not at work last week    -.3477729    .0536078    -6.49   0.000   -.4528442   -.2427016
  Unemployed, experienced worker    -.6544863    .0512603   -12.77   0.000   -.7549566   -.5540161
  Unemployed, new worker            -1.141233    .2512077    -4.54   0.000     -1.6336   -.6488649
  NILF, unable to work              -.9626481     .073338   -13.13   0.000   -1.106391   -.8189054
  NILF, other                       -1.114173    .0500685   -22.25   0.000   -1.212307   -1.016039
  NILF, retired                     -.6796395    .0570466   -11.91   0.000   -.7914509   -.5678281
health
  Very good                         -.0698841    .0091966    -7.60   0.000   -.0879094   -.0518587
  Good                              -.2106014     .010126   -20.80   0.000   -.2304485   -.1907544
  Fair                              -.3599402    .0174966   -20.57   0.000   -.3942335   -.3256469
  Poor                              -.3807171    .0397028    -9.59   0.000   -.4585347   -.3028994
_cons                                7.721496    .0607112   127.18   0.000    7.602502    7.840491

sigma_u                             .52355994
sigma_e                             .92547327
rho                                 .24244753   (fraction of variance due to u_i)

F test that all u_i=0: F(19020, 61630) = 1.10                Prob > F = 0.0000

. estimates store Twoway

Here individual-level fixed effects are captured automatically by the fe option (because xtset declared id as the panel dimension), while the absorb(year) option efficiently controls for all time effects. After including year fixed effects, the coefficients on the variables of interest do not change much, but we now have evidence that we should control for the individual fixed effects based on the joint F test at the bottom of the output.
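One way to convince yourself that absorb(year) is doing what year indicators would do is to refit the model with i.year among the regressors and compare. The following is a minimal sketch, assuming ipums.dta is in memory, xtset id year has been run, and the estimates named Twoway were stored as above. The macro name rhs and the estimates name TwowayDummies are our own choices.

```stata
* Regressors shared with the two-way model (macro name rhs is our own)
local rhs c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health

* Same two-way model, with year entered as explicit indicator variables
quietly xtreg lnwage `rhs' i.year, fe
estimates store TwowayDummies

* Compare a few coefficients and standard errors side by side
estimates table Twoway TwowayDummies, keep(age uhrsworkt wksunem2) b(%10.7f) se(%10.7f)
```

The slope estimates should agree up to the numerical tolerance of the alternating projection. The model F statistic is expected to differ, because the year indicators now count among the tested regressors.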

Step 4: Go full HDFE

We also want to control for occupation and industry effects, so we add these variables in the absorb() option. If we were to add them as regressors, we would have more than 900 additional parameters to estimate. We do not care about these parameter estimates, but we need to control for the effects of, for instance, technological shifts in a particular industry or the unique skill demands of a specific occupation, which would bias our estimates if they were omitted.

. xtreg lnwage c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health, fe absorb(year occ ind) nolog
Alternating projection maximum absolute difference = 3.233e-09 

Fixed-effects (within) regression               Number of obs     =     80,680
Group variable: id                              Number of groups  =     19,021

R-squared:                                      Obs per group:
     Within  = 0.5089                                         min =          1
     Between = 0.3542                                         avg =        4.2
     Overall = 0.3564                                         max =         10

                                                F(20, 60721)      =     901.00
corr(u_i, Xb) = -0.0038                         Prob > F          =     0.0000


Absorbed variable   Levels
year                    10
occ                    631
ind                    280

lnwage                            Coefficient   Std. err.        t   P>|t|    [95% conf. interval]
age                                  .0868007    .0015374    56.46   0.000    .0837873    .0898141
c.age#c.age                         -.0008721    .0000167   -52.36   0.000   -.0009047   -.0008394
uhrsworkt                           -.0000613    .0000161    -3.82   0.000   -.0000928   -.0000298
wksunem2                             .0861902    .0011058    77.94   0.000    .0840229    .0883576
marst
  Married, spouse absent            -.0656064     .028564    -2.30   0.022   -.1215919   -.0096209
  Separated                         -.1792339    .0247726    -7.24   0.000   -.2277884   -.1306795
  Divorced                          -.0986945    .0118113    -8.36   0.000   -.1218447   -.0755443
  Widowed                           -.1634434    .0257243    -6.35   0.000   -.2138631   -.1130237
  Never married/single               -.108238    .0094786   -11.42   0.000   -.1268161   -.0896598
empstat
  At work                           -.8150478    .0594023   -13.72   0.000   -.9314766   -.6986191
  Has job, not at work last week    -.8254109    .0624995   -13.21   0.000     -.94791   -.7029117
  Unemployed, experienced worker    -1.079656    .0608426   -17.75   0.000   -1.198907   -.9604038
  Unemployed, new worker            -1.555114    .2248653    -6.92   0.000    -1.99585   -1.114377
  NILF, unable to work              -1.200687    .0659515   -18.21   0.000   -1.329952   -1.071422
  NILF, other                       -1.373513    .0451674   -30.41   0.000   -1.462042   -1.284985
  NILF, retired                     -.8826988    .0509092   -17.34   0.000    -.982481   -.7829167
health
  Very good                          -.025719     .008297    -3.10   0.002   -.0419811   -.0094569
  Good                              -.0958429    .0092073   -10.41   0.000   -.1138892   -.0777967
  Fair                              -.1874372    .0158488   -11.83   0.000   -.2185008   -.1563736
  Poor                              -.2139877    .0357415    -5.99   0.000   -.2840412   -.1439343
_cons                                8.831137    .0669693   131.87   0.000    8.699876    8.962397

sigma_u                             .47258634
sigma_e                             .82718913
rho                                  .2460807   (fraction of variance due to u_i)

F test that all u_i=0: F(19020, 60721) = 2.23                Prob > F = 0.0000

. estimates store Fourway

This model now controls for fixed effects across four dimensions: individual, year, occupation, and industry.
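Because absorb() in areg also accepts multiple categorical variables, the same four-way specification could presumably be fit without xtset by absorbing id alongside the other three variables. This is a sketch under that assumption, again with ipums.dta in memory and the Fourway estimates stored as above. The names rhs and FourwayAreg are our own.

```stata
* Regressors shared with the four-way model (macro name rhs is our own)
local rhs c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health

* Absorb all four dimensions, including the individual identifier, with areg
quietly areg lnwage `rhs', absorb(id year occ ind)
estimates store FourwayAreg

* Compare a few slope coefficients with the xtreg, fe results
estimates table Fourway FourwayAreg, keep(age uhrsworkt wksunem2) b(%10.7f) se(%10.7f)
```

We would expect the slope coefficients to match those from xtreg, fe. The reported intercept and R-squared may differ, so they are left out of the comparison.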

Compare the results

We can compare the results across these models to see how our estimates change with additional fixed effects. We begin by creating a table to compare the coefficients.

. etable, estimates(OLS Oneway Twoway Fourway) column(estimates)

                                                   OLS    Oneway    Twoway   Fourway
Age                                              0.115     0.116     0.116     0.087
                                               (0.001)   (0.002)   (0.002)   (0.002)
Age # Age                                       -0.001    -0.001    -0.001    -0.001
                                               (0.000)   (0.000)   (0.000)   (0.000)
Hours usually worked per week at all jobs       -0.000    -0.000    -0.000    -0.000
                                               (0.000)   (0.000)   (0.000)   (0.000)
Weeks unemployed last year, intervaled           0.102     0.103     0.103     0.086
                                               (0.001)   (0.001)   (0.001)   (0.001)
Marital status
  Married, spouse absent                        -0.168    -0.200    -0.210    -0.066
                                               (0.028)   (0.032)   (0.032)   (0.029)
  Separated                                     -0.413    -0.395    -0.388    -0.179
                                               (0.024)   (0.028)   (0.027)   (0.025)
  Divorced                                      -0.202    -0.208    -0.202    -0.099
                                               (0.012)   (0.013)   (0.013)   (0.012)
  Widowed                                       -0.317    -0.333    -0.330    -0.163
                                               (0.025)   (0.029)   (0.029)   (0.026)
  Never married/single                          -0.222    -0.231    -0.239    -0.108
                                               (0.009)   (0.010)   (0.010)   (0.009)
Employment status
  At work                                       -0.297    -0.326    -0.312    -0.815
                                               (0.043)   (0.050)   (0.049)   (0.059)
  Has job, not at work last week                -0.361    -0.360    -0.348    -0.825
                                               (0.047)   (0.054)   (0.054)   (0.062)
  Unemployed, experienced worker                -0.670    -0.673    -0.654    -1.080
                                               (0.045)   (0.052)   (0.051)   (0.061)
  Unemployed, new worker                        -1.494    -1.190    -1.141    -1.555
                                               (0.219)   (0.254)   (0.251)   (0.225)
  NILF, unable to work                          -0.935    -0.981    -0.963    -1.201
                                               (0.065)   (0.074)   (0.073)   (0.066)
  NILF, other                                   -1.086    -1.126    -1.114    -1.374
                                               (0.044)   (0.051)   (0.050)   (0.045)
  NILF, retired                                 -0.626    -0.680    -0.662    -0.883
                                               (0.050)   (0.057)   (0.056)   (0.051)
Health status
  Very good                                     -0.059    -0.060    -0.070    -0.026
                                               (0.008)   (0.009)   (0.009)   (0.008)
  Good                                          -0.190    -0.198    -0.211    -0.096
                                               (0.009)   (0.010)   (0.010)   (0.009)
  Fair                                          -0.354    -0.345    -0.360    -0.187
                                               (0.015)   (0.018)   (0.017)   (0.016)
  Poor                                          -0.377    -0.369    -0.381    -0.214
                                               (0.035)   (0.040)   (0.040)   (0.036)
Intercept                                        7.692     7.719     7.721     8.831
                                               (0.053)   (0.061)   (0.061)   (0.067)
Number of observations                           80680     80680     80680     80680

Notably, even the two-way fixed-effects estimates differ considerably from the four-way fixed-effects results. These differences can be illustrated more clearly with a coefficient plot:

. ssc install coefplot, replace

. coefplot Twoway Fourway, drop(_cons) xline(0) msymbol(D) mfcolor(white) msize(tiny) legend(order(2 "Two-way" 4 "Four-way"))
[Figure comp.svg: coefficient plot comparing the Twoway and Fourway estimates]

Many of the estimated coefficients vary considerably across models. This indicates that unobserved heterogeneity at the occupation and industry levels, beyond individual and time effects, can still introduce significant bias if not properly accounted for.

But how much time do these models take to fit? On my machine, the two-way fixed-effects model ran in 0.3 seconds, and the full four-way specification ran in just 0.7 seconds. A key advantage of using the absorb() option for fixed effects is the substantial gain in computational efficiency. Instead of directly including thousands of indicator variables (in this case over 19,000 individual effects, 10 year effects, 631 occupation effects, and 280 industry effects), the absorb() option processes them internally.
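Run times depend on hardware, so it is worth measuring them on your own machine. The following is a minimal sketch using Stata's timer command, assuming ipums.dta is in memory and xtset id year has been run. The timer numbers 1 and 2 and the macro name rhs are arbitrary choices of ours.

```stata
* Regressors shared by both specifications (macro name rhs is our own)
local rhs c.age##c.age uhrsworkt wksunem2 i.marst i.empstat i.health

* Reset all timers before measuring
timer clear

* Timer 1: two-way model (individual and year effects)
timer on 1
quietly xtreg lnwage `rhs', fe absorb(year)
timer off 1

* Timer 2: four-way model (individual, year, occupation, and industry effects)
timer on 2
quietly xtreg lnwage `rhs', fe absorb(year occ ind)
timer off 2

* Report elapsed seconds for each timer
timer list
```

The elapsed times are also left in r(t1) and r(t2) after timer list, which is convenient if you want to log them across runs.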

Final thoughts

With the enhanced absorb() option, fitting HDFE models has never been easier. You get cleaner output, faster estimation, and the flexibility to control for multiple categorical variables all with one option for xtreg, areg, or ivregress 2sls.

References

Correia, S. 2016. A feasible estimator for linear models with multi-way fixed effects. Unpublished manuscript, Duke University, https://scorreia.com/research/hdfe.pdf.

IPUMS CPS, University of Minnesota, www.ipums.org.

— Chris Cheng
Senior Econometrician

— Bingsheng Zhang
Senior Mathematician and Statistician
