Home  /  Resources & support  /  FAQs  /  How to add empty factor-variable levels to a table?

How to add empty categories for the row or column variables to tables generated with dtable or table?

Title   How to add empty factor-variable levels to a table?
Authors Mia Lv, StataCorp
Jeff Pitblado, StataCorp

When describing a subgroup of your data with the dtable or table command in Stata, you may notice that some categories of your categorical variables are missing from the output if they do not appear in your sample—even though those categories exist in the full dataset. But you would like to include all the possible categories (including those with zero frequency) in the table so that your table dimension remains consistent across different samples.

In this FAQ, I will demonstrate how to manually add those categories using the collect suite of commands through four practical examples. It is worth mentioning that there is a separate FAQ discussing the general case when you want to add a custom item to an existing table. It is good to read that FAQ first because it will help you become familiar with the basics of table dimensions and using collect to modify table dimensions in Stata.

Example 1. Add rows to a table generated with dtable

Below is a table generated with the dtable command:

Car origin
Domestic Foreign Total
N 22 (68.8%) 10 (31.2%) 32 (100.0%)
Repair record 1978
2 4 (18.2%) 0 (0.0%) 4 (12.9%)
3 12 (54.5%) 1 (11.1%) 13 (41.9%)
4 6 (27.3%) 3 (33.3%) 9 (29.0%)
5 0 (0.0%) 5 (55.6%) 5 (16.1%)

And below is that same table after we add a new row:

Car origin
Domestic Foreign Total
N 22 (68.8%) 10 (31.2%) 32 (100.0%)
Repair record 1978
1 0 (0.0%) 0 (0.0%) 0 (0.0%)
2 4 (18.2%) 0 (0.0%) 4 (12.9%)
3 12 (54.5%) 1 (11.1%) 13 (41.9%)
4 6 (27.3%) 3 (33.3%) 9 (29.0%)
5 0 (0.0%) 5 (55.6%) 5 (16.1%)

Here is the code to create the table, including the code that adds the new row:

sysuse auto, clear

dtable if price>5000 & trunk>10, factor(rep78) by(foreign)

* Get the list of displayed rep78 categories
collect levelsof rep78
local found `"`s(levels)'"'

* Get the full list of all the rep78 categories
levelsof rep78, local(all)

* Obtain the list for the missing categories of rep78 by comparing these two lists
local not_found: list all - found
display "Empty levels for rep78 include: `not_found'"


collect levelsof foreign
local column_levels `"`s(levels)'"'

foreach x of local not_found {
	foreach k of local column_levels{
* Add zero fvfrequency and zero fvpercent for each missing category of rep78 and each category of foreign
		collect get fvfrequency=0 fvpercent=0, tags(var[`x'.rep78] foreign[`k'])
	}
}

collect style autolevels result _dtable_stats, clear

collect layout

This code creates a descriptive table of repair records (rep78) by car origin (foreign) for cars with price>5000 and trunk>10. In this code, we first store the rep78 categories found in the table and compare them with all possible rep78 values in the dataset to identify missing categories (1). For each missing repair record category, we add zero fvfrequency and zero fvfrequency for all the foreign categories (0,1) using collect get within a foreach loop.

Please note that fvfrequency represents the count (frequency) of observations for a given factor-variable level, while fvpercent indicates the percentage of observations in that level. For more details, see [R] dtable, which provides formal definitions and additional information about these statistics.

Example 2. Add columns to a table generated with dtable

Below is a table generated with the dtable command:

Race
White Black Total
N 142 (57.7%) 104 (42.3%) 246 (100.0%)
Married
Single 34 (23.9%) 49 (47.1%) 83 (33.7%)
Married 108 (76.1%) 55 (52.9%) 163 (66.3%)
College graduate
Not college grad 103 (72.5%) 85 (81.7%) 188 (76.4%)
College grad 39 (27.5%) 19 (18.3%) 58 (23.6%)
Age in current year 35.324 (0.669) 35.327 (0.630) 35.325 (0.651)
Hourly wage 7.522 (5.793) 5.970 (4.843) 6.866 (5.456)
Usual hours worked 38.408 (10.512) 38.126 (8.235) 38.290 (9.603)

And below is that same table after we add a new column:

Race
White Black Other Total
N 142 (57.7%) 104 (42.3%) 246 (100.0%)
Married
Single 34 (23.9%) 49 (47.1%) 0 (0.0%) 83 (33.7%)
Married 108 (76.1%) 55 (52.9%) 0 (0.0%) 163 (66.3%)
College graduate
Not college grad 103 (72.5%) 85 (81.7%) 0 (0.0%) 188 (76.4%)
College grad 39 (27.5%) 19 (18.3%) 0 (0.0%) 58 (23.6%)
Age in current year 35.324 (0.669) 35.327 (0.630) . (.) 35.325 (0.651)
Hourly wage 7.522 (5.793) 5.970 (4.843) . (.) 6.866 (5.456)
Usual hours worked 38.408 (10.512) 38.126 (8.235) . (.) 38.290 (9.603)

Here is the code to create the table, including the code that adds the new column:

sysuse nlsw88, clear

dtable if south==1 & age<37, by(race) factor(married collgrad) continuous(age wage hours)

collect levelsof race
local found `"`s(levels)'"'

levelsof race, local(all)

local not_found: list all - found

display "Missing categories for race include: `not_found'"


collect levelsof var
local rows `"`s(levels)'"'

* Separate continuous and categorical variables excluding _N and _hide
local continuous_vars ""
local categorical_vars ""
foreach v of local rows {
    * Skip _N and _hide
    if "`v'" == "_N" | "`v'" == "_hide" {
        continue
    }
    * Check if this is a categorical variable (contains a dot)
    else if strpos("`v'", ".") > 0 {
        local categorical_vars `categorical_vars' `v'
    }
    else {
        local continuous_vars `continuous_vars' `v'
    }
}

display "Continuous variables: `continuous_vars'"
display "Categorical variables: `categorical_vars'"


foreach x of local not_found {
    * Add missing mean and sd for continuous variables
    foreach k of local continuous_vars {
        collect get mean=. sd=., tags(race[`x'] var[`k'])
    }

    * Add zero fvfrequency and fvpercent for categorical variables
    foreach k of local categorical_vars {
        collect get fvfrequency=0 fvpercent=0, tags(race[`x'] var[`k'])
    }
}

* Update the auto levels for race
collect levelsof race
collect style autolevels race `s(levels)', clear
collect style autolevels result _dtable_stats, clear

collect layout

This code creates a descriptive table for women in the nlsw88 dataset who live in the South and are under 37, broken down by race. It includes both categorical variables (married and collgrad) and continuous variables (age, wage, and hours).

In this code, we first identify any missing race categories in the filtered sample by comparing race values in the table against all possible race values in the dataset. We discover that race=3 is missing from the table. We then separate the levels in the collect dimension var (for rows) into two groups: continuous and categorical by checking for dots in their level names. Levels for categorical variables contain a dot (for example, "0.married"), while continuous variables do not. Special var levels like _N and _hide are explicitly excluded from this process. For each missing race category, we add missing values (.) for means and standard deviations of continuous variables, and zeros for frequencies and percentages of categorical variables.

Examples 3 and 4 follow a similar method.

Example 3. Add rows to a table generated with table

Below is a table generated with the table command:

Frequency
Repair record 1978
3 1
4 6
5 6
Total 13

And below is that same table after we add two new rows:

Frequency
Repair record 1978
1 0
2 0
3 1
4 6
5 6
Total 13

Here is the code to create the table, including the code that adds two new rows:

sysuse auto, clear

table (rep78) if price > 5000 & foreign == 1

collect levelsof rep78
local found `"`s(levels)'"'

levelsof rep78, local(all)

local not_found: list all - found
display "Missing categories: `not_found'"

foreach x of local not_found {
    collect get frequency=0, tags(rep78[`x'])
}

collect layout

Example 4. Add columns to a table generated with table

Below is a table generated with the table command:

Race
White Black Total
College graduate
Not college grad
Mean 6.92 4.95 6.03
Standard deviation 6.35 2.20 5.01
College grad
Mean 9.11 10.53 9.57
Standard deviation 3.58 9.21 5.98
Total
Mean 7.52 5.97 6.87
Standard deviation 5.79 4.84 5.46

And below is that same table after we add a new column:

Race
White Black Other Total
College graduate
Not college grad
Mean 6.92 4.95 . 6.03
Standard deviation 6.35 2.20 . 5.01
College grad
Mean 9.11 10.53 . 9.57
Standard deviation 3.58 9.21 . 5.98
Total
Mean 7.52 5.97 . 6.87
Standard deviation 5.79 4.84 . 5.46

Here is the code to create the table, including the code that adds the new column:

sysuse nlsw88, clear

table (collgrad) (race) if south==1 & age<37, statistic(mean wage) statistic(sd wage) nformat(%6.2f mean sd)

collect levelsof race
local found `"`s(levels)'"'

levelsof race, local(all)

local not_found: list all - found

display "Missing categories for race include: `not_found'"

collect levelsof collgrad
local rows `"`s(levels)'"'


foreach x of local not_found{
	foreach k of local rows {
        collect get mean=. sd=., tags(race[`x'] collgrad[`k'])
    }
}

collect layout