FAQ: How to add empty categories for the row or column variables to tables generated with dtable or table?

Home / Resources & support / FAQs / How to add empty factor-variable levels to a table?

How to add empty categories for the row or column variables to tables generated with dtable or table?

Title		How to add empty factor-variable levels to a table?
Authors		Mia Lv, StataCorp Jeff Pitblado, StataCorp

When describing a subgroup of your data with the dtable or table command in Stata, you may notice that some categories of your categorical variables are missing from the output if they do not appear in your sample—even though those categories exist in the full dataset. But you would like to include all the possible categories (including those with zero frequency) in the table so that your table dimension remains consistent across different samples.

In this FAQ, I will demonstrate how to manually add those categories using the collect suite of commands through four practical examples. It is worth mentioning that there is a separate FAQ discussing the general case when you want to add a custom item to an existing table. It is good to read that FAQ first because it will help you become familiar with the basics of table dimensions and using collect to modify table dimensions in Stata.

Example 1. Add rows to a table generated with dtable

Below is a table generated with the dtable command:



                               Car origin           
                    Domestic    Foreign     Total   

N                  22 (68.8%) 10 (31.2%) 32 (100.0%)
Repair record 1978                                  
  2                 4 (18.2%)   0 (0.0%)   4 (12.9%)
  3                12 (54.5%)  1 (11.1%)  13 (41.9%)
  4                 6 (27.3%)  3 (33.3%)   9 (29.0%)
  5                  0 (0.0%)  5 (55.6%)   5 (16.1%)

And below is that same table after we add a new row:



                               Car origin           
                    Domestic    Foreign     Total   

N                  22 (68.8%) 10 (31.2%) 32 (100.0%)
Repair record 1978                                  
  1                  0 (0.0%)   0 (0.0%)    0 (0.0%)
  2                 4 (18.2%)   0 (0.0%)   4 (12.9%)
  3                12 (54.5%)  1 (11.1%)  13 (41.9%)
  4                 6 (27.3%)  3 (33.3%)   9 (29.0%)
  5                  0 (0.0%)  5 (55.6%)   5 (16.1%)

Here is the code to create the table, including the code that adds the new row:

sysuse auto, clear

dtable if price>5000 & trunk>10, factor(rep78) by(foreign)

* Get the list of displayed rep78 categories
collect levelsof rep78
local found `"`s(levels)'"'

* Get the full list of all the rep78 categories
levelsof rep78, local(all)

* Obtain the list for the missing categories of rep78 by comparing these two lists
local not_found: list all - found
display "Empty levels for rep78 include: `not_found'"


collect levelsof foreign
local column_levels `"`s(levels)'"'

foreach x of local not_found {
	foreach k of local column_levels{
* Add zero fvfrequency and zero fvpercent for each missing category of rep78 and each category of foreign
		collect get fvfrequency=0 fvpercent=0, tags(var[`x'.rep78] foreign[`k'])
	}
}

collect style autolevels result _dtable_stats, clear

collect layout

This code creates a descriptive table of repair records (rep78) by car origin (foreign) for cars with price>5000 and trunk>10. In this code, we first store the rep78 categories found in the table and compare them with all possible rep78 values in the dataset to identify missing categories (1). For each missing repair record category, we add zero fvfrequency and zero fvfrequency for all the foreign categories (0,1) using collect get within a foreach loop.

Please note that fvfrequency represents the count (frequency) of observations for a given factor-variable level, while fvpercent indicates the percentage of observations in that level. For more details, see [R] dtable, which provides formal definitions and additional information about these statistics.

Example 2. Add columns to a table generated with dtable

Below is a table generated with the dtable command:



                                         Race                    
                         White           Black          Total    

N                       142 (57.7%)    104 (42.3%)   246 (100.0%)
Married                                                          
  Single                 34 (23.9%)     49 (47.1%)     83 (33.7%)
  Married               108 (76.1%)     55 (52.9%)    163 (66.3%)
College graduate                                                 
  Not college grad      103 (72.5%)     85 (81.7%)    188 (76.4%)
  College grad           39 (27.5%)     19 (18.3%)     58 (23.6%)
Age in current year  35.324 (0.669) 35.327 (0.630) 35.325 (0.651)
Hourly wage           7.522 (5.793)  5.970 (4.843)  6.866 (5.456)
Usual hours worked  38.408 (10.512) 38.126 (8.235) 38.290 (9.603)

And below is that same table after we add a new column:



                                             Race                         
                         White           Black       Other       Total    

N                       142 (57.7%)    104 (42.3%)            246 (100.0%)
Married                                                                   
  Single                 34 (23.9%)     49 (47.1%) 0 (0.0%)     83 (33.7%)
  Married               108 (76.1%)     55 (52.9%) 0 (0.0%)    163 (66.3%)
College graduate                                                          
  Not college grad      103 (72.5%)     85 (81.7%) 0 (0.0%)    188 (76.4%)
  College grad           39 (27.5%)     19 (18.3%) 0 (0.0%)     58 (23.6%)
Age in current year  35.324 (0.669) 35.327 (0.630)    . (.) 35.325 (0.651)
Hourly wage           7.522 (5.793)  5.970 (4.843)    . (.)  6.866 (5.456)
Usual hours worked  38.408 (10.512) 38.126 (8.235)    . (.) 38.290 (9.603)

Here is the code to create the table, including the code that adds the new column:

sysuse nlsw88, clear

dtable if south==1 & age<37, by(race) factor(married collgrad) continuous(age wage hours)

collect levelsof race
local found `"`s(levels)'"'

levelsof race, local(all)

local not_found: list all - found

display "Missing categories for race include: `not_found'"


collect levelsof var
local rows `"`s(levels)'"'

* Separate continuous and categorical variables excluding _N and _hide
local continuous_vars ""
local categorical_vars ""
foreach v of local rows {
    * Skip _N and _hide
    if "`v'" == "_N" | "`v'" == "_hide" {
        continue
    }
    * Check if this is a categorical variable (contains a dot)
    else if strpos("`v'", ".") > 0 {
        local categorical_vars `categorical_vars' `v'
    }
    else {
        local continuous_vars `continuous_vars' `v'
    }
}

display "Continuous variables: `continuous_vars'"
display "Categorical variables: `categorical_vars'"


foreach x of local not_found {
    * Add missing mean and sd for continuous variables
    foreach k of local continuous_vars {
        collect get mean=. sd=., tags(race[`x'] var[`k'])
    }

    * Add zero fvfrequency and fvpercent for categorical variables
    foreach k of local categorical_vars {
        collect get fvfrequency=0 fvpercent=0, tags(race[`x'] var[`k'])
    }
}

* Update the auto levels for race
collect levelsof race
collect style autolevels race `s(levels)', clear
collect style autolevels result _dtable_stats, clear

collect layout

This code creates a descriptive table for women in the nlsw88 dataset who live in the South and are under 37, broken down by race. It includes both categorical variables (married and collgrad) and continuous variables (age, wage, and hours).

In this code, we first identify any missing race categories in the filtered sample by comparing race values in the table against all possible race values in the dataset. We discover that race=3 is missing from the table. We then separate the levels in the collect dimension var (for rows) into two groups: continuous and categorical by checking for dots in their level names. Levels for categorical variables contain a dot (for example, "0.married"), while continuous variables do not. Special var levels like _N and _hide are explicitly excluded from this process. For each missing race category, we add missing values (.) for means and standard deviations of continuous variables, and zeros for frequencies and percentages of categorical variables.

Examples 3 and 4 follow a similar method.

Example 3. Add rows to a table generated with table

Below is a table generated with the table command:



                      Frequency
   
Repair record 1978             
  3                           1
  4                           6
  5                           6
  Total                      13

And below is that same table after we add two new rows:



                      Frequency
   
Repair record 1978             
  1                           0
  2                           0
  3                           1
  4                           6
  5                           6
  Total                      13

Here is the code to create the table, including the code that adds two new rows:

sysuse auto, clear

table (rep78) if price > 5000 & foreign == 1

collect levelsof rep78
local found `"`s(levels)'"'

levelsof rep78, local(all)

local not_found: list all - found
display "Missing categories: `not_found'"

foreach x of local not_found {
    collect get frequency=0, tags(rep78[`x'])
}

collect layout

Example 4. Add columns to a table generated with table

Below is a table generated with the table command:



                                   Race        
                          White   Black   Total
   
College graduate        
  Not college grad      
    Mean                   6.92    4.95    6.03
    Standard deviation     6.35    2.20    5.01
  College grad          
    Mean                   9.11   10.53    9.57
    Standard deviation     3.58    9.21    5.98
  Total                 
    Mean                   7.52    5.97    6.87
    Standard deviation     5.79    4.84    5.46

And below is that same table after we add a new column:



                Race            
   White   Black   Other   Total
   
College graduate        
  Not college grad      
    Mean                   6.92    4.95       .    6.03
    Standard deviation     6.35    2.20       .    5.01
  College grad          
    Mean                   9.11   10.53       .    9.57
    Standard deviation     3.58    9.21       .    5.98
  Total                 
    Mean                   7.52    5.97       .    6.87
    Standard deviation     5.79    4.84       .    5.46

Here is the code to create the table, including the code that adds the new column:

sysuse nlsw88, clear

table (collgrad) (race) if south==1 & age<37, statistic(mean wage) statistic(sd wage) nformat(%6.2f mean sd)

collect levelsof race
local found `"`s(levels)'"'

levelsof race, local(all)

local not_found: list all - found

display "Missing categories for race include: `not_found'"

collect levelsof collgrad
local rows `"`s(levels)'"'


foreach x of local not_found{
	foreach k of local rows {
        collect get mean=. sd=., tags(race[`x'] collgrad[`k'])
    }
}

collect layout

How to add empty categories for the row or column variables to tables generated with dtable or table?

Example 1. Add rows to a table generated with dtable

Example 2. Add columns to a table generated with dtable

Example 3. Add rows to a table generated with table

Example 4. Add columns to a table generated with table

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies


		Race
		White Black Total

College graduate
Not college grad
Mean		6.92 4.95 6.03
Standard deviation		6.35 2.20 5.01
College grad
Mean		9.11 10.53 9.57
Standard deviation		3.58 9.21 5.98
Total
Mean		7.52 5.97 6.87
Standard deviation		5.79 4.84 5.46


		Race
		White Black Other Total

College graduate
Not college grad
Mean		6.92 4.95 . 6.03
Standard deviation		6.35 2.20 . 5.01
College grad
Mean		9.11 10.53 . 9.57
Standard deviation		3.58 9.21 . 5.98
Total
Mean		7.52 5.97 . 6.87
Standard deviation		5.79 4.84 . 5.46

Stata/MP4 Annual License (download)

How to add empty categories for the row or column variables to tables generated with dtable or table?

Example 1. Add rows to a table generated with dtable

Example 2. Add columns to a table generated with dtable

Example 3. Add rows to a table generated with table

Example 4. Add columns to a table generated with table

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies