FAQ: How do I display row percentages for categorical variables in Table 1 generated by dtable?

Home / Resources & support / FAQs / How do I display row percentages for categorical variables in Table 1 generated by dtable?

How do I display row percentages for categorical variables in Table 1 generated by dtable?

Title		Display row percentages for categorical variables in Table 1 generated by dtable
Author		Mia Lv, StataCorp

Note: This FAQ assumes familiarity with the core concepts of Stata’s collect framework. If you are not familiar with them, see [TABLES] Intro 2 for a refresher.

When we use dtable with any categorical variables along with the by() option, it reports counts with column percentages by default. For example,

. sysuse auto
(1978 automobile data)

. dtable mpg i.rep78, by(foreign)



                                    Car origin                 
                      Domestic        Foreign         Total    

N                      52 (70.3%)     22 (29.7%)    74 (100.0%)
Mileage (mpg)      19.827 (4.743) 24.773 (6.611) 21.297 (5.786)
Repair record 1978                                             
  1                      2 (4.2%)       0 (0.0%)       2 (2.9%)
  2                     8 (16.7%)       0 (0.0%)      8 (11.6%)
  3                    27 (56.2%)      3 (14.3%)     30 (43.5%)
  4                     9 (18.8%)      9 (42.9%)     18 (26.1%)
  5                      2 (4.2%)      9 (42.9%)     11 (15.9%)

The reported percentages for different levels of rep78 within each foreign group are column percentages, so they sum to 100% in each column.

However, if our goal is producing a table where the percentages in each rep78 row sum to 100% across the Domestic and Foreign groups (for example, row percentages), what should we do? The desired table would look like



                               Car origin                 
                 Domestic        Foreign         Total    

N                      52 (70.3%)     22 (29.7%)    74 (100.0%)
Mileage (mpg)      19.827 (4.743) 24.773 (6.611) 21.297 (5.786)
Repair record 1978                                             
  1                    2 (100.0%)         0 (0%)     2 (100.0%)
  2                    8 (100.0%)         0 (0%)     8 (100.0%)
  3                    27 (90.0%)      3 (10.0%)    30 (100.0%)
  4                     9 (50.0%)      9 (50.0%)    18 (100.0%)
  5                     2 (18.2%)      9 (81.8%)    11 (100.0%)

Short answer: There is no built-in option with dtable to produce row percentages. However, you can work around this by computing these row percentages with the table command or by rotating the table generated with dtable. With either way, we may need to combine small tables into a big one. This FAQ provides several examples for common scenarios.

Example 1: Use the table command for categorical variables, and combine those tables with dtable
Example 2: Categorical variables only: use the table command and combine the small tables
Example 3a: Categorical variables only: use dtable first, then transpose the table to get row percentages
Example 3b: Same as 3a, but with the total column and test results

Example 1: Use the table command for categorical variables, and combine those tables with dtable

In this case, we want to generate a descriptive table for a set of variables, including both continuous variables and categorical variables. The specific code and output are not listed in this FAQ because they are part of another FAQ. Please refer to FAQ: Combine multiple tables obtained with table or dtable using collect, example 3b for details.

Example 2: Categorical variables only: use the table command and combine the small tables

This example is very similar to example 1, except that this table does not include continuous variables. Therefore, dtable is no longer needed, and the composite result result[_dtable_stats] (generated by dtable) that we were using to define the final layout in example 1 does not exist. We need to create our own composite results. Furthermore, we want our table to look like tables generated by dtable, so we want to apply the style of dtable.

The example code is as follows:

collect clear

webuse nhanes2l, clear

* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1

* build the overall sample table
table () (rural), stat(frequency) stat(percent, across(rural)) name(N) replace
collect addtags N[_N]
collect layout (N) (rural#result)

 * factor-variable part
local vlist highlead highbp agegrp hlthstat heartatk diabetes

foreach var of local vlist {
	* if you do not want total column, replace totals(`var') with nototal
        table (`var') (rural), statistic(frequency) ///
	  statistic(percent, across(rural)) totals(`var') name(`var')
   }

* create a new collection named "all" that combines the above collections
collect combine all = N `vlist'

* define a composite result as -dtable- does
collect composite define stats = frequency percent, trim

* use the style of dtable
collect style use dtable, override

* show "0 (0%)" for the empty cells
collect style cell, empty(0 (0%))

 * further adjust the header style
collect style header result N, level(hide)
collect style header, title(label)

* update the table layout to include the sample statistics and the categorical variables
* you do not need the total column, use rural[0 1] instead of rural
collect layout (N `vlist') (rural#result[stats])

Here is the resulting table:



                                            Rural                     
                           Urban           Rural            Total     

N                      6,548 (63.3%)   3,803 (36.7%)   10,351 (100.0%)
High lead level                                                       
  lead<25              2,902 (43.3%)   3,803 (56.7%)    6,705 (100.0%)
  lead>=25              209 (100.0%)          0 (0%)      209 (100.0%)
High blood pressure                                                   
  0                    3,798 (63.6%)   2,177 (36.4%)    5,975 (100.0%)
  1                    2,750 (62.8%)   1,626 (37.2%)    4,376 (100.0%)
Age group                                                             
  20–29                1,631 (70.3%)     689 (29.7%)    2,320 (100.0%)
  30–39                  981 (60.5%)     641 (39.5%)    1,622 (100.0%)
  40–49                  777 (61.1%)     495 (38.9%)    1,272 (100.0%)
  50–59                  810 (62.7%)     481 (37.3%)    1,291 (100.0%)
  60–69                1,743 (60.9%)   1,117 (39.1%)    2,860 (100.0%)
  70+                    606 (61.5%)     380 (38.5%)      986 (100.0%)
Health status                                                         
  Excellent            1,609 (66.8%)     798 (33.2%)    2,407 (100.0%)
  Very good            1,713 (66.1%)     878 (33.9%)    2,591 (100.0%)
  Good                 1,878 (63.9%)   1,060 (36.1%)    2,938 (100.0%)
  Fair                   950 (56.9%)     720 (43.1%)    1,670 (100.0%)
  Poor                   389 (53.4%)     340 (46.6%)      729 (100.0%)
Prior heart attack                                                    
  No heart attack      6,272 (63.5%)   3,601 (36.5%)    9,873 (100.0%)
  Had heart attack       275 (57.8%)     201 (42.2%)      476 (100.0%)
Diabetes status                                                       
  Not diabetic         6,233 (63.3%)   3,617 (36.7%)    9,850 (100.0%)
  Diabetic               314 (62.9%)     185 (37.1%)      499 (100.0%)

In this example, we first generate different collections in the loop—one collection for each categorical variable—and then we combine all the collections into one using collect combine. This is an alternative to using table with its append option to combine multiple table commands’ results into a single collection. The advantage of this approach is that it will create a new dimension called collection. This dimension can be helpful in defining the final table layout in some cases. Please refer to example 3a in the table-combining FAQ for more information. However, in this example, the new dimension collection is not used. So we can use table, append to achieve the same goal. You may replace

foreach var of local vlist { 	
      * if you do not want total column, replace totals(`var') with nototal
     table (`var') (rural), statistic(frequency) ///
       statistic(percent, across(rural)) totals(`var') name(`var') 
  }

* create a new collection named "all" that combines the above collections
collect combine all = N `vlist'

with

foreach var of local vlist { 	
      * if you do not want total column, replace totals(`var') with nototal
     table (`var') (rural), statistic(frequency) ///
       statistic(percent, across(rural)) totals(`var') name(N) append
  }

If you want to apply the style properties saved in a predefined style file, style-dtable.stjson, to the current collection to match the style of dtable, use the command

collect style use dtable, override

The detailed style properties are given in the manual entry [TABLES] Predefined styles.

Example 3a: Categorical variables only: use dtable first, then transpose the table to get row percentages

We know dtable can generate column percentages. If we invert the row variable and column variable in the syntax of dtable and then transpose the table, it will have the row percentages that we are looking for (see this FAQ for a detailed instruction on how to transpose a table). Because dtable allows only one by() variable for the columns, we need to run dtable multiple times for multiple variables and then combine the table, as we did similarly with example 2 above.

Peease note that this method will not work if the ultimate “column” variable does not meet the requirements for a factor variable; for example, it must have only nonnegative integer values and cannot be a string variable.

Below is an example illustrating this method:

collect clear
webuse nhanes2l, clear

* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1

local colvar "rural"
local vlist highlead highbp agegrp hlthstat heartatk diabetes

foreach var of local vlist {
	dtable i.`colvar', by(`var', nototal) name(`var')
  }

collect combine c = `vlist', replace

collect style header `colvar', title(label)

collect layout (`vlist') (`colvar'#result[_dtable_stats])

The generated table looks like



                               Rural           
                        Urban         Rural    

High lead level                                
  lead<25           2,902 (43.3%) 3,803 (56.7%)
  lead>=25           209 (100.0%)      0 (0.0%)
High blood pressure                            
  0                 3,798 (63.6%) 2,177 (36.4%)
  1                 2,750 (62.8%) 1,626 (37.2%)
Age group                                      
  20–29             1,631 (70.3%)   689 (29.7%)
  30–39               981 (60.5%)   641 (39.5%)
  40–49               777 (61.1%)   495 (38.9%)
  50–59               810 (62.7%)   481 (37.3%)
  60–69             1,743 (60.9%) 1,117 (39.1%)
  70+                 606 (61.5%)   380 (38.5%)
Health status                                  
  Excellent         1,609 (66.8%)   798 (33.2%)
  Very good         1,713 (66.1%)   878 (33.9%)
  Good              1,878 (63.9%) 1,060 (36.1%)
  Fair                950 (56.9%)   720 (43.1%)
  Poor                389 (53.4%)   340 (46.6%)
Prior heart attack                             
  No heart attack   6,272 (63.5%) 3,601 (36.5%)
  Had heart attack    275 (57.8%)   201 (42.2%)
Diabetes status                                
  Not diabetic      6,233 (63.3%) 3,617 (36.7%)
  Diabetic            314 (62.9%)   185 (37.1%)

Example 3b: Same as 3a, but with the total column and test results

If you want to include the total column, use the method described in Example 2 above. If test results are also desired, you can run the command to perform the specific tests and manually add the p-values to the table, following the guidance in FAQ: How do I insert a customized statistic in an existing table? Alternatively, you can collect the total frequencies and test results from the dtable commands and edit their tags to match the transposed table’s layout. Here we show you how:

collect clear
webuse nhanes2l, clear

* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1
local colvar "rural"
local vlist "highlead highbp agegrp hlthstat heartatk diabetes"
qui summ `colvar'
local colmin=r(min)

foreach var of local vlist {
 	dtable i.`colvar', by(`var',test nototal) name(`var')
	*  add tag `colvar'[.m] to the total frequency
	collect addtags `colvar'[.m], fortags(var[_N]#result[frequency])
	* manually add a total percentage of 100 to each row
	levelsof `var', local(levels)
	foreach level of local levels{
	      collect get percent=100, tags(`colvar'[.m] `var'[`level'])
	}
	* add tag `var'[`min'] to the test result so that it will be associated with the minimum level of `var'
	qui summ `var'
	local min=r(min)
	collect remap `var'[_dtable_test] = `var'[`min']
  }

collect combine c = `vlist', replace
collect remap `colvar'[`colmin'] = `colvar'[test], fortags(result[pearson])
collect style header `colvar', title(label)
collect label levels rural .m "Total"
levelsof `colvar', local(levels)
collect style autolevels `colvar' `levels' .m test, clear
collect style autolevels result _dtable_stats _dtable_test, clear
collect layout (`vlist') (`colvar'#result)

The resulting table looks like



                                          Rural                      
                        Urban         Rural          Total      test 

High lead level                                                      
  lead<25           2,902 (43.3%) 3,803 (56.7%) 6,705 (100.0%) <0.001
  lead>=25           209 (100.0%)      0 (0.0%)   209 (100.0%)       
High blood pressure                                                  
  0                 3,798 (63.6%) 2,177 (36.4%) 5,975 (100.0%)  0.452
  1                 2,750 (62.8%) 1,626 (37.2%) 4,376 (100.0%)       
Age group                                                            
  20–29             1,631 (70.3%)   689 (29.7%) 2,320 (100.0%) <0.001
  30–39               981 (60.5%)   641 (39.5%) 1,622 (100.0%)       
  40–49               777 (61.1%)   495 (38.9%) 1,272 (100.0%)       
  50–59               810 (62.7%)   481 (37.3%) 1,291 (100.0%)       
  60–69             1,743 (60.9%) 1,117 (39.1%) 2,860 (100.0%)       
  70+                 606 (61.5%)   380 (38.5%)   986 (100.0%)       
Health status                                                        
  Excellent         1,609 (66.8%)   798 (33.2%) 2,407 (100.0%) <0.001
  Very good         1,713 (66.1%)   878 (33.9%) 2,591 (100.0%)       
  Good              1,878 (63.9%) 1,060 (36.1%) 2,938 (100.0%)       
  Fair                950 (56.9%)   720 (43.1%) 1,670 (100.0%)       
  Poor                389 (53.4%)   340 (46.6%)   729 (100.0%)       
Prior heart attack                                                   
  No heart attack   6,272 (63.5%) 3,601 (36.5%) 9,873 (100.0%)  0.011
  Had heart attack    275 (57.8%)   201 (42.2%)   476 (100.0%)       
Diabetes status                                                      
  Not diabetic      6,233 (63.3%) 3,617 (36.7%) 9,850 (100.0%)  0.873
  Diabetic            314 (62.9%)   185 (37.1%)   499 (100.0%)

Read more about tables in the Customizable Tables and Collected Results Reference Manual.

How do I display row percentages for categorical variables in Table 1 generated by dtable?

Example 1: Use the table command for categorical variables, and combine those tables with dtable

Example 2: Categorical variables only: use the table command and combine the small tables

Example 3a: Categorical variables only: use dtable first, then transpose the table to get row percentages

Example 3b: Same as 3a, but with the total column and test results

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies


		Rural
		Urban Rural Total

N		6,548 (63.3%) 3,803 (36.7%) 10,351 (100.0%)
High lead level
lead<25		2,902 (43.3%) 3,803 (56.7%) 6,705 (100.0%)
lead>=25		209 (100.0%) 0 (0%) 209 (100.0%)
High blood pressure
0		3,798 (63.6%) 2,177 (36.4%) 5,975 (100.0%)
1		2,750 (62.8%) 1,626 (37.2%) 4,376 (100.0%)
Age group
20–29		1,631 (70.3%) 689 (29.7%) 2,320 (100.0%)
30–39		981 (60.5%) 641 (39.5%) 1,622 (100.0%)
40–49		777 (61.1%) 495 (38.9%) 1,272 (100.0%)
50–59		810 (62.7%) 481 (37.3%) 1,291 (100.0%)
60–69		1,743 (60.9%) 1,117 (39.1%) 2,860 (100.0%)
70+		606 (61.5%) 380 (38.5%) 986 (100.0%)
Health status
Excellent		1,609 (66.8%) 798 (33.2%) 2,407 (100.0%)
Very good		1,713 (66.1%) 878 (33.9%) 2,591 (100.0%)
Good		1,878 (63.9%) 1,060 (36.1%) 2,938 (100.0%)
Fair		950 (56.9%) 720 (43.1%) 1,670 (100.0%)
Poor		389 (53.4%) 340 (46.6%) 729 (100.0%)
Prior heart attack
No heart attack		6,272 (63.5%) 3,601 (36.5%) 9,873 (100.0%)
Had heart attack		275 (57.8%) 201 (42.2%) 476 (100.0%)
Diabetes status
Not diabetic		6,233 (63.3%) 3,617 (36.7%) 9,850 (100.0%)
Diabetic		314 (62.9%) 185 (37.1%) 499 (100.0%)

Stata/MP4 Annual License (download)

How do I display row percentages for categorical variables in Table 1 generated by dtable?

Example 1: Use the table command for categorical variables, and combine those tables with dtable

Example 2: Categorical variables only: use the table command and combine the small tables

Example 3a: Categorical variables only: use dtable first, then transpose the table to get row percentages

Example 3b: Same as 3a, but with the total column and test results

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies