| Title | Display row percentages for categorical variables in Table 1 generated by dtable | |
| Author | Mia Lv, StataCorp |
When we use dtable with any categorical variables along with the by() option, it reports counts with column percentages by default. For example,
. sysuse auto (1978 automobile data) . dtable mpg i.rep78, by(foreign)
| Car origin |
| Domestic Foreign Total |
| N 52 (70.3%) 22 (29.7%) 74 (100.0%) Mileage (mpg) 19.827 (4.743) 24.773 (6.611) 21.297 (5.786) Repair record 1978 1 2 (4.2%) 0 (0.0%) 2 (2.9%) 2 8 (16.7%) 0 (0.0%) 8 (11.6%) 3 27 (56.2%) 3 (14.3%) 30 (43.5%) 4 9 (18.8%) 9 (42.9%) 18 (26.1%) 5 2 (4.2%) 9 (42.9%) 11 (15.9%) |
The reported percentages for different levels of rep78 within each foreign group are column percentages, so they sum to 100% in each column.
However, if our goal is producing a table where the percentages in each rep78 row sum to 100% across the Domestic and Foreign groups (for example, row percentages), what should we do? The desired table would look like
| Car origin |
| Domestic Foreign Total |
| N 52 (70.3%) 22 (29.7%) 74 (100.0%) Mileage (mpg) 19.827 (4.743) 24.773 (6.611) 21.297 (5.786) Repair record 1978 1 2 (100.0%) 0 (0%) 2 (100.0%) 2 8 (100.0%) 0 (0%) 8 (100.0%) 3 27 (90.0%) 3 (10.0%) 30 (100.0%) 4 9 (50.0%) 9 (50.0%) 18 (100.0%) 5 2 (18.2%) 9 (81.8%) 11 (100.0%) |
Short answer: There is no built-in option with dtable to produce row percentages. However, you can work around this by computing these row percentages with the table command or by rotating the table generated with dtable. With either way, we may need to combine small tables into a big one. This FAQ provides several examples for common scenarios.
In this case, we want to generate a descriptive table for a set of variables, including both continuous variables and categorical variables. The specific code and output are not listed in this FAQ because they are part of another FAQ. Please refer to FAQ: Combine multiple tables obtained with table or dtable using collect, example 3b for details.
This example is very similar to example 1, except that this table does not include continuous variables. Therefore, dtable is no longer needed, and the composite result result[_dtable_stats] (generated by dtable) that we were using to define the final layout in example 1 does not exist. We need to create our own composite results. Furthermore, we want our table to look like tables generated by dtable, so we want to apply the style of dtable.
The example code is as follows:
collect clear
webuse nhanes2l, clear
* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1
* build the overall sample table
table () (rural), stat(frequency) stat(percent, across(rural)) name(N) replace
collect addtags N[_N]
collect layout (N) (rural#result)
* factor-variable part
local vlist highlead highbp agegrp hlthstat heartatk diabetes
foreach var of local vlist {
* if you do not want total column, replace totals(`var') with nototal
table (`var') (rural), statistic(frequency) ///
statistic(percent, across(rural)) totals(`var') name(`var')
}
* create a new collection named "all" that combines the above collections
collect combine all = N `vlist'
* define a composite result as -dtable- does
collect composite define stats = frequency percent, trim
* use the style of dtable
collect style use dtable, override
* show "0 (0%)" for the empty cells
collect style cell, empty(0 (0%))
* further adjust the header style
collect style header result N, level(hide)
collect style header, title(label)
* update the table layout to include the sample statistics and the categorical variables
* you do not need the total column, use rural[0 1] instead of rural
collect layout (N `vlist') (rural#result[stats])
Here is the resulting table:
| Rural | ||
| Urban Rural Total | ||
| N | 6,548 (63.3%) 3,803 (36.7%) 10,351 (100.0%) | |
| High lead level | ||
| lead<25 | 2,902 (43.3%) 3,803 (56.7%) 6,705 (100.0%) | |
| lead>=25 | 209 (100.0%) 0 (0%) 209 (100.0%) | |
| High blood pressure | ||
| 0 | 3,798 (63.6%) 2,177 (36.4%) 5,975 (100.0%) | |
| 1 | 2,750 (62.8%) 1,626 (37.2%) 4,376 (100.0%) | |
| Age group | ||
| 20–29 | 1,631 (70.3%) 689 (29.7%) 2,320 (100.0%) | |
| 30–39 | 981 (60.5%) 641 (39.5%) 1,622 (100.0%) | |
| 40–49 | 777 (61.1%) 495 (38.9%) 1,272 (100.0%) | |
| 50–59 | 810 (62.7%) 481 (37.3%) 1,291 (100.0%) | |
| 60–69 | 1,743 (60.9%) 1,117 (39.1%) 2,860 (100.0%) | |
| 70+ | 606 (61.5%) 380 (38.5%) 986 (100.0%) | |
| Health status | ||
| Excellent | 1,609 (66.8%) 798 (33.2%) 2,407 (100.0%) | |
| Very good | 1,713 (66.1%) 878 (33.9%) 2,591 (100.0%) | |
| Good | 1,878 (63.9%) 1,060 (36.1%) 2,938 (100.0%) | |
| Fair | 950 (56.9%) 720 (43.1%) 1,670 (100.0%) | |
| Poor | 389 (53.4%) 340 (46.6%) 729 (100.0%) | |
| Prior heart attack | ||
| No heart attack | 6,272 (63.5%) 3,601 (36.5%) 9,873 (100.0%) | |
| Had heart attack | 275 (57.8%) 201 (42.2%) 476 (100.0%) | |
| Diabetes status | ||
| Not diabetic | 6,233 (63.3%) 3,617 (36.7%) 9,850 (100.0%) | |
| Diabetic | 314 (62.9%) 185 (37.1%) 499 (100.0%) | |
In this example, we first generate different collections in the loop—one collection for each categorical variable—and then we combine all the collections into one using collect combine. This is an alternative to using table with its append option to combine multiple table commands’ results into a single collection. The advantage of this approach is that it will create a new dimension called collection. This dimension can be helpful in defining the final table layout in some cases. Please refer to example 3a in the table-combining FAQ for more information. However, in this example, the new dimension collection is not used. So we can use table, append to achieve the same goal. You may replace
foreach var of local vlist {
* if you do not want total column, replace totals(`var') with nototal
table (`var') (rural), statistic(frequency) ///
statistic(percent, across(rural)) totals(`var') name(`var')
}
* create a new collection named "all" that combines the above collections
collect combine all = N `vlist'
with
foreach var of local vlist {
* if you do not want total column, replace totals(`var') with nototal
table (`var') (rural), statistic(frequency) ///
statistic(percent, across(rural)) totals(`var') name(N) append
}
If you want to apply the style properties saved in a predefined style file, style-dtable.stjson, to the current collection to match the style of dtable, use the command
collect style use dtable, override
The detailed style properties are given in the manual entry [TABLES] Predefined styles.
We know dtable can generate column percentages. If we invert the row variable and column variable in the syntax of dtable and then transpose the table, it will have the row percentages that we are looking for (see this FAQ for a detailed instruction on how to transpose a table). Because dtable allows only one by() variable for the columns, we need to run dtable multiple times for multiple variables and then combine the table, as we did similarly with example 2 above.
Peease note that this method will not work if the ultimate “column” variable does not meet the requirements for a factor variable; for example, it must have only nonnegative integer values and cannot be a string variable.
Below is an example illustrating this method:
collect clear
webuse nhanes2l, clear
* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1
local colvar "rural"
local vlist highlead highbp agegrp hlthstat heartatk diabetes
foreach var of local vlist {
dtable i.`colvar', by(`var', nototal) name(`var')
}
collect combine c = `vlist', replace
collect style header `colvar', title(label)
collect layout (`vlist') (`colvar'#result[_dtable_stats])
The generated table looks like
| Rural |
| Urban Rural |
| High lead level lead<25 2,902 (43.3%) 3,803 (56.7%) lead>=25 209 (100.0%) 0 (0.0%) High blood pressure 0 3,798 (63.6%) 2,177 (36.4%) 1 2,750 (62.8%) 1,626 (37.2%) Age group 20–29 1,631 (70.3%) 689 (29.7%) 30–39 981 (60.5%) 641 (39.5%) 40–49 777 (61.1%) 495 (38.9%) 50–59 810 (62.7%) 481 (37.3%) 60–69 1,743 (60.9%) 1,117 (39.1%) 70+ 606 (61.5%) 380 (38.5%) Health status Excellent 1,609 (66.8%) 798 (33.2%) Very good 1,713 (66.1%) 878 (33.9%) Good 1,878 (63.9%) 1,060 (36.1%) Fair 950 (56.9%) 720 (43.1%) Poor 389 (53.4%) 340 (46.6%) Prior heart attack No heart attack 6,272 (63.5%) 3,601 (36.5%) Had heart attack 275 (57.8%) 201 (42.2%) Diabetes status Not diabetic 6,233 (63.3%) 3,617 (36.7%) Diabetic 314 (62.9%) 185 (37.1%) |
If you want to include the total column, use the method described in Example 2 above. If test results are also desired, you can run the command to perform the specific tests and manually add the p-values to the table, following the guidance in FAQ: How do I insert a customized statistic in an existing table? Alternatively, you can collect the total frequencies and test results from the dtable commands and edit their tags to match the transposed table’s layout. Here we show you how:
collect clear
webuse nhanes2l, clear
* create an empty cell in order to show the label in the final table
replace highlead = 0 if rural == 1
local colvar "rural"
local vlist "highlead highbp agegrp hlthstat heartatk diabetes"
qui summ `colvar'
local colmin=r(min)
foreach var of local vlist {
dtable i.`colvar', by(`var',test nototal) name(`var')
* add tag `colvar'[.m] to the total frequency
collect addtags `colvar'[.m], fortags(var[_N]#result[frequency])
* manually add a total percentage of 100 to each row
levelsof `var', local(levels)
foreach level of local levels{
collect get percent=100, tags(`colvar'[.m] `var'[`level'])
}
* add tag `var'[`min'] to the test result so that it will be associated with the minimum level of `var'
qui summ `var'
local min=r(min)
collect remap `var'[_dtable_test] = `var'[`min']
}
collect combine c = `vlist', replace
collect remap `colvar'[`colmin'] = `colvar'[test], fortags(result[pearson])
collect style header `colvar', title(label)
collect label levels rural .m "Total"
levelsof `colvar', local(levels)
collect style autolevels `colvar' `levels' .m test, clear
collect style autolevels result _dtable_stats _dtable_test, clear
collect layout (`vlist') (`colvar'#result)
The resulting table looks like
| Rural |
| Urban Rural Total test |
| High lead level lead<25 2,902 (43.3%) 3,803 (56.7%) 6,705 (100.0%) <0.001 lead>=25 209 (100.0%) 0 (0.0%) 209 (100.0%) High blood pressure 0 3,798 (63.6%) 2,177 (36.4%) 5,975 (100.0%) 0.452 1 2,750 (62.8%) 1,626 (37.2%) 4,376 (100.0%) Age group 20–29 1,631 (70.3%) 689 (29.7%) 2,320 (100.0%) <0.001 30–39 981 (60.5%) 641 (39.5%) 1,622 (100.0%) 40–49 777 (61.1%) 495 (38.9%) 1,272 (100.0%) 50–59 810 (62.7%) 481 (37.3%) 1,291 (100.0%) 60–69 1,743 (60.9%) 1,117 (39.1%) 2,860 (100.0%) 70+ 606 (61.5%) 380 (38.5%) 986 (100.0%) Health status Excellent 1,609 (66.8%) 798 (33.2%) 2,407 (100.0%) <0.001 Very good 1,713 (66.1%) 878 (33.9%) 2,591 (100.0%) Good 1,878 (63.9%) 1,060 (36.1%) 2,938 (100.0%) Fair 950 (56.9%) 720 (43.1%) 1,670 (100.0%) Poor 389 (53.4%) 340 (46.6%) 729 (100.0%) Prior heart attack No heart attack 6,272 (63.5%) 3,601 (36.5%) 9,873 (100.0%) 0.011 Had heart attack 275 (57.8%) 201 (42.2%) 476 (100.0%) Diabetes status Not diabetic 6,233 (63.3%) 3,617 (36.7%) 9,850 (100.0%) 0.873 Diabetic 314 (62.9%) 185 (37.1%) 499 (100.0%) |
Read more about tables in the Customizable Tables and Collected Results Reference Manual.