Home  /  Resources & support  /  FAQs  /  Combine multiple tables obtained with table or dtable using collect

Combine multiple tables obtained with table or dtable using collect

 Title Combine multiple tables obtained with table or dtable using collect Author Mia Lv, StataCorp Chris Cheng, StataCorp

Introduction

You may already be familiar with the new table command, but you are wondering how to combine multiple different tables obtained with table into one table (by appending them vertically or horizontally). For example, you may want to combine

. sysuse auto
(1978 automobile data)

. table (rep78), statistic(mean price) statistic(sd price)

Mean   Standard deviation

Repair record 1978
1                     4564.5             522.5519
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
5                       5913             2615.763
Total               6146.043              2912.44



and

. table (foreign), statistic(mean price) statistic(sd price)

Mean   Standard deviation

Car origin
Domestic    6072.423             3097.104
Foreign     6384.682             2621.915
Total       6165.257             2949.496



into one table:



Mean   Standard deviation

Repair record 1978
1                     4564.5             522.5519
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
5                       5913             2615.763
Total               6146.043              2912.44
Car origin
Domestic            6072.423             3097.104
Foreign             6384.682             2621.915
Total               6165.257             2949.496



An incorrect way

Before we see the solution, I want to first show you a (commonly used) wrong way to combine two tables. Many users thought we could just use table with two variables in the row dimension because the final table has two row variables:

. table (foreign rep78), statistic(mean price) statistic(sd price)

However, this syntax will generate a different table using crossed categories between foreign and rep78 instead of including them independently. This syntax is not wrong; however, it just does not fit our purpose of appending two tables together.



Mean   Standard deviation

Car origin
Domestic
Repair record 1978
1                     4564.5             522.5519
2                   5967.625             3579.357
3                   6607.074             3661.267
4                   5881.556             1592.019
5                     4204.5             311.8341
Total                6179.25             3188.969
Foreign
Repair record 1978
3                   4828.667             1285.613
4                   6261.444             1896.092
5                   6292.667             2765.629
Total               6070.143             2220.984
Total
Repair record 1978
1                     4564.5             522.5519
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
5                       5913             2615.763
Total               6146.043              2912.44



we can see that the rows are identified by two variables jointly while rep78 is nested within foreign.

Example 1a: Combine two tables vertically

Let’s continue with the example at the beginning of this FAQ, which appends two tables vertically. The appropriate way to generate the combined table is to specify the append option with each table command except the first one and then use collect layout to include all the categorical variables as the row tags.

Here is the code to generate the combined table that we are looking for:

. sysuse auto, clear
(1978 automobile data)

. collect clear

. table (rep78), statistic(mean price) statistic(sd price)

. table (foreign), statistic(mean price) statistic(sd price) append

. collect layout (rep78 foreign) (result)


The output table looks like



Mean   Standard deviation

Repair record 1978
1                     4564.5             522.5519
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
5                       5913             2615.763
Total               6146.043              2912.44
Car origin
Domestic            6072.423             3097.104
Foreign             6384.682             2621.915
Total               6165.257             2949.496



Let us explain how this code works.

. collect clear

deletes anything that we previously saved in the collection.

. table (rep78), statistic(mean price) statistic(sd price)

creates the first part of the combined table in a collection called Table.

. table (foreign), statistic(mean price) statistic(sd price) append

creates the second part of the combined table also in the collection Table (default for the command table). If append was not specified here, it would clear all the elements saved previously in this collection. But if append is specified, it will not clear those elements. Instead, it will append any new elements to the existing collection Table.

After the second table command, the collect layout specification shows only the second table by default. We can type collect layout to double-check the current setting.

. collect layout

Collection: Table
Rows: foreign
Columns: result
Table 1: 4 x 2

Mean   Standard deviation

Car origin
Domestic    6072.423             3097.104
Foreign     6384.682             2621.915
Total       6165.257             2949.496



We see the rows only include the levels of foreign. Now let’s add rep78 to the rows using the following syntax:

. collect layout (rep78 foreign) (result)

This above syntax allows rep78 (all the levels) and foreign (all the levels) to be used to identify the table rows. And we do not need to change other specifications such as columns or the collection name.

On the other hand, if we want to include only the result of certain levels of any categorical variable instead of all the levels in the combined table, we can do that by including the levels we want in the row specification. Here is an example:

. collect layout (rep78[2 3 4] foreign[1]) (result)

Collection: Table
Rows: rep78[2 3 4] foreign[1]
Columns: result
Table 1: 6 x 2

Mean   Standard deviation

Repair record 1978
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
Car origin
Foreign             6384.682             2621.915



If you want to hide the variable label Car origin in the table row header for the variable foreign, you can use the following command:

. collect style header foreign, title(hide)

And here is the output table:

. collect layout

Collection: Table
Rows: rep78[2 3 4] foreign[1]
Columns: result
Table 1: 5 x 2

Mean   Standard deviation

Repair record 1978
2                   5967.625             3579.357
3                   6429.233              3525.14
4                     6071.5             1709.608
Foreign               6384.682             2621.915



Example 1b: Combine multiple tables vertically using a loop

This method also works if you are appending more than two tables into one. Here is an example:

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. collect clear

. local varlist highlead highbp agegrp hlthstat heartatk diabetes

. foreach var of local varlist {
table (var'), statistic(frequency) statistic(percent) name(Table) append totals(var')
}

. collect layout (varlist') (result)

Frequency   Percent

High blood pressure
0                        5,975     57.72
1                        4,376     42.28
Age group
20–29                    2,320     22.41
30–39                    1,622     15.67
40–49                    1,272     12.29
50–59                    1,291     12.47
60–69                    2,860     27.63
70+                        986      9.53
Health status
Excellent                2,407     23.29
Very good                2,591     25.07
Good                     2,938     28.43
Fair                     1,670     16.16
Poor                       729      7.05
Prior heart attack
No heart attack          9,873     95.40
Diabetes status
Not diabetic             9,850     95.18
Diabetic                   499      4.82



You may notice that in this example, append is specified with every table command, even the first. This is because we are using a for loop, and we make every table command have the same options. Although append is not necessary with the first table command, it will not affect anything because we start from an empty collection (we called collect clear before).

By the way, this example aims to show you how to combine multiple tables into one using a loop. However, this type of descriptive table may be created much more easily using one dtable command. For example,

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. dtable, factor(highlead highbp agegrp hlthstat heartatk diabetes)

Summary

N                          10,351
High blood pressure
0                 5,975 (57.7%)
1                 4,376 (42.3%)
Age group
20–29             2,320 (22.4%)
30–39             1,622 (15.7%)
40–49             1,272 (12.3%)
50–59             1,291 (12.5%)
60–69             2,860 (27.6%)
70+                  986 (9.5%)
Health status
Excellent         2,407 (23.3%)
Very good         2,591 (25.1%)
Good              2,938 (28.4%)
Fair              1,670 (16.2%)
Poor                 729 (7.1%)
Prior heart attack
No heart attack   9,873 (95.4%)
Diabetes status
Not diabetic      9,850 (95.2%)
Diabetic             499 (4.8%)



Please note that dtable does not align each statistic like table does. Instead, it bundles all the statistics in the same cell and aligns the combined statistics as a whole. That is because dtable often deals with a mixture of different statistics to accommodate different variable types. However, if you want to change this style, you can further customize the table using the collect suite of commands after you call dtable. In this case, you can use the command

. collect layout (var) (result[fvfrequency fvpercent])

Example 2a: Combine two-way tables horizontally

Let’s say you want to combine



Birthweight<2500g
0          1

Race
White          73         23
Black          15         11
Other          42         25



and



Smoked during pregnancy
Nonsmoker      Smoker

Race
White               44          52
Black               16          10
Other               55          12



into the following table:



Birthweight<2500g     Smoked during pregnancy
0          1       Nonsmoker      Smoker

Race
White          73         23              44          52
Black          15         11              16          10
Other          15         11              16          10



It contains two side-by-side tables, which are combined horizontally. How can we do that?

You can use the similar method we adopted in example 1a. In the collect layout command, instead of specifying two row tags, you should specify two column tags. Here is the code to generate the above (combined) table:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. table race low, nototal
. table race smoke, nototal append
. collect layout (race) (low smoke)


As an alternative, you can also use dtable and collect to generate the same table:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. dtable,
by(race, nototals)
nosample
factor(low smoke, statistics( fvfrequency))
style(table)
. collect layout (race#result) (var)


dtable generates a descriptive table that is the transpose of our target table. Then we need to transpose that descriptive table by switching the order of row tags and column tags in collect layout to generate the table we want. For more detailed information about how to transpose a table, please see FAQ: How do you transpose a table generated with collect/table/dtable/etable?

By the way, dtable has a different default style than table. Here I specified the option style(table) with dtable to force it to have the same style as table so that the generated table will look exactly the same as the above combined table generated with table and collect.

Example 2b: Combine three one-way tables horizontally

Except for the above situations where we want to combine different tables with a joint row variable (horizontally), sometimes we want to combine tables with different row variables. For example, we want to combine these three one-way frequency tables:

. table var1

Frequency

var1
1                7
2                7
3                5
Total           19


. table var2

Frequency

var2
1                8
2                5
3                2
Total           15


. table var3

Frequency

var3
1                2
2                5
3                6
Total           13



into



Frequency
var1   var2   var3

1           7      8      2
2           7      5      5
3           5      2      6
Total      19     15     13



It makes sense to combine them because var1var3 all have the same set of unique values (1,2,3).

When we have more than one command to collect items into a collection (we need to specify the append option to avoid clearing up the previously saved items), there is a dimension called cmdset tracking different items saved by different commands. We can list all of its levels by

. collect levelsof cmdset

Now we are thinking about how to utilize that dimension to define the columns in the final table. Our first try is

. clear all
. set obs 20
. set seed 7011
. generate var1 = runiformint(1,3) if uniform()<0.8
. generate var2 = runiformint(1,3) if uniform()<0.8
. generate var3 = runiformint(1,3) if uniform()<0.8
. table var1,
. table var2, append
. table var3, append
. collect layout (var1 var2 var3) (cmdset)


However, the output table looks like

Collection: Table
Rows: var1 var2 var3
Columns: cmdset
Table 1: 15 x 3

Command results index
1        2        3

var1
1              7
2              7
3              5
Total         19
var2
1                       8
2                       5
3                       2
Total                  15
var3
1                                2
2                                5
3                                6
Total                           13



Now we find that the table has nine rows instead of three rows as expected. This is because Stata sees var1[1], var2[1], and var3[1] as three different tags. Let’s fix that using collect remap and adjust the format as well.

. collect remap var2 = var1
. collect remap var3 = var1
. collect style header var1, title(hide)
. collect style header cmdset, title(hide)
. collect label levels cmdset 1 "var1", modify
. collect label levels cmdset 2 "var2", modify
. collect label levels cmdset 3 "var3", modify
. collect layout (var1) (cmdset)


After collect remap, the dimensions var2 and var3 all become var1, so we can use var1 alone to define the rows for the table. The output table becomes



var1   var2   var3

1           7      8      2
2           7      5      5
3           5      2      6
Total      19     15     13



That’s the exact table we want to generate.

Example 2c: Combine two two-way tables vertically

The remap strategy in example 2b applies to circumstances where we want to combine tables with different row or column variables and those variables have the same set of values.

Now let’s go back to example 2a. What if we would like to put the two two-way tables



Birthweight<2500g
0          1

Race
White          73         23
Black          15         11
Other          42         25



and



Smoked during pregnancy
Nonsmoker      Smoker

Race
White               44          52
Black               16          10
Other               55          12



atop one another instead of side-by-side, given both low and smoke are binary? We can use the following code:

. clear all

. webuse lbw
(Hosmer & Lemeshow data)

. table race low, nototal
. table race smoke, nototal append
. collect remap smoke = low
. collect layout (cmdset#race) (low)
. collect style header low cmdset, title(hide)
. collect label levels cmdset 1 Birthweight<2500g 2 Smoked during pregnancy
. collect layout


This table has race nested within cmdset on rows, and low defines the columns (race is remapped to low).

The output table looks like



0    1

Birthweight<2500g
Race
White                  73   23
Black                  15   11
Other                  42   25
Smoked during pregnancy
Race
White                  44   52
Black                  16   10
Other                  55   12



Example 3a: Combine multiple tables generated by dtable using collect combine

In the following example, we generate three descriptive tables for the same variable in three different subpopulation groups using dtable.

. dtable i.foreign mpg if rep78==3

Summary

N                         30
Car origin
Domestic        27 (90.0%)
Foreign          3 (10.0%)
Mileage (mpg) 19.433 (4.141)


. dtable i.foreign mpg if rep78==4

Summary

N                         18
Car origin
Domestic         9 (50.0%)
Foreign          9 (50.0%)
Mileage (mpg) 21.667 (4.935)


. dtable i.foreign mpg if rep78==5

Summary

N                         11
Car origin
Domestic         2 (18.2%)
Foreign          9 (81.8%)
Mileage (mpg) 27.364 (8.732)



We want to combine them into one big table. However, we cannot use append because this option is not available with dtable. Instead, we can save them in different collections first (by specifying the option name() with dtable to overwrite the default collection name DTable) and then combine them using collect combine. When we run collect combine, Stata will create a dimension called collection in the combined collection to mark which collection each item is originally from. Then we can use the dimension collection to define the rows or columns in our table.

Here is the code:

. clear all

. sysuse auto
(1978 automobile data)

. dtable i.foreign mpg if rep78==3, name(a1)
. dtable i.foreign mpg if rep78==4, name(a2)
. dtable i.foreign mpg if rep78==5, name(a3)
. collect combine all = a1 a2 a3
. *change the display format for means and sds
. collect style cell result[mean sd], nformat(%8.2fc)
. collect layout (collection#var) (result)


Now we can specify the layout by having var nested in the three collections on rows, while result is put on the columns. In this way, we obtain a long table.



Summary

a1
N                       30
Car origin
Domestic      27 (90.0%)
Foreign        3 (10.0%)
Mileage (mpg) 19.43 (4.14)
a2
N                       18
Car origin
Domestic       9 (50.0%)
Foreign        9 (50.0%)
Mileage (mpg) 21.67 (4.93)
a3
N                       11
Car origin
Domestic       2 (18.2%)
Foreign        9 (81.8%)
Mileage (mpg) 27.36 (8.73)



Another possible layout is to have result nested within collection on the columns. Then we combine the three tables horizontally, and we obtain a wide table.

. collect layout (var) (collection#result)

Collection: all
Rows: var
Columns: collection#result
Table 1: 5 x 3

a1           a2           a3
Summary      Summary      Summary

N                       30           18           11
Car origin
Domestic      27 (90.0%)    9 (50.0%)    2 (18.2%)
Foreign        3 (10.0%)    9 (50.0%)    9 (81.8%)
Mileage (mpg) 19.43 (4.14) 21.67 (4.93) 27.36 (8.73)



Example 3b: Combine tables generated by dtable and table

Occasionally, we are going to need to combine tables generated by dtable and table. For example, we want to generate a descriptive table for a set of variables, including continuous variables and categorical variables. More specifically, we need to obtain means and sds for continuous variables and obtain frequencies and percentages by rural for all the categorical variables. In this case, dtable can take care of the statistics for continuous variables. However, dtable cannot obtain factor percentages by another variable, which can be done by table. In this situation, we need to run dtable for continuous variables and table for categorical variables and then combine two tables together. Here is the complete code:

. clear all

. webuse nhanes2l
(1978 automobile data)

. * continuous variable part
. dtable height weight bpsystol albumin vitaminc zinc copper, by(rural, nototal)
. * factor variable part
. local vlist highlead highbp agegrp hlthstat heartatk diabetes
. foreach var of local vlist {
table (var') (rural), statistic(frequency) statistic(percent, across(rural)) name(DTable) append nototal
}
. collect layout (var highlead highbp agegrp hlthstat heartatk diabetes) (rural#result[_dtable_stats])


The output table looks like



Rural
Urban            Rural

N                          6,548 (63.3%)    3,803 (36.7%)
Height (cm)              167.457 (9.761)  167.984 (9.465)
Weight (kg)              71.427 (15.363)  72.708 (15.314)
Systolic blood pressure 130.460 (23.526) 131.607 (22.980)
Serum albumin (g/dL)       4.665 (0.333)    4.677 (0.328)
Serum vitamin C (mg/dL)    1.066 (0.592)    0.981 (0.558)
Serum zinc (mcg/dL)      86.399 (14.506)  86.691 (14.432)
Serum copper (mcg/dL)   126.054 (33.126) 124.860 (31.469)
High blood pressure
0                        3,798 (63.6%)    2,177 (36.4%)
1                        2,750 (62.8%)    1,626 (37.2%)
Age group
20–29                    1,631 (70.3%)      689 (29.7%)
30–39                      981 (60.5%)      641 (39.5%)
40–49                      777 (61.1%)      495 (38.9%)
50–59                      810 (62.7%)      481 (37.3%)
60–69                    1,743 (60.9%)    1,117 (39.1%)
70+                        606 (61.5%)      380 (38.5%)
Health status
Excellent                1,609 (66.8%)      798 (33.2%)
Very good                1,713 (66.1%)      878 (33.9%)
Good                     1,878 (63.9%)    1,060 (36.1%)
Fair                       950 (56.9%)      720 (43.1%)
Poor                       389 (53.4%)      340 (46.6%)
Prior heart attack
No heart attack          6,272 (63.5%)    3,601 (36.5%)
Had heart attack           275 (57.8%)      201 (42.2%)
Diabetes status
Not diabetic             6,233 (63.3%)    3,617 (36.7%)
Diabetic                   314 (62.9%)      185 (37.1%)



In the above code, we have specified the option append with each table command to append the statistics we compute using table to the same collection that was created by the dtable command before (we must specify the collection name using name(DTable); otherwise, it will use the default name Table). If the append option is not specified, it will first clear the collection and then save its results.

You may wonder why we use the result tag result[_dtable_stats] in the final layout. This is because _dtable_stats is a composite result, which includes frequency, percent, mean, and sd, which are all the statistics involved in this table. On each row, these four statistics will not be all available together, but any available ones will be displayed. This composite result is generated by the first dtable command. You can check all the elements of this composite result by typing

. collect query composite _dtable_stats

Composite definition
Collection: DTable
Composite: _dtable_stats
Elements: frequency
percent
mean
sd
Delimiter: " "
Trim: on
Override: off


Read more about tables in the Customizable Tables and Collected Results Reference Manual.