Home  /  Resources & support  /  FAQs  /  How can I remove rows or columns from a table generated with collect, table, or dtable?

## How can I remove rows or columns from a table generated with collect, table, or dtable?

 Title How can I remove rows or columns from a table generated with collect, table, or dtable? Author Gabriela Ortiz, StataCorp

When creating tables of results, you may find that you want to remove certain rows or columns. To do so, you need to know how to refer to the results that are being collected; the key is to identify the dimensions and the levels for the specific results we want. In this FAQ, we will demonstrate how to modify the rows and columns after creating a table with collect, table, and dtable. The FAQ is organized as follows:

### Modify rows and columns after creating a table with collect

Suppose we want to create a table of estimation results after fitting a linear regression model. First, we collect the estimation results with the collect prefix, and then we lay out our table:

. collect clear

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. collect: regress bpsystol age weight i.sex

Source         SS           df       MS      Number of obs   =    10,351
F(3, 10347)     =   1501.75
Model     1709209.9         3  569736.633   Prob > F        =    0.0000
Residual    3925460.13    10,347  379.381476   R-squared       =    0.3033
Total    5634670.03    10,350  544.412563   Root MSE        =    19.478

bpsystol   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

age     .6374325   .0111334    57.25   0.000     .6156088    .6592562
weight     .4170339    .013474    30.95   0.000     .3906221    .4434456

sex
Female      .8244702   .4140342     1.99   0.046     .0128832    1.636057
_cons     70.13615   1.187299    59.07   0.000     67.80881    72.46348

. collect layout (colname) (result)

Collection: default
Rows: colname
Columns: result
Table 1: 5 x 10

Coefficient       95% CI          df 95% lower bound p-value Std. error 95% upper bound     t   |t| Standardized coefficient

Age (years)       .6374325 .6156088  .6592562 10347        .6156088   0.000   .0111334        .6592562 57.25 57.25                 .4702977
Weight (kg)       .4170339 .3906221  .4434456 10347        .3906221   0.000    .013474        .4434456 30.95 30.95                 .2744714
Male                     0                                                           0                                                    0
Female            .8244702 .0128832  1.636057 10347        .0128832   0.046   .4140342        1.636057  1.99  1.99                 .0176462
Intercept         70.13615 67.80881  72.46348 10347        67.80881   0.000   1.187299        72.46348 59.07 59.07



For models with a single outcome variable, we can lay out our table by specifying dimensions colname and result; in this example, we placed levels of colname on the rows and levels of result on the columns of the table. The name colname refers to the column names of the returned matrix e(b), and it identifies the independent variables in our model. The third row in the table corresponds to Male, the base level for sex; we can specify that base levels not be shown with the following setting:

. collect style showbase off


The dimension result identifies the results returned in e() and r(), such as the coefficients, confidence intervals, and model statistics. We have more results than we’d like in our table. Suppose we only want to include the coefficients and confidence intervals. We can list the levels of dimension result to see how to refer to these results:

. collect levelsof result

Collection: default
Dimension: result
Levels: F N _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z _r_z_abs beta cmd
cmdline depvar df_m df_r estat_cmd ll ll_0 marginsok model mss
predict properties r2 r2_a rank rmse rss title vce


It may not be obvious from the result names which ones we want, so instead we list the levels with their labels. It’s good practice to add the all option so that all levels get listed, even those without a label:

. collect label list result, all

Collection: default
Dimension: result
Label: Result
Level labels:
F  F statistic
N  Number of observations
_r_b  Coefficient
_r_ci  __LEVEL__% CI
_r_df  df
_r_lb  __LEVEL__% lower bound
_r_p  p-value
_r_se  Std. error
_r_ub  __LEVEL__% upper bound
_r_z  t
_r_z_abs  |t|
beta  Standardized coefficient
cmd  Command
cmdline  Command line as typed
depvar  Dependent variable

(output omitted) 

We now spot _r_b for coefficients and _r_ci for confidence intervals, and we include only these levels of the dimension result.

. collect layout (colname) (result[_r_b _r_ci])

Collection: default
Rows: colname
Columns: result[_r_b _r_ci]
Table 1: 5 x 2

Coefficient       95% CI

Age (years)       .6374325 .6156088  .6592562
Weight (kg)       .4170339 .3906221  .4434456
Male                     0
Female            .8244702 .0128832  1.636057
Intercept         70.13615 67.80881  72.46348



In fact, if you find yourself specifying the same levels of a dimension each time, you can set the levels to be automatically displayed. For example, below we specify that only the coefficients and confidence intervals be displayed whenever we list the dimension result:

. collect style autolevels result _r_b _r_ci, clear


And now if we simply list the dimension result, we’ll only get those two results:

. collect layout (colname) (result)

Collection: default
Rows: colname
Columns: result
Table 1: 4 x 2

Coefficient       95% CI

Age (years)       .6374325 .6156088  .6592562
Weight (kg)       .4170339 .3906221  .4434456
Female            .8244702 .0128832  1.636057
Intercept         70.13615 67.80881  72.46348



The automatic level specification applies only to this collection, but you can use collect style save to save these style settings and later apply them to another collection. Learn more about automatic levels in FAQ: What are the autolevels of a dimension in a table (collection)? and in the documentation for collect style autolevels.

### Modify rows and columns after creating a table with table

When creating tables with the table command, we are simultaneously specifying the statistics we want displayed and the layout of our table. With table as well, we may find that we want to remove rows or columns. For example, consider this table with the percentage of males and females in each age group who have had a heart attack.

. table (agegrp) (sex heartatk), statistic(percent, across(heartatk)) sformat("%s%%") missing

Sex
Male                                              Female                                                  Total
Prior heart attack                                 Prior heart attack                                     Prior heart attack
No heart attack   Had heart attack     Total   No heart attack   Had heart attack       .     Total   No heart attack   Had heart attack       .     Total

Age group
20–29              100.00%                      100.00%            99.92%              0.08%           100.00%            99.96%              0.04%           100.00%
30–39               99.74%              0.26%   100.00%            99.65%              0.23%   0.12%   100.00%            99.69%              0.25%   0.06%   100.00%
40–49               98.03%              1.97%   100.00%            98.64%              1.21%   0.15%   100.00%            98.35%              1.57%   0.08%   100.00%
50–59               92.36%              7.64%   100.00%            96.66%              3.34%           100.00%            94.66%              5.34%           100.00%
60–69               86.56%             13.44%   100.00%            94.57%              5.43%           100.00%            90.73%              9.27%           100.00%
70+                 83.48%             16.52%   100.00%            92.01%              7.99%           100.00%            88.13%             11.87%           100.00%
Total               93.53%              6.47%   100.00%            97.06%              2.91%   0.04%   100.00%            95.38%              4.60%   0.02%   100.00%



By default, table will report totals for each category. We specified the missing option to include any observations with a missing value and the sformat() option to display a percent sign next to the percentages.

The table seems too wide for our taste, and we now decide to omit the columns for the totals and missing values. Also, suppose that we’re interested only in people 40 and older. First, we’ll check the current layout specification with collect layout, and then we’ll modify the contents of the table.

. collect layout

Collection: Table
Rows: agegrp
Columns: sex#heartatk
Tables: result
Table 1: 8 x 11

(output omitted)


We see that the dimensions for the rows and columns correspond to the variables we specified in the parentheses. The dimension result identifies the requested statistics, which in our case is just percent. You can type collect dims to see what other dimensions we have in this collection.

We can select the levels of heartatk and agegrp that we want to include in our table, but we need to know what level refers to the 40s, 50s, and other age groups. Below, we list the levels of the dimensions and their labels, if there are any.

. collect label list sex, all

Collection: Table
Dimension: sex
Label: Sex
Level labels:
.m  Total
1  Male
2  Female

. collect label list agegrp, all

Collection: Table
Dimension: agegrp
Label: Age group
Level labels:
.m  Total
1  20–29
2  30–39
3  40–49
4  50–59
5  60–69
6  70+

. collect label list heartatk, all

Collection: Table
Dimension: heartatk
Label: Prior heart attack
Level labels:
.
.m  Total
0  No heart attack


The variable labels are used as the dimension labels; for example, the variable label for agegrp is “Age group”. The levels of each dimension are simply the numeric values, and the labels are obtained from the value labels. The one level you may not recognize is .m, which corresponds to the Total category. The level . is the category for missing values.

Now we know how to specify the levels of sex, agegrp, and heartatk that we want to include. Below, we include the age groups from 40 and above, and we omit the total and missing categories:

. collect layout (agegrp[3 4 5 6]) (sex[1 2]#heartatk[0 1]) (result)

Collection: Table
Rows: agegrp[3 4 5 6]
Columns: sex[1 2]#heartatk[0 1]
Tables: result
Table 1: 5 x 4

Sex
Male                                Female
Prior heart attack                   Prior heart attack
No heart attack   Had heart attack   No heart attack   Had heart attack

Age group
40–49               98.03%              1.97%            98.64%              1.21%
50–59               92.36%              7.64%            96.66%              3.34%
60–69               86.56%             13.44%            94.57%              5.43%
70+                 83.48%             16.52%            92.01%              7.99%



Now we have just the categories of interest, but our table header would look better if we hide the dimension labels for sex and heartatk:

. collect style header sex heartatk, title(hide)

. collect preview

Male                                Female
No heart attack   Had heart attack   No heart attack   Had heart attack

Age group
40–49               98.03%              1.97%            98.64%              1.21%
50–59               92.36%              7.64%            96.66%              3.34%
60–69               86.56%             13.44%            94.57%              5.43%
70+                 83.48%             16.52%            92.01%              7.99%



That’s much better.

### Modify rows and columns after creating a table with dtable

When creating tables with the dtable command, we can specify the levels of factor variables that we want to include, but there still may be rows or columns that we wish to omit. For example, below we look at the percentage of individuals who have diabetes and those who have had a heart attack in each region of the United States.

. dtable age weight i.sex i.diabetes i.heartatk, by(region, nototals)

Region
NE              MW              S               W

N                    2,096 (20.2%)   2,774 (26.8%)   2,853 (27.6%)   2,628 (25.4%)
Age (years)        47.816 (17.017) 46.528 (17.376) 48.191 (16.864) 47.838 (17.535)
Weight (kg)        71.646 (14.922) 72.050 (15.340) 72.035 (15.655) 71.787 (15.393)
Sex
Male               1,018 (48.6%)   1,310 (47.2%)   1,332 (46.7%)   1,255 (47.8%)
Female             1,078 (51.4%)   1,464 (52.8%)   1,521 (53.3%)   1,373 (52.2%)
Diabetes status
Not diabetic       1,997 (95.3%)   2,648 (95.5%)   2,692 (94.4%)   2,513 (95.6%)
Diabetic               98 (4.7%)      125 (4.5%)      161 (5.6%)      115 (4.4%)
Prior heart attack
No heart attack    2,018 (96.3%)   2,652 (95.6%)   2,722 (95.4%)   2,481 (94.4%)
Had heart attack       77 (3.7%)      121 (4.4%)      131 (4.6%)      147 (5.6%)



I’m interested only in the percent of individuals with a prior heart attack or diabetes, so I omit the other categories below. Both diabetes and heartatk are binary variables, with a value of 1 indicating that the individual has the condition or disease, so we type

. dtable age weight i.sex 1.diabetes 1.heartatk, by(region, nototals)

Region
NE              MW              S               W

N                    2,096 (20.2%)   2,774 (26.8%)   2,853 (27.6%)   2,628 (25.4%)
Age (years)        47.816 (17.017) 46.528 (17.376) 48.191 (16.864) 47.838 (17.535)
Weight (kg)        71.646 (14.922) 72.050 (15.340) 72.035 (15.655) 71.787 (15.393)
Sex
Male               1,018 (48.6%)   1,310 (47.2%)   1,332 (46.7%)   1,255 (47.8%)
Female             1,078 (51.4%)   1,464 (52.8%)   1,521 (53.3%)   1,373 (52.2%)
Diabetes status
Diabetic               98 (4.7%)      125 (4.5%)      161 (5.6%)      115 (4.4%)
Prior heart attack
Had heart attack       77 (3.7%)      121 (4.4%)      131 (4.6%)      147 (5.6%)



In fact, I’m interested only in the West and Northeastern regions, but I can’t specify the levels of the by() variable with dtable. The first step is to check which dimensions are being used to lay out the table:

. collect layout

Collection: DTable
Rows: var
Columns: region#result
Table 1: 10 x 4

(output omitted)


I also want to move the sample size to the last row of the table and reorder the regions, so I’m going to list the levels of var and the labels for region to see how to refer to the levels.

. collect levelsof var

Collection: DTable
Dimension: var
Levels: _N _hide age weight 1.sex 2.sex 1.diabetes 1.heartatk

. collect label list region

Collection: DTable
Dimension: region
Label: Region
Level labels:
.m  Total
1  NE
2  MW
3  S
4  W


We can use var[_N] to refer to the sample size and the numeric values of region to specify the regions we want. Below, we lay out our table with the two regions we are interested in, and we place _N after all the variables:

. collect layout (var[age weight i.sex 1.diabetes 1.heartatk _N]) (region[4 1]#result)

Collection: DTable
Rows: var[age weight i.sex 1.diabetes 1.heartatk _N]
Columns: region[4 1]#result
Table 1: 10 x 2

Region
W               NE

Age (years)        47.838 (17.535) 47.816 (17.017)
Weight (kg)        71.787 (15.393) 71.646 (14.922)
Sex
Male               1,255 (47.8%)   1,018 (48.6%)
Female             1,373 (52.2%)   1,078 (51.4%)
Diabetes status
Diabetic              115 (4.4%)       98 (4.7%)
Prior heart attack
Had heart attack      147 (5.6%)       77 (3.7%)
N                    2,628 (25.4%)   2,096 (20.2%)



I specified region 4 before region 1 so that we have the West followed by the Northeast.