FAQ: How can I remove rows or columns from a table generated with collect, table, or dtable?

Home / Resources & support / FAQs / How can I remove rows or columns from a table generated with collect, table, or dtable?

How can I remove rows or columns from a table generated with collect, table, or dtable?

Title		How can I remove rows or columns from a table generated with collect, table, or dtable?
Author		Gabriela Ortiz, StataCorp

When creating tables of results, you may find that you want to remove certain rows or columns. To do so, you need to know how to refer to the results that are being collected; the key is to identify the dimensions and the levels for the specific results we want. In this FAQ, we will demonstrate how to modify the rows and columns after creating a table with collect, table, and dtable. The FAQ is organized as follows:

Modify rows and columns after creating a table with collect
Modify rows and columns after creating a table with table
Modify rows and columns after creating a table with dtable

Modify rows and columns after creating a table with collect

Suppose we want to create a table of estimation results after fitting a linear regression model. First, we collect the estimation results with the collect prefix, and then we lay out our table:

. collect clear

. webuse nhanes2l, clear
(Second National Health and Nutrition Examination Survey)

. collect: regress bpsystol age weight i.sex 


      Source         SS           df       MS      Number of obs   =    10,351
                        F(3, 10347)     =   1501.75
       Model     1709209.9         3  569736.633    Prob > F        =    0.0000
    Residual    3925460.13    10,347  379.381476    R-squared       =    0.3033
                        Adj R-squared   =    0.3031
       Total    5634670.03    10,350  544.412563    Root MSE        =    19.478




    bpsystol   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
   
         age     .6374325   .0111334    57.25   0.000     .6156088    .6592562
      weight     .4170339    .013474    30.95   0.000     .3906221    .4434456
                                                                              
         sex                                                                  
     Female      .8244702   .4140342     1.99   0.046     .0128832    1.636057
       _cons     70.13615   1.187299    59.07   0.000     67.80881    72.46348



. collect layout (colname) (result)

Collection: default
      Rows: colname
   Columns: result
   Table 1: 5 x 10



               Coefficient       95% CI          df 95% lower bound p-value Std. error 95% upper bound     t   |t| Standardized coefficient
   
Age (years)       .6374325 .6156088  .6592562 10347        .6156088   0.000   .0111334        .6592562 57.25 57.25                 .4702977
Weight (kg)       .4170339 .3906221  .4434456 10347        .3906221   0.000    .013474        .4434456 30.95 30.95                 .2744714
Male                     0                                                           0                                                    0
Female            .8244702 .0128832  1.636057 10347        .0128832   0.046   .4140342        1.636057  1.99  1.99                 .0176462
Intercept         70.13615 67.80881  72.46348 10347        67.80881   0.000   1.187299        72.46348 59.07 59.07

For models with a single outcome variable, we can lay out our table by specifying dimensions colname and result; in this example, we placed levels of colname on the rows and levels of result on the columns of the table. The name colname refers to the column names of the returned matrix e(b), and it identifies the independent variables in our model. The third row in the table corresponds to Male, the base level for sex; we can specify that base levels not be shown with the following setting:

. collect style showbase off

The dimension result identifies the results returned in e() and r(), such as the coefficients, confidence intervals, and model statistics. We have more results than we’d like in our table. Suppose we only want to include the coefficients and confidence intervals. We can list the levels of dimension result to see how to refer to these results:

. collect levelsof result

Collection: default
 Dimension: result
    Levels: F N _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z _r_z_abs beta cmd 
            cmdline depvar df_m df_r estat_cmd ll ll_0 marginsok model mss 
	    predict properties r2 r2_a rank rmse rss title vce

It may not be obvious from the result names which ones we want, so instead we list the levels with their labels. It’s good practice to add the all option so that all levels get listed, even those without a label:

. collect label list result, all

  Collection: default
   Dimension: result
       Label: Result
Level labels:
           F  F statistic
           N  Number of observations
        _r_b  Coefficient
       _r_ci  __LEVEL__% CI
       _r_df  df
       _r_lb  __LEVEL__% lower bound
        _r_p  p-value
       _r_se  Std. error
       _r_ub  __LEVEL__% upper bound
        _r_z  t
    _r_z_abs  |t|
        beta  Standardized coefficient
         cmd  Command
     cmdline  Command line as typed
      depvar  Dependent variable

(output omitted)

We now spot _r_b for coefficients and _r_ci for confidence intervals, and we include only these levels of the dimension result.

. collect layout (colname) (result[_r_b _r_ci])

Collection: default
      Rows: colname
   Columns: result[_r_b _r_ci]
   Table 1: 5 x 2



               Coefficient       95% CI      
   
Age (years)       .6374325 .6156088  .6592562
Weight (kg)       .4170339 .3906221  .4434456
Male                     0                   
Female            .8244702 .0128832  1.636057
Intercept         70.13615 67.80881  72.46348

In fact, if you find yourself specifying the same levels of a dimension each time, you can set the levels to be automatically displayed. For example, below we specify that only the coefficients and confidence intervals be displayed whenever we list the dimension result:

. collect style autolevels result _r_b _r_ci, clear

And now if we simply list the dimension result, we’ll only get those two results:

. collect layout (colname) (result)

Collection: default
      Rows: colname
   Columns: result
   Table 1: 4 x 2



               Coefficient       95% CI      
   
Age (years)       .6374325 .6156088  .6592562
Weight (kg)       .4170339 .3906221  .4434456
Female            .8244702 .0128832  1.636057
Intercept         70.13615 67.80881  72.46348

The automatic level specification applies only to this collection, but you can use collect style save to save these style settings and later apply them to another collection. Learn more about automatic levels in FAQ: What are the autolevels of a dimension in a table (collection)? and in the documentation for collect style autolevels.

Modify rows and columns after creating a table with table

When creating tables with the table command, we are simultaneously specifying the statistics we want displayed and the layout of our table. With table as well, we may find that we want to remove rows or columns. For example, consider this table with the percentage of males and females in each age group who have had a heart attack.

. table (agegrp) (sex heartatk), statistic(percent, across(heartatk)) sformat("%s%%") missing



                                                                                         Sex                                                                           
                                 Male                                              Female                                                  Total                       
                          Prior heart attack                                 Prior heart attack                                     Prior heart attack                 
             No heart attack   Had heart attack     Total   No heart attack   Had heart attack       .     Total   No heart attack   Had heart attack       .     Total

Age group                                                                                                                                                              
  20–29              100.00%                      100.00%            99.92%              0.08%           100.00%            99.96%              0.04%           100.00%
  30–39               99.74%              0.26%   100.00%            99.65%              0.23%   0.12%   100.00%            99.69%              0.25%   0.06%   100.00%
  40–49               98.03%              1.97%   100.00%            98.64%              1.21%   0.15%   100.00%            98.35%              1.57%   0.08%   100.00%
  50–59               92.36%              7.64%   100.00%            96.66%              3.34%           100.00%            94.66%              5.34%           100.00%
  60–69               86.56%             13.44%   100.00%            94.57%              5.43%           100.00%            90.73%              9.27%           100.00%
  70+                 83.48%             16.52%   100.00%            92.01%              7.99%           100.00%            88.13%             11.87%           100.00%
  Total               93.53%              6.47%   100.00%            97.06%              2.91%   0.04%   100.00%            95.38%              4.60%   0.02%   100.00%

By default, table will report totals for each category. We specified the missing option to include any observations with a missing value and the sformat() option to display a percent sign next to the percentages.

The table seems too wide for our taste, and we now decide to omit the columns for the totals and missing values. Also, suppose that we’re interested only in people 40 and older. First, we’ll check the current layout specification with collect layout, and then we’ll modify the contents of the table.

. collect layout

Collection: Table
      Rows: agegrp
   Columns: sex#heartatk
    Tables: result
   Table 1: 8 x 11

(output omitted)

We see that the dimensions for the rows and columns correspond to the variables we specified in the parentheses. The dimension result identifies the requested statistics, which in our case is just percent. You can type collect dims to see what other dimensions we have in this collection.

We can select the levels of heartatk and agegrp that we want to include in our table, but we need to know what level refers to the 40s, 50s, and other age groups. Below, we list the levels of the dimensions and their labels, if there are any.

. collect label list sex, all

  Collection: Table
   Dimension: sex
       Label: Sex
Level labels:
          .m  Total
           1  Male
           2  Female

. collect label list agegrp, all

  Collection: Table
   Dimension: agegrp
       Label: Age group
Level labels:
          .m  Total
           1  20–29
           2  30–39
           3  40–49
           4  50–59
           5  60–69
           6  70+

. collect label list heartatk, all

  Collection: Table
   Dimension: heartatk
       Label: Prior heart attack
Level labels:
           .
          .m  Total
           0  No heart attack
           1  Had heart attack

The variable labels are used as the dimension labels; for example, the variable label for agegrp is “Age group”. The levels of each dimension are simply the numeric values, and the labels are obtained from the value labels. The one level you may not recognize is .m, which corresponds to the Total category. The level . is the category for missing values.

Now we know how to specify the levels of sex, agegrp, and heartatk that we want to include. Below, we include the age groups from 40 and above, and we omit the total and missing categories:

. collect layout (agegrp[3 4 5 6]) (sex[1 2]#heartatk[0 1]) (result)

Collection: Table
      Rows: agegrp[3 4 5 6]
   Columns: sex[1 2]#heartatk[0 1]
    Tables: result
   Table 1: 5 x 4



                                               Sex                                  
                            Male                                Female              
                     Prior heart attack                   Prior heart attack        
             No heart attack   Had heart attack   No heart attack   Had heart attack

Age group                                                                           
  40–49               98.03%              1.97%            98.64%              1.21%
  50–59               92.36%              7.64%            96.66%              3.34%
  60–69               86.56%             13.44%            94.57%              5.43%
  70+                 83.48%             16.52%            92.01%              7.99%

Now we have just the categories of interest, but our table header would look better if we hide the dimension labels for sex and heartatk:

. collect style header sex heartatk, title(hide)

. collect preview



                            Male                                Female              
             No heart attack   Had heart attack   No heart attack   Had heart attack

Age group                                                                           
  40–49               98.03%              1.97%            98.64%              1.21%
  50–59               92.36%              7.64%            96.66%              3.34%
  60–69               86.56%             13.44%            94.57%              5.43%
  70+                 83.48%             16.52%            92.01%              7.99%

That’s much better.

Modify rows and columns after creating a table with dtable

When creating tables with the dtable command, we can specify the levels of factor variables that we want to include, but there still may be rows or columns that we wish to omit. For example, below we look at the percentage of individuals who have diabetes and those who have had a heart attack in each region of the United States.

. dtable age weight i.sex i.diabetes i.heartatk, by(region, nototals)



                                                Region                            
                          NE              MW              S               W       

N                    2,096 (20.2%)   2,774 (26.8%)   2,853 (27.6%)   2,628 (25.4%)
Age (years)        47.816 (17.017) 46.528 (17.376) 48.191 (16.864) 47.838 (17.535)
Weight (kg)        71.646 (14.922) 72.050 (15.340) 72.035 (15.655) 71.787 (15.393)
Sex
  Male               1,018 (48.6%)   1,310 (47.2%)   1,332 (46.7%)   1,255 (47.8%)
  Female             1,078 (51.4%)   1,464 (52.8%)   1,521 (53.3%)   1,373 (52.2%)
Diabetes status                                                                   
  Not diabetic       1,997 (95.3%)   2,648 (95.5%)   2,692 (94.4%)   2,513 (95.6%)
  Diabetic               98 (4.7%)      125 (4.5%)      161 (5.6%)      115 (4.4%)
Prior heart attack                                                                
  No heart attack    2,018 (96.3%)   2,652 (95.6%)   2,722 (95.4%)   2,481 (94.4%)
  Had heart attack       77 (3.7%)      121 (4.4%)      131 (4.6%)      147 (5.6%)

I’m interested only in the percent of individuals with a prior heart attack or diabetes, so I omit the other categories below. Both diabetes and heartatk are binary variables, with a value of 1 indicating that the individual has the condition or disease, so we type

. dtable age weight i.sex 1.diabetes 1.heartatk, by(region, nototals)



                                                Region                            
                          NE              MW              S               W       

N                    2,096 (20.2%)   2,774 (26.8%)   2,853 (27.6%)   2,628 (25.4%)
Age (years)        47.816 (17.017) 46.528 (17.376) 48.191 (16.864) 47.838 (17.535)
Weight (kg)        71.646 (14.922) 72.050 (15.340) 72.035 (15.655) 71.787 (15.393)
Sex
  Male               1,018 (48.6%)   1,310 (47.2%)   1,332 (46.7%)   1,255 (47.8%)
  Female             1,078 (51.4%)   1,464 (52.8%)   1,521 (53.3%)   1,373 (52.2%)
Diabetes status                                                                   
  Diabetic               98 (4.7%)      125 (4.5%)      161 (5.6%)      115 (4.4%)
Prior heart attack                                                                
  Had heart attack       77 (3.7%)      121 (4.4%)      131 (4.6%)      147 (5.6%)

In fact, I’m interested only in the West and Northeastern regions, but I can’t specify the levels of the by() variable with dtable. The first step is to check which dimensions are being used to lay out the table:

. collect layout

Collection: DTable
      Rows: var
   Columns: region#result
   Table 1: 10 x 4

(output omitted)

I also want to move the sample size to the last row of the table and reorder the regions, so I’m going to list the levels of var and the labels for region to see how to refer to the levels.

. collect levelsof var

Collection: DTable
 Dimension: var
    Levels: _N _hide age weight 1.sex 2.sex 1.diabetes 1.heartatk

. collect label list region

  Collection: DTable
   Dimension: region
       Label: Region
Level labels:
          .m  Total
           1  NE
           2  MW
           3  S
           4  W

We can use var[_N] to refer to the sample size and the numeric values of region to specify the regions we want. Below, we lay out our table with the two regions we are interested in, and we place _N after all the variables:

. collect layout (var[age weight i.sex 1.diabetes 1.heartatk _N]) (region[4 1]#result)

Collection: DTable
      Rows: var[age weight i.sex 1.diabetes 1.heartatk _N]
   Columns: region[4 1]#result
   Table 1: 10 x 2



                                Region            
                          W               NE      

Age (years)        47.838 (17.535) 47.816 (17.017)
Weight (kg)        71.787 (15.393) 71.646 (14.922)
Sex                                               
  Male               1,255 (47.8%)   1,018 (48.6%)
  Female             1,373 (52.2%)   1,078 (51.4%)
Diabetes status                                   
  Diabetic              115 (4.4%)       98 (4.7%)
Prior heart attack                                
  Had heart attack      147 (5.6%)       77 (3.7%)
N                    2,628 (25.4%)   2,096 (20.2%)

I specified region 4 before region 1 so that we have the West followed by the Northeast.

To learn more about the concepts of a collection and how the collect system works, read [TABLES] Intro 2.

In addition to creating your tables with the collect commands shown above, you can create them with the Tables Builder, shown below. See [TABLES] Tables Builder to learn more.

How can I remove rows or columns from a table generated with collect, table, or dtable?

Modify rows and columns after creating a table with collect

Modify rows and columns after creating a table with table

Modify rows and columns after creating a table with dtable

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies

Source	SS df MS	Number of obs = 10,351
		F(3, 10347) = 1501.75
Model	1709209.9 3 569736.633	Prob > F = 0.0000
Residual	3925460.13 10,347 379.381476	R-squared = 0.3033
		Adj R-squared = 0.3031
Total	5634670.03 10,350 544.412563	Root MSE = 19.478


bpsystol		Coefficient Std. err. t P>\|t\| [95% conf. interval]

age		.6374325 .0111334 57.25 0.000 .6156088 .6592562
weight		.4170339 .013474 30.95 0.000 .3906221 .4434456

sex
Female		.8244702 .4140342 1.99 0.046 .0128832 1.636057
_cons		70.13615 1.187299 59.07 0.000 67.80881 72.46348


		Coefficient 95% CI df 95% lower bound p-value Std. error 95% upper bound t \|t\| Standardized coefficient

Age (years)		.6374325 .6156088 .6592562 10347 .6156088 0.000 .0111334 .6592562 57.25 57.25 .4702977
Weight (kg)		.4170339 .3906221 .4434456 10347 .3906221 0.000 .013474 .4434456 30.95 30.95 .2744714
Male		0 0 0
Female		.8244702 .0128832 1.636057 10347 .0128832 0.046 .4140342 1.636057 1.99 1.99 .0176462
Intercept		70.13615 67.80881 72.46348 10347 67.80881 0.000 1.187299 72.46348 59.07 59.07


		Coefficient 95% CI

Age (years)		.6374325 .6156088 .6592562
Weight (kg)		.4170339 .3906221 .4434456
Male		0
Female		.8244702 .0128832 1.636057
Intercept		70.13615 67.80881 72.46348


		Sex
		Male Female Total
		Prior heart attack Prior heart attack Prior heart attack
		No heart attack Had heart attack Total No heart attack Had heart attack . Total No heart attack Had heart attack . Total

Age group
20–29		100.00% 100.00% 99.92% 0.08% 100.00% 99.96% 0.04% 100.00%
30–39		99.74% 0.26% 100.00% 99.65% 0.23% 0.12% 100.00% 99.69% 0.25% 0.06% 100.00%
40–49		98.03% 1.97% 100.00% 98.64% 1.21% 0.15% 100.00% 98.35% 1.57% 0.08% 100.00%
50–59		92.36% 7.64% 100.00% 96.66% 3.34% 100.00% 94.66% 5.34% 100.00%
60–69		86.56% 13.44% 100.00% 94.57% 5.43% 100.00% 90.73% 9.27% 100.00%
70+		83.48% 16.52% 100.00% 92.01% 7.99% 100.00% 88.13% 11.87% 100.00%
Total		93.53% 6.47% 100.00% 97.06% 2.91% 0.04% 100.00% 95.38% 4.60% 0.02% 100.00%


		Sex
		Male Female
		Prior heart attack Prior heart attack
		No heart attack Had heart attack No heart attack Had heart attack

Age group
40–49		98.03% 1.97% 98.64% 1.21%
50–59		92.36% 7.64% 96.66% 3.34%
60–69		86.56% 13.44% 94.57% 5.43%
70+		83.48% 16.52% 92.01% 7.99%

Stata/MP4 Annual License (download)

How can I remove rows or columns from a table generated with collect, table, or dtable?

Modify rows and columns after creating a table with collect

Modify rows and columns after creating a table with table

Modify rows and columns after creating a table with dtable

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies