Creating tables with table

Stata's table command can be used to create simple tables for casual use or to create sophisticated tables for publication, especially when combined with the collect suite of commands.

Let's begin by opening the nhanes2l dataset. Then let's describe the variables highbp, age, sex, and hlthstat.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe highbp age sex hlthstat

Variable      Storage   Display    Value
    name         type    format    label      Variable label
highbp          byte    %8.0g               * High blood pressure
age             byte    %9.0g                 Age (years)
sex             byte    %9.0g      sex        Sex
hlthstat        byte    %20.0g     hlth       Health status

Basic tables

The table command is usually followed by two sets of parentheses. The first set contains the row variable(s), and the second set contains the column variable(s).

Let's use table to create a table for the row variable highbp.

. table (highbp) ()

Frequency
High blood pressure
0 5,975
1 4,376
Total 10,351

Technically, the empty second set of parentheses is not necessary in this example because there are no column variables. But Stata won't complain if we include them.

By default, the table displays the frequencies for each category of highbp along with the total frequency. The categories are not labeled, so let's use label define to create a label named YesNo and use label values to attach the labels to highbp.

. label define YesNo 0 "No" 1 "Yes"
. label values highbp YesNo
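
If you rerun these lines in the same session, label define will stop with an error because YesNo already exists. A minimal sketch of a rerunnable version, which also checks the result, might look like this (the replace option and the two checking commands are additions, not part of the original example):

```stata
* replace overwrites YesNo if it is already defined
label define YesNo 0 "No" 1 "Yes", replace
label values highbp YesNo

* Confirm the mapping and that it is attached to highbp
label list YesNo
describe highbp
```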

Now we can use table to re-create the table using our labels.

. table (highbp)

Frequency
High blood pressure
No 5,975
Yes 4,376
Total 10,351

Next let's create a table using highbp as the column variable. Note that the empty first set of parentheses is necessary in this example so that table knows that highbp is in the second set of parentheses.

. table () (highbp)

High blood pressure
No Yes Total
Frequency 5,975 4,376 10,351

We could use table to create a cross-tabulation of the row variable sex and the column variable highbp.

. table (sex) (highbp)

High blood pressure
No Yes Total
Sex
Male 2,611 2,304 4,915
Female 3,364 2,072 5,436
Total 5,975 4,376 10,351

We can add the nototals option to remove all total frequencies from the rows and columns.

. table (sex) (highbp), nototals

High blood pressure
No Yes
Sex
Male 2,611 2,304
Female 3,364 2,072

Then we can use the totals() option to add totals for any row or column variables.

. table (sex) (highbp), totals(highbp)

             High blood pressure
                 No     Yes
Sex
  Male        2,611   2,304
  Female      3,364   2,072
  Total       5,975   4,376

. table (sex) (highbp), totals(sex)

             High blood pressure
                 No     Yes    Total
Sex
  Male        2,611   2,304    4,915
  Female      3,364   2,072    5,436

. table (sex) (highbp), totals(sex highbp)

             High blood pressure
                 No     Yes    Total
Sex
  Male        2,611   2,304    4,915
  Female      3,364   2,072    5,436
  Total       5,975   4,376

Nested tables

We can include multiple row or column variables (or both row and column variables). The nesting structure is determined by the order of the variables in the parentheses. In the example below, the categories of highbp are nested within each category of sex.

. table (sex highbp) (), totals(highbp)

Frequency
Sex
Male
High blood pressure
No 2,611
Yes 2,304
Female
High blood pressure
No 3,364
Yes 2,072
Total
High blood pressure
No 5,975
Yes 4,376

We can change the order of the row variables, and categories of sex will now be nested within each category of highbp.

. table (highbp sex) (), nototals

                        Frequency
High blood pressure
  No
    Sex
      Male                  2,611
      Female                3,364
  Yes
    Sex
      Male                  2,304
      Female                2,072

The same idea applies with column variables.

. table () (sex highbp), nototals

                              Sex
                  Male                  Female
          High blood pressure    High blood pressure
              No       Yes           No       Yes
Frequency  2,611     2,304        3,364     2,072

. table () (highbp sex), nototals

                  High blood pressure
                  No                 Yes
                 Sex                 Sex
            Male   Female       Male   Female
Frequency  2,611    3,364      2,304    2,072

We can even include three, or more, row or column variables (or both).

. table (highbp sex diabetes) (), nototals

Frequency
High blood pressure
No
Sex
Male
Diabetes status
Not diabetic 2,533
Diabetic 78
Female
Diabetes status
Not diabetic 3,262
Diabetic 100
Yes
Sex
Male
Diabetes status
Not diabetic 2,165
Diabetic 139
Female
Diabetes status
Not diabetic 1,890
Diabetic 182
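
To illustrate the "or both" case, here is a small sketch that nests two row variables and adds a column variable at the same time. It uses only variables already shown above; the particular combination is just one possible arrangement.

```stata
* Rows: diabetes nested within sex; columns: highbp
table (sex diabetes) (highbp), nototals
```

Swapping the order to (diabetes sex) would nest sex within each diabetes category instead.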

Tables of statistics

The statistic() option adds a specified statistic to each cell of the table defined by the row and column variables. You can type help table##stat to view a list of statistics for statistic().

The example below adds the frequency and the percent to each cell of the table.

. table () (highbp),              ///
>      statistic(frequency)       ///
>      statistic(percent)

High blood pressure
No Yes Total
Frequency 5,975 4,376 10,351
Percent 57.72 42.28 100.00

We can add the same statistics for cross-tabulations. Note that each cell contains the joint frequencies and percentages, and the Total rows and columns contain the marginal frequencies and percentages.

. table (sex) (highbp),           ///
>      statistic(frequency)       ///
>      statistic(percent)

High blood pressure
No Yes Total
Sex
Male
Frequency 2,611 2,304 4,915
Percent 25.22 22.26 47.48
Female
Frequency 3,364 2,072 5,436
Percent 32.50 20.02 52.52
Total
Frequency 5,975 4,376 10,351
Percent 57.72 42.28 100.00
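
The percentages above are joint percentages of the whole sample. If row percentages are wanted instead, the percent statistic accepts an across() suboption. The sketch below assumes we want the percentages to sum to 100 within each category of sex:

```stata
* Percentages are computed across the categories of highbp,
* so they sum to 100 within each row (each sex)
table (sex) (highbp),                          ///
      statistic(frequency)                     ///
      statistic(percent, across(highbp))
```

Using across(sex) instead would give column percentages.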

We can use nototals to remove the row and column totals.

. table (sex) (highbp),           ///
>       statistic(frequency)      ///
>       statistic(percent)        ///
>       nototals

High blood pressure
No Yes
Sex
Male
Frequency 2,611 2,304
Percent 25.22 22.26
Female
Frequency 3,364 2,072
Percent 32.50 20.02

Next let's use the statistic() option to add the mean and standard deviation of age to each cell.

. table (sex) (highbp),           ///
>       statistic(frequency)      ///
>       statistic(percent)        ///
>       statistic(mean age)       ///
>       statistic(sd age)         ///
>       nototals

High blood pressure
No Yes
Sex
Male
Frequency 2,611 2,304
Percent 25.22 22.26
Mean
Age (years) 42.8625 52.59288
Standard deviation
Age (years) 16.9688 15.88326
Female
Frequency 3,364 2,072
Percent 32.50 20.02
Mean
Age (years) 41.62366 57.61921
Standard deviation
Age (years) 16.59921 13.25577
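
The same pattern works for other statistics and for more than one variable at a time. The following sketch is an illustration only: it requests the median, minimum, and maximum of age and of bmi, a second variable from the same dataset that is not used elsewhere in this tutorial.

```stata
* Each statistic() option takes a statistic name followed by a variable list
table (sex) (highbp),                 ///
      statistic(median age bmi)       ///
      statistic(min age bmi)          ///
      statistic(max age bmi)          ///
      nototals
```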

We can specify custom formats for the numbers by using the nformat() option, and we can add strings to the numbers by using the sformat() option.

In the example below, the first nformat() option specifies that frequencies be displayed with no digits to the right of the decimal and with commas in the thousands place. The second nformat() option specifies that the means and standard deviations be displayed with two digits to the right of the decimal.

The first sformat() option specifies that percentages be displayed followed by the % character. The second sformat() option specifies that the standard deviations be surrounded by parentheses.

. table (sex) (highbp),              ///
>      statistic(frequency)           ///
>      statistic(percent)             ///
>      statistic(mean age)            ///
>      statistic(sd age)              ///
>      nototals                       ///
>      nformat(%9.0fc frequency)      ///
>      sformat("%s%%" percent)        ///
>      nformat(%6.2f  mean sd)        ///
>      sformat("(%s)" sd)

High blood pressure
No Yes
Sex
Male
Frequency 2,611 2,304
Percent 25.22% 22.26%
Mean
Age (years) 42.86 52.59
Standard deviation
Age (years) (16.97) (15.88)
Female
Frequency 3,364 2,072
Percent 32.50% 20.02%
Mean
Age (years) 41.62 57.62
Standard deviation
Age (years) (16.60) (13.26)

We can use collect to format and export our table to many different document types, including Microsoft Word. You can type help collect export to see a complete list of document types.

The example below uses collect style putdocx to automatically fit the table layout to the document. And collect export exports the table to a Microsoft Word document named mytable1.docx.

. collect style putdocx, layout(autofitcontents)

. collect export mytable1.docx, as(docx) replace
(collection Table exported to file mytable1.docx)
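
The same collection can be exported to other formats by changing the file suffix and the as() option. The file names below are placeholders chosen for this sketch:

```stata
* Export the current collection to HTML, Excel, and LaTeX
collect export mytable1.html, as(html) replace
collect export mytable1.xlsx, as(xlsx) replace
collect export mytable1.tex,  as(tex)  replace
```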

Tables of command output

The command() option allows us to populate tables with the results left in memory by other Stata commands. The example below demonstrates how to use the command() option to create a table from a logistic regression model.

Let's begin by typing a logistic regression model for highbp in the command() option. There are no row variables in our table, and we add command and result in the column parentheses.

. table () (command result),      ///
>      command(logistic highbp c.age##i.sex i.hlthstat)

logistic highbp c.age##i.sex i.hlthstat
Age (years) 1.03228
Sex=Male 1
Sex=Female .1510842
Sex=Male # Age (years) 1
Sex=Female # Age (years) 1.029162
Health status=Excellent 1
Health status=Very good 1.07535
Health status=Good 1.395254
Health status=Fair 1.413449
Health status=Poor 1.366784
Intercept .160724

Our table only displays the odds ratios from the model. We could tell command() to display other statistics by preceding the command with a list of results. You can type help table##cmdspec to see a complete list of regression results that can be collected by command(). You can also see a detailed list of results that are specific to the command you are using by typing the following collect command:

. collect label list result, all

       Collection: Table
        Dimension: result
            Label: Result
     Level labels:
                N  Number of observations
            N_cdf  Number of completely determined failures
            N_cds  Number of completely determined successes
             _r_b  Coefficient
            _r_ci  __LEVEL__% CI
            _r_df  df
            _r_lb  __LEVEL__% lower bound
             _r_p  p-value
            _r_se  Std. error
            _r_ub  __LEVEL__% upper bound
             _r_z  z
         _r_z_abs  |z|
             chi2  χ²
         chi2type  Type of model χ² test
              cmd  Command
          cmdline  Command line as typed
        converged
           depvar  Dependent variable
deriv_useminbound
             df_m  Model DF
        estat_cmd  Program used to implement estat
               ic  Number of iterations
                k  Number of parameters
             k_dv  Number of dependent variables
             k_eq  Number of equations
       k_eq_model  Number of equations in overall model test
               ll  Log likelihood
             ll_0  Log likelihood, constant-only model
     marginsnotok  Predictions disallowed by margins
        marginsok  Predictions allowed by margins
        ml_method  Type of ml method
              mns  Means of the independent variables
              opt  Optimization type
                p  Model test p-value
          predict  Program used to implement predict
       properties  Command properties
             r2_p  Pseudo R-squared
             rank  Rank of VCE
               rc  Return code
            rules  Rules for perfect predictors
        technique  Maximization technique
            title  Title of output
             user  Likelihood-evaluator program
              vce  SE method
            which  Optimization direction

Let's add the same results that are displayed in the logistic regression output: _r_b (the odds ratio), _r_se (the standard error), _r_z (the test statistic), _r_p (the p-value for the test statistic), and _r_ci (the 95% confidence interval).

. table () (command result),                                ///
>      command(_r_b _r_se _r_z _r_p _r_ci                    ///
>              : logistic highbp c.age##i.sex i.hlthstat)

logistic highbp c.age##i.sex i.hlthstat
Coefficient Std. error z p-value 95% CI
Age (years) 1.03228 .0019168 17.11 0.000 1.02853 1.036044
Sex=Male 1 0
Sex=Female .1510842 .0218514 -13.07 0.000 .1137913 .2005991
Sex=Male # Age (years) 1 0
Sex=Female # Age (years) 1.029162 .0028028 10.55 0.000 1.023683 1.03467
Health status=Excellent 1 0
Health status=Very good 1.07535 .0695178 1.12 0.261 .9473767 1.220611
Health status=Good 1.395254 .087132 5.33 0.000 1.234516 1.576921
Health status=Fair 1.413449 .1028621 4.75 0.000 1.225561 1.630142
Health status=Poor 1.366784 .1281915 3.33 0.001 1.137274 1.64261
Intercept .160724 .0154642 -19.00 0.000 .133101 .1940797
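
The command() option is not limited to logistic. As a hedged sketch, the example below collects results from a linear regression of systolic blood pressure instead. The variable bpsystol comes from the same dataset but is not used elsewhere in this tutorial, and for regress the _r_b result is a regression coefficient rather than an odds ratio.

```stata
* Coefficients, standard errors, and confidence intervals from regress
table () (command result),                      ///
      command(_r_b _r_se _r_ci                  ///
              : regress bpsystol c.age i.sex)
```

Note that each table command replaces the contents of the collection named Table, so the most recent table is the one that later collect commands act on.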

Next we can use nformat() and sformat() to format the numbers in our table. We can also use the cidelimiter() option to specify a delimiter between the lower and upper bounds of the confidence intervals.

. table () (command result),                                ///
>      command(_r_b _r_se _r_z _r_p _r_ci                    ///
>              : logistic highbp c.age##i.sex i.hlthstat)    ///
>      nformat(%5.2f  _r_b _r_se _r_ci)                      ///
>      nformat(%5.4f  _r_p)                                  ///
>      sformat("[%s]" _r_ci)                                 ///
>      cidelimiter(" -")

logistic highbp c.age##i.sex i.hlthstat
Coefficient Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex=Male 1.00 0.00
Sex=Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex=Male # Age (years) 1.00 0.00
Sex=Female # Age (years) 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status=Excellent 1.00 0.00
Health status=Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Health status=Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Health status=Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Health status=Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]

We used table to create our table and specify a few formatting options. But we can use the collect suite of commands to customize many more features. The collect suite of commands is so large that it has its own manual, the Stata Customizable Tables and Collected Results Reference Manual.

The collect suite creates and modifies collections. The table command automatically created a collection named Table. We can view some of the details by typing collect layout.

. collect layout

Collection: Table
      Rows: colname
   Columns: command#result
   Table 1: 11 x 5

logistic highbp c.age##i.sex i.hlthstat
Coefficient Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex=Male 1.00 0.00
Sex=Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex=Male # Age (years) 1.00 0.00
Sex=Female # Age (years) 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status=Excellent 1.00 0.00
Health status=Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Health status=Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Health status=Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Health status=Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]
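
The layout can also be specified explicitly. As a sketch, the command below restates the row and column dimensions reported above, so it should redisplay the same table; changing the dimensions inside the parentheses would rearrange it.

```stata
* Rows come from colname; columns come from result nested within command
collect layout (colname) (command#result)
```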

The output tells us that the collection is named Table. The rows of our table are defined by a dimension named colname. The columns are defined by the dimensions named command and result. We can see a list of the dimensions in our collection by typing collect dims.

. collect dims

Collection dimensions
Collection: Table

Dimension                 No. levels
Layout, style, header, label
  cmdset                           1
  coleq                            2
  colname                         16
  colname_remainder                2
  command                          1
  hlthstat                         5
  program_class                    1
  result                          45
  result_type                      3
  roweq                            1
  rowname                          2
  sex                              2
  statcmd                          1
Style only
  border_block                     4
  cell_type                        4

Each dimension has one or more levels, and we can view the levels of a dimension by typing collect levelsof. We can view the levels of the dimension result in the example below.

. collect levelsof result

Collection: Table
 Dimension: result
    Levels: N N_cdf N_cds _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z _r_z_abs chi2 chi2type cmd
            cmdline converged depvar deriv_useminbound df_m estat_cmd ic k k_dv k_eq k_eq_model ll
            ll_0 marginsnotok marginsok ml_method mns opt p predict properties r2_p rank rc rules
            technique title user vce which
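
The same command works for any dimension listed by collect dims. For instance, a quick sketch for the dimensions that define the rows and the top header of our table:

```stata
* Levels of the row dimension (one per model term)
collect levelsof colname

* Levels of the command dimension (one per command() option)
collect levelsof command
```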

Levels can have labels attached to them, and we can view the level labels by typing collect label list.

. collect label list result, all

       Collection: Table
        Dimension: result
            Label: Result
     Level labels:
                N  Number of observations
            N_cdf  Number of completely determined failures
            N_cds  Number of completely determined successes
             _r_b  Coefficient
            _r_ci  __LEVEL__% CI
            _r_df  df
            _r_lb  __LEVEL__% lower bound
             _r_p  p-value
            _r_se  Std. error
            _r_ub  __LEVEL__% upper bound
             _r_z  z
         _r_z_abs  |z|
             chi2  χ²
         chi2type  Type of model χ² test
              cmd  Command
          cmdline  Command line as typed
        converged  
           depvar  Dependent variable
deriv_useminbound  
             df_m  Model DF
        estat_cmd  Program used to implement estat
               ic  Number of iterations
                k  Number of parameters
             k_dv  Number of dependent variables
             k_eq  Number of equations
       k_eq_model  Number of equations in overall model test
               ll  Log likelihood
             ll_0  Log likelihood, constant-only model
     marginsnotok  Predictions disallowed by margins
        marginsok  Predictions allowed by margins
        ml_method  Type of ml method
              mns  Means of the independent variables
              opt  Optimization type
                p  Model test p-value
          predict  Program used to implement predict
       properties  Command properties
             r2_p  Pseudo R-squared
             rank  Rank of VCE
               rc  Return code
            rules  Rules for perfect predictors
        technique  Maximization technique
            title  Title of output
             user  Likelihood-evaluator program
              vce  SE method
            which  Optimization direction

This jargon may seem confusing at first, but the following analogy may help. A collection is similar to an imaginary dataset stored in memory temporarily. A dimension is similar to a categorical variable in a dataset. The levels of a dimension are similar to the categories of a categorical variable in a dataset. And the level labels are similar to the value labels in a dataset.
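
To make the analogy concrete, the sketch below pairs each dataset command with its rough collection counterpart. The pairing is only an aid to intuition, not an exact equivalence.

```stata
* Variables in a dataset    <->  dimensions in a collection
describe sex
collect dims

* Categories of a variable  <->  levels of a dimension
levelsof sex
collect levelsof result

* Value labels              <->  level labels
label list sex
collect label list result
```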

We have viewed the dimensions, levels, and labels of our collection, and we can also use the collect commands to modify those levels and labels. Using collections allows us to fully customize our tables without modifying our dataset.

For example, the level _r_b in the dimension result is labeled "Coefficient". Let's use collect label levels to change the label to "Odds ratio". Then we can type collect preview to view the changes to our table.

. collect label levels result _r_b "Odds ratio", modify
. collect preview

logistic highbp c.age##i.sex i.hlthstat
Odds ratio Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex=Male 1.00 0.00
Sex=Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex=Male # Age (years) 1.00 0.00
Sex=Female # Age (years) 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status=Excellent 1.00 0.00
Health status=Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Health status=Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Health status=Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Health status=Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]

The top of our table is labeled with our logistic regression command. Let's view the labels for the dimension command:

. collect label list command, all

  Collection: Table
   Dimension: command
       Label: Command option index
Level labels:
           1  logistic highbp c.age##i.sex i.hlthstat

The dimension command includes the level 1, which is labeled with our logistic command. Let's again use collect label levels to change this label.

. collect label levels command 1 "Logistic regression model for hypertension", modify
. collect preview

Logistic regression model for hypertension
Odds ratio Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex=Male 1.00 0.00
Sex=Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex=Male # Age (years) 1.00 0.00
Sex=Female # Age (years) 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status=Excellent 1.00 0.00
Health status=Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Health status=Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Health status=Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Health status=Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]

We can also use collect style showbase to hide the base levels of the factor variables. You can type help fvvarlist to learn more about factor-variable notation.

. collect style showbase off
. collect preview

Logistic regression model for hypertension
Odds ratio Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex=Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex=Female # Age (years) 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status=Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Health status=Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Health status=Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Health status=Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]

Next we can use collect style row to accomplish several tasks. The nobinder option removes the = character between each variable name and value label. The stack option stacks the value labels under each variable name. And the delimiter() option specifies the delimiter for interaction terms.

. collect style row stack, delimiter(" x ") nobinder
. collect preview

Logistic regression model for hypertension
Odds ratio Std. error z p-value 95% CI
Age (years) 1.03 0.00 17.11 0.0000 [1.03 - 1.04]
Sex
Female 0.15 0.02 -13.07 0.0000 [0.11 - 0.20]
Sex x Age (years)
Female 1.03 0.00 10.55 0.0000 [1.02 - 1.03]
Health status
Very good 1.08 0.07 1.12 0.2611 [0.95 - 1.22]
Good 1.40 0.09 5.33 0.0000 [1.23 - 1.58]
Fair 1.41 0.10 4.75 0.0000 [1.23 - 1.63]
Poor 1.37 0.13 3.33 0.0009 [1.14 - 1.64]
Intercept 0.16 0.02 -19.00 0.0000 [0.13 - 0.19]

Now we can use collect style putdocx to autofit our table and collect export to export our table to a Microsoft Word document named mytable2.docx.

. collect style putdocx, layout(autofitcontents)
. collect export mytable2.docx, as(docx) replace
(collection Table exported to file mytable2.docx)
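
If we want to return to this table later without rerunning the model, we can also save the collection itself. This is a minimal sketch; the file name mytable2 is a placeholder, and the collect use step assumes a new session in which no collection named Table is already in memory.

```stata
* Save the collection, including its layout, labels, and styles
collect save mytable2, replace

* In a later session, load it again and redisplay the table
collect use mytable2
collect preview
```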

We have barely scratched the surface of the things you can do with table and collect. You can read more about these commands in their manual entries, and you can also watch a demonstration of these commands on YouTube.

See it in action

Watch New in Stata 17: Customizable tables.