Stata's table command can be used to create simple tables for casual use or to create sophisticated tables for publication, especially when combined with the collect suite of commands.
Let's begin by opening the nhanes2l dataset. Then let's describe the variables highbp, age, sex, and hlthstat.
. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)
. describe highbp age sex hlthstat
Variable Storage Display Value |
name type format label Variable label |
highbp byte %8.0g * High blood pressure |
age byte %9.0g Age (years) |
sex byte %9.0g sex Sex |
hlthstat byte %20.0g hlth Health status |
The table command is usually followed by two sets of parentheses. The first set contains the row variable(s), and the second set contains the column variable(s).
Let's use table to create a table for the row variable highbp.
. table (highbp) ()
Frequency | ||
High blood pressure | ||
0 | 5,975 | |
1 | 4,376 | |
Total | 10,351 | |
Technically, the empty second set of parentheses is not necessary in this example because there are no column variables. But Stata won't complain if we include them.
By default, the table displays the frequencies for each category of highbp along with the total frequency. The categories are not labeled, so let's use label define to create a label named YesNo and use label values to attach the labels to highbp.
. label define YesNo 0 "No" 1 "Yes"
. label values highbp YesNo
Now we can use table to re-create the table using our labels.
. table (highbp)
Frequency | ||
High blood pressure | ||
No | 5,975 | |
Yes | 4,376 | |
Total | 10,351 | |
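If these labeling lines are rerun in the same session without reloading the data, label define stops with an error because YesNo is already defined. A minimal do-file sketch of a rerunnable version, assuming the same dataset and label name as above; the replace option overwrites any existing definition:

```stata
* Rerunnable labeling sketch (webuse needs internet access)
webuse nhanes2l, clear

* replace overwrites YesNo if it is already defined
label define YesNo 0 "No" 1 "Yes", replace
label values highbp YesNo

* Confirm the mapping, then rebuild the one-way table
label list YesNo
table (highbp)
```

label list is a quick way to confirm which text is attached to each value before building tables.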
Next let's create a table using highbp as the column variable. Note that the empty first set of parentheses is necessary in this example so that table knows that highbp is in the second set of parentheses.
. table () (highbp)
High blood pressure | ||
No Yes Total | ||
Frequency | 5,975 4,376 10,351 | |
We could use table to create a cross-tabulation of the row variable sex and the column variable highbp.
. table (sex) (highbp)
High blood pressure | ||
No Yes Total | ||
Sex | ||
Male | 2,611 2,304 4,915 | |
Female | 3,364 2,072 5,436 | |
Total | 5,975 4,376 10,351 | |
We can add the nototals option to remove all total frequencies from the rows and columns.
. table (sex) (highbp), nototals
High blood pressure | ||
No Yes | ||
Sex | ||
Male | 2,611 2,304 | |
Female | 3,364 2,072 | |
Then we can use the totals() option to add totals for any row or column variables.
. table (sex) (highbp), totals(highbp)
High blood pressure | ||
No Yes | ||
Sex | ||
Male | 2,611 2,304 | |
Female | 3,364 2,072 | |
Total | 5,975 4,376 | |
. table (sex) (highbp), totals(sex)
High blood pressure | ||
No Yes Total | ||
Sex | ||
Male | 2,611 2,304 4,915 | |
Female | 3,364 2,072 5,436 | |
. table (sex) (highbp), totals(sex highbp)
High blood pressure | ||
No Yes Total | ||
Sex | ||
Male | 2,611 2,304 4,915 | |
Female | 3,364 2,072 5,436 | |
Total | 5,975 4,376 | |
We can include multiple row or column variables (or both row and column variables). The nesting structure is determined by the order of the variables in the parentheses. In the example below, the categories of highbp are nested within each category of sex.
. table (sex highbp) (), totals(highbp)
Frequency | ||
Sex | ||
Male | ||
High blood pressure | ||
No | 2,611 | |
Yes | 2,304 | |
Female | ||
High blood pressure | ||
No | 3,364 | |
Yes | 2,072 | |
Total | ||
High blood pressure | ||
No | 5,975 | |
Yes | 4,376 | |
We can change the order of the row variables, and categories of sex will now be nested within each category of highbp.
. table (highbp sex) (), nototals
Frequency | ||
High blood pressure | ||
No | ||
Sex | ||
Male | 2,611 | |
Female | 3,364 | |
Yes | ||
Sex | ||
Male | 2,304 | |
Female | 2,072 | |
The same idea applies with column variables.
. table () (sex highbp), nototals
Sex | ||
Male Female | ||
High blood pressure High blood pressure | ||
No Yes No Yes | ||
Frequency | 2,611 2,304 3,364 2,072 | |
. table () (highbp sex), nototals
High blood pressure | ||
No Yes | ||
Sex Sex | ||
Male Female Male Female | ||
Frequency | 2,611 3,364 2,304 2,072 | |
We can even include three, or more, row or column variables (or both).
. table (highbp sex diabetes) (), nototals
Frequency | ||
High blood pressure | ||
No | ||
Sex | ||
Male | ||
Diabetes status | ||
Not diabetic | 2,533 | |
Diabetic | 78 | |
Female | ||
Diabetes status | ||
Not diabetic | 3,262 | |
Diabetic | 100 | |
Yes | ||
Sex | ||
Male | ||
Diabetes status | ||
Not diabetic | 2,165 | |
Diabetic | 139 | |
Female | ||
Diabetes status | ||
Not diabetic | 1,890 | |
Diabetic | 182 | |
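The "(or both)" case can be sketched by moving one variable into the column parentheses. This is an illustrative variation on the command above, not output reproduced from the original; it nests sex within highbp in the rows and places diabetes in the columns:

```stata
webuse nhanes2l, clear
label define YesNo 0 "No" 1 "Yes", replace
label values highbp YesNo

* Rows: sex nested within highbp; columns: diabetes
table (highbp sex) (diabetes), nototals
```

Only the layout changes, so the eight cell counts should match those in the nested table above.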
The statistic() option adds a specified statistic to each cell of the table defined by the row and column variables. You can type help table##stat to view a list of statistics for statistic().
The example below adds the frequency and the percent to each cell of the table.
. table () (highbp), statistic(frequency) statistic(percent)
High blood pressure | ||
No Yes Total | ||
Frequency | 5,975 4,376 10,351 | |
Percent | 57.72 42.28 100.00 | |
We can add the same statistics for cross-tabulations. Note that each cell contains the joint frequencies and percentages, and the Total rows and columns contain the marginal frequencies and percentages.
. table (sex) (highbp), statistic(frequency) statistic(percent)
High blood pressure | ||
No Yes Total | ||
Sex | ||
Male | ||
Frequency | 2,611 2,304 4,915 | |
Percent | 25.22 22.26 47.48 | |
Female | ||
Frequency | 3,364 2,072 5,436 | |
Percent | 32.50 20.02 52.52 | |
Total | ||
Frequency | 5,975 4,376 10,351 | |
Percent | 57.72 42.28 100.00 | |
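The percentages can be checked by hand: each one is a frequency divided by the overall total of 10,351. The lines below are a small sketch of that arithmetic using display. The final command is an optional variation, not covered in the text above, that uses the across() suboption of statistic(percent) to request percentages within each row instead.

```stata
webuse nhanes2l, clear

* Joint percentage for the Male, No cell: 2,611 of 10,351
display %5.2f 100 * 2611 / 10351

* Marginal percentage for Male: 4,915 of 10,351
display %5.2f 100 * 4915 / 10351

* Variation: percentages across levels of highbp within each sex
* (row percentages rather than percentages of the grand total)
table (sex) (highbp), statistic(frequency) statistic(percent, across(highbp))
```

The two display lines print 25.22 and 47.48, matching the Male row of the table.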
We can use nototals to remove the row and column totals.
. table (sex) (highbp), statistic(frequency) statistic(percent) nototals
High blood pressure | ||
No Yes | ||
Sex | ||
Male | ||
Frequency | 2,611 2,304 | |
Percent | 25.22 22.26 | |
Female | ||
Frequency | 3,364 2,072 | |
Percent | 32.50 20.02 | |
Next let's use the statistic() option to add the mean and standard deviation of age to each cell.
. table (sex) (highbp), statistic(frequency) statistic(percent) statistic(mean age) statistic(sd age) nototals
High blood pressure | ||
No Yes | ||
Sex | ||
Male | ||
Frequency | 2,611 2,304 | |
Percent | 25.22 22.26 | |
Mean | ||
Age (years) | 42.8625 52.59288 | |
Standard deviation | ||
Age (years) | 16.9688 15.88326 | |
Female | ||
Frequency | 3,364 2,072 | |
Percent | 32.50 20.02 | |
Mean | ||
Age (years) | 41.62366 57.61921 | |
Standard deviation | ||
Age (years) | 16.59921 13.25577 | |
We can specify custom formats for the numbers by using the nformat() option, and we can add strings to the numbers by using the sformat() option.
In the example below, the first nformat() option specifies that frequencies be displayed with no digits to the right of the decimal and with commas in the thousands place. The second nformat() option specifies that the means and standard deviations be displayed with two digits to the right of the decimal.
The first sformat() option specifies that percentages be displayed followed by the % character. The second sformat() option specifies that the standard deviations be surrounded by parentheses.
. table (sex) (highbp), statistic(frequency) statistic(percent) statistic(mean age) statistic(sd age) nototals nformat(%9.0fc frequency) sformat("%s%%" percent) nformat(%6.2f mean sd) sformat("(%s)" sd)
High blood pressure | ||
No Yes | ||
Sex | ||
Male | ||
Frequency | 2,611 2,304 | |
Percent | 25.22% 22.26% | |
Mean | ||
Age (years) | 42.86 52.59 | |
Standard deviation | ||
Age (years) | (16.97) (15.88) | |
Female | ||
Frequency | 3,364 2,072 | |
Percent | 32.50% 20.02% | |
Mean | ||
Age (years) | 41.62 57.62 | |
Standard deviation | ||
Age (years) | (16.60) (13.26) | |
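A command this long is easier to read and maintain in a do-file, where /// continues a command onto the next line. The sketch below is the same command as above, only reformatted. Note that /// works in do-files and ado-files but not when typed interactively in the Command window.

```stata
webuse nhanes2l, clear
label define YesNo 0 "No" 1 "Yes", replace
label values highbp YesNo

* One option per line; /// joins the lines into a single command
table (sex) (highbp),                 ///
    statistic(frequency)              ///
    statistic(percent)                ///
    statistic(mean age)               ///
    statistic(sd age)                 ///
    nototals                          ///
    nformat(%9.0fc frequency)         ///
    sformat("%s%%" percent)           ///
    nformat(%6.2f mean sd)            ///
    sformat("(%s)" sd)
```

Putting each option on its own line makes it easy to comment out or reorder a single statistic or format.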
We can use collect to format and export our table to many different document types, including Microsoft Word. You can type help collect export to see a complete list of document types.
The example below uses collect style putdocx with layout(autofitcontents) to size the table's columns automatically to fit their contents. Then collect export exports the table to a Microsoft Word document named mytable1.docx.
. collect style putdocx, layout(autofitcontents)
. collect export mytable1.docx, as(docx) replace
(collection Table exported to file mytable1.docx)
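The same collection can be written to other formats by changing the file suffix. A brief sketch, assuming a table collection is in memory; the filenames are hypothetical, and the format is inferred from the suffix when as() is omitted:

```stata
webuse nhanes2l, clear
quietly table (sex) (highbp), statistic(frequency) statistic(percent)

* Hypothetical output filenames; replace overwrites existing files
collect export mytable1.html, replace
collect export mytable1.xlsx, replace
collect export mytable1.tex,  replace
```

Typing help collect export shows which suffixes and as() values your version of Stata supports.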
The command() option allows us to populate tables with the results left in memory by other Stata commands. The example below demonstrates how to use the command() option to create a table from a logistic regression model.
Let's begin by typing a logistic regression model for highbp in the command() option. There are no row variables in our table, and we add command and result in the column parentheses.
. table () (command result), command(logistic highbp c.age##i.sex i.hlthstat)
logistic highbp c.age##i.sex i.hlthstat | ||
Age (years) | 1.03228 | |
Sex=Male | 1 | |
Sex=Female | .1510842 | |
Sex=Male # Age (years) | 1 | |
Sex=Female # Age (years) | 1.029162 | |
Health status=Excellent | 1 | |
Health status=Very good | 1.07535 | |
Health status=Good | 1.395254 | |
Health status=Fair | 1.413449 | |
Health status=Poor | 1.366784 | |
Intercept | .160724 | |
Our table only displays the odds ratios from the model. We could tell command() to display other statistics by preceding the command with a list of results. You can type help table##cmdspec to see a complete list of regression results that can be collected by command(). You can also see a detailed list of results that are specific to the command you are using by typing the following collect command:
. collect label list result, all

  Collection: Table
   Dimension: result
       Label: Result
Level labels:
  N                  Number of observations
  N_cdf              Number of completely determined failures
  N_cds              Number of completely determined successes
  _r_b               Coefficient
  _r_ci              __LEVEL__% CI
  _r_df              df
  _r_lb              __LEVEL__% lower bound
  _r_p               p-value
  _r_se              Std. error
  _r_ub              __LEVEL__% upper bound
  _r_z               z
  _r_z_abs           |z|
  chi2               χ²
  chi2type           Type of model χ² test
  cmd                Command
  cmdline            Command line as typed
  converged
  depvar             Dependent variable
  deriv_useminbound
  df_m               Model DF
  estat_cmd          Program used to implement estat
  ic                 Number of iterations
  k                  Number of parameters
  k_dv               Number of dependent variables
  k_eq               Number of equations
  k_eq_model         Number of equations in overall model test
  ll                 Log likelihood
  ll_0               Log likelihood, constant-only model
  marginsnotok       Predictions disallowed by margins
  marginsok          Predictions allowed by margins
  ml_method          Type of ml method
  mns                Means of the independent variables
  opt                Optimization type
  p                  Model test p-value
  predict            Program used to implement predict
  properties         Command properties
  r2_p               Pseudo R-squared
  rank               Rank of VCE
  rc                 Return code
  rules              Rules for perfect predictors
  technique          Maximization technique
  title              Title of output
  user               Likelihood-evaluator program
  vce                SE method
  which              Optimization direction
Let's add the same results that are displayed in the logistic regression output: _r_b (the odds ratio), _r_se (the standard error), _r_z (the test statistic), _r_p (the p-value for the test statistic), and _r_ci (the 95% confidence interval).
. table () (command result), command(_r_b _r_se _r_z _r_p _r_ci : logistic highbp c.age##i.sex i.hlthstat)
logistic highbp c.age##i.sex i.hlthstat | ||
Coefficient Std. error z p-value 95% CI | ||
Age (years) | 1.03228 .0019168 17.11 0.000 1.02853 1.036044 | |
Sex=Male | 1 0 | |
Sex=Female | .1510842 .0218514 -13.07 0.000 .1137913 .2005991 | |
Sex=Male # Age (years) | 1 0 | |
Sex=Female # Age (years) | 1.029162 .0028028 10.55 0.000 1.023683 1.03467 | |
Health status=Excellent | 1 0 | |
Health status=Very good | 1.07535 .0695178 1.12 0.261 .9473767 1.220611 | |
Health status=Good | 1.395254 .087132 5.33 0.000 1.234516 1.576921 | |
Health status=Fair | 1.413449 .1028621 4.75 0.000 1.225561 1.630142 | |
Health status=Poor | 1.366784 .1281915 3.33 0.001 1.137274 1.64261 | |
Intercept | .160724 .0154642 -19.00 0.000 .133101 .1940797 | |
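The values in the Coefficient column are odds ratios because logistic reports exponentiated coefficients. As a check, the sketch below refits the model and recomputes the Age (years) row from the stored log-odds coefficient and its standard error. It assumes the usual relationships: the odds ratio is exp(b), its standard error is exp(b) times se(b), and z is b divided by se(b).

```stata
webuse nhanes2l, clear
quietly logistic highbp c.age##i.sex i.hlthstat

* _b[age] and _se[age] are stored on the log-odds scale
display "Odds ratio: " exp(_b[age])
display "Std. error: " exp(_b[age]) * _se[age]
display "z:          " %5.2f _b[age] / _se[age]
```

The three values should agree with the first row of the table: 1.03228, .0019168, and 17.11.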
Next we can use nformat() and sformat() to format the numbers in our table. We can also use the cidelimiter() option to specify a delimiter between the lower and upper bounds of the confidence intervals.
. table () (command result), command(_r_b _r_se _r_z _r_p _r_ci : logistic highbp c.age##i.sex i.hlthstat) nformat(%5.2f _r_b _r_se _r_ci ) nformat(%5.4f _r_p) sformat("[%s]" _r_ci ) cidelimiter(" -")
logistic highbp c.age##i.sex i.hlthstat | ||
Coefficient Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex=Male | 1.00 0.00 | |
Sex=Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex=Male # Age (years) | 1.00 0.00 | |
Sex=Female # Age (years) | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status=Excellent | 1.00 0.00 | |
Health status=Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Health status=Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Health status=Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Health status=Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
We used table to create our table and specify a few formatting options. But we can use the collect suite of commands to customize many more features. The collect suite of commands is so large that it has its own manual; see the manual entries listed at the end of this section.
The collect suite creates and modifies collections. The table command automatically created a collection named Table. We can view some of the details by typing collect layout.
. collect layout

Collection: Table
      Rows: colname
   Columns: command#result
   Table 1: 11 x 5
logistic highbp c.age##i.sex i.hlthstat | ||
Coefficient Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex=Male | 1.00 0.00 | |
Sex=Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex=Male # Age (years) | 1.00 0.00 | |
Sex=Female # Age (years) | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status=Excellent | 1.00 0.00 | |
Health status=Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Health status=Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Health status=Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Health status=Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
The output tells us that the collection is named Table. The rows of our table are defined by a dimension named colname. The columns are defined by the dimensions named command and result. We can see a list of the dimensions in our collection by typing collect dims.
. collect dims

Collection dimensions
Collection: Table

Dimension                      No. levels
Layout, style, header, label
  cmdset                       1
  coleq                        2
  colname                      16
  colname_remainder            2
  command                      1
  hlthstat                     5
  program_class                1
  result                       45
  result_type                  3
  roweq                        1
  rowname                      2
  sex                          2
  statcmd                      1
Style only
  border_block                 4
  cell_type                    4
Each dimension has one or more levels, and we can view the levels of a dimension by typing collect levelsof. We can view the levels of the dimension result in the example below.
. collect levelsof result

Collection: Table
 Dimension: result
    Levels: N N_cdf N_cds _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z _r_z_abs
            chi2 chi2type cmd cmdline converged depvar deriv_useminbound df_m
            estat_cmd ic k k_dv k_eq k_eq_model ll ll_0 marginsnotok marginsok
            ml_method mns opt p predict properties r2_p rank rc rules technique
            title user vce which
Levels can have labels attached to them, and we can view the level labels by typing collect label list.
. collect label list result, all

  Collection: Table
   Dimension: result
       Label: Result
Level labels:
  N                  Number of observations
  N_cdf              Number of completely determined failures
  N_cds              Number of completely determined successes
  _r_b               Coefficient
  _r_ci              __LEVEL__% CI
  _r_df              df
  _r_lb              __LEVEL__% lower bound
  _r_p               p-value
  _r_se              Std. error
  _r_ub              __LEVEL__% upper bound
  _r_z               z
  _r_z_abs           |z|
  chi2               χ²
  chi2type           Type of model χ² test
  cmd                Command
  cmdline            Command line as typed
  converged
  depvar             Dependent variable
  deriv_useminbound
  df_m               Model DF
  estat_cmd          Program used to implement estat
  ic                 Number of iterations
  k                  Number of parameters
  k_dv               Number of dependent variables
  k_eq               Number of equations
  k_eq_model         Number of equations in overall model test
  ll                 Log likelihood
  ll_0               Log likelihood, constant-only model
  marginsnotok       Predictions disallowed by margins
  marginsok          Predictions allowed by margins
  ml_method          Type of ml method
  mns                Means of the independent variables
  opt                Optimization type
  p                  Model test p-value
  predict            Program used to implement predict
  properties         Command properties
  r2_p               Pseudo R-squared
  rank               Rank of VCE
  rc                 Return code
  rules              Rules for perfect predictors
  technique          Maximization technique
  title              Title of output
  user               Likelihood-evaluator program
  vce                SE method
  which              Optimization direction
This jargon may seem confusing at first, but the following analogy may help. A collection is similar to an imaginary dataset stored in memory temporarily. A dimension is similar to a categorical variable in a dataset. The levels of a dimension are similar to the categories of a categorical variable in a dataset. And the level labels are similar to the value labels in a dataset.
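The analogy can be made concrete by running the dataset commands next to their collect counterparts. This is only an illustrative pairing, not an exact equivalence, and it assumes the value label attached to sex is also named sex, as the describe output above indicates:

```stata
webuse nhanes2l, clear
quietly table () (command result), command(logistic highbp c.age##i.sex i.hlthstat)

* Dataset: a categorical variable, its categories, and its value labels
describe sex
levelsof sex
label list sex

* Collection: the dimensions, one dimension's levels, and its level labels
collect dims
collect levelsof command
collect label list command, all
```

Reading the two groups of output side by side shows how dimensions, levels, and level labels play the roles of variables, categories, and value labels.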
We have viewed the dimensions, levels, and labels of our collection, and we can also use the collect commands to modify those levels and labels. Using collections allows us to fully customize our tables without modifying our dataset.
For example, the level _r_b in the dimension result is labeled "Coefficient". Let's use collect label levels to change the label to "Odds ratio". Then we can type collect preview to view the changes to our table.
. collect label levels result _r_b "Odds ratio", modify
. collect preview
logistic highbp c.age##i.sex i.hlthstat | ||
Odds ratio Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex=Male | 1.00 0.00 | |
Sex=Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex=Male # Age (years) | 1.00 0.00 | |
Sex=Female # Age (years) | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status=Excellent | 1.00 0.00 | |
Health status=Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Health status=Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Health status=Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Health status=Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
The top of our table is labeled with our logistic regression command. Let's view the labels for the dimension command:
. collect label list command, all

  Collection: Table
   Dimension: command
       Label: Command option index
Level labels:
  1                  logistic highbp c.age##i.sex i.hlthstat
The dimension command includes the level 1, which is labeled with our logistic command. Let's again use collect label levels to change this label.
. collect label levels command 1 "Logistic regression model for hypertension", modify
. collect preview
Logistic regression model for hypertension | ||
Odds ratio Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex=Male | 1.00 0.00 | |
Sex=Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex=Male # Age (years) | 1.00 0.00 | |
Sex=Female # Age (years) | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status=Excellent | 1.00 0.00 | |
Health status=Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Health status=Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Health status=Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Health status=Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
We can also use collect style showbase off to hide the factor-variable base levels, which are the reference categories shown with an odds ratio of 1.00. You can type help fvvarlist to learn more about factor-variable notation.
. collect style showbase off
. collect preview
Logistic regression model for hypertension | ||
Odds ratio Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex=Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex=Female # Age (years) | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status=Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Health status=Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Health status=Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Health status=Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
Next we can use collect style row to accomplish several tasks. The nobinder option removes the = character between each variable name and value label. The stack option stacks the value labels under each variable name. And the delimiter() option specifies the delimiter for interaction terms.
. collect style row stack, delimiter(" x ") nobinder
. collect preview
Logistic regression model for hypertension | ||
Odds ratio Std. error z p-value 95% CI | ||
Age (years) | 1.03 0.00 17.11 0.0000 [1.03 - 1.04] | |
Sex | ||
Female | 0.15 0.02 -13.07 0.0000 [0.11 - 0.20] | |
Sex x Age (years) | ||
Female | 1.03 0.00 10.55 0.0000 [1.02 - 1.03] | |
Health status | ||
Very good | 1.08 0.07 1.12 0.2611 [0.95 - 1.22] | |
Good | 1.40 0.09 5.33 0.0000 [1.23 - 1.58] | |
Fair | 1.41 0.10 4.75 0.0000 [1.23 - 1.63] | |
Poor | 1.37 0.13 3.33 0.0009 [1.14 - 1.64] | |
Intercept | 0.16 0.02 -19.00 0.0000 [0.13 - 0.19] | |
Now we can use collect style putdocx to autofit our table and collect export to export our table to a Microsoft Word document named mytable2.docx.
. collect style putdocx, layout(autofitcontents)
. collect export mytable2.docx, as(docx) replace
(collection Table exported to file mytable2.docx)
We have barely scratched the surface of what you can do with table and collect. You can read more about these commands in the manual entries listed below, and a demonstration of these commands is also available on YouTube.
Read more in the Stata Customizable Tables and Collected Results Reference Manual: see [TABLES] Intro, [TABLES] collect style putdocx, [TABLES] collect export, [TABLES] collect clear, [TABLES] collect label, [TABLES] collect preview, [TABLES] collect style showbase, and [TABLES] collect style row. In the Stata Base Reference Manual, see [R] table. In the Stata Data Management Reference Manual, see [D] label. In the Stata User’s Guide, see [U] 11.4.3 Factor variables.