Tables for epidemiologists 



Stata has a set of commands for dealing with 2 x 2 tables, including stratified tables, known collectively as the epitab commands. To calculate appropriate statistics and suppress inappropriate statistics, these commands are organized in the same way that epidemiologists conceptualize data.
Stata’s ir command is used with incidence-rate (incidence-density or person-time) data; point estimates and confidence intervals for the incidence-rate ratio and difference are calculated, along with the attributable or prevented fractions for the exposed and total populations.
Stata’s cs command is used with cohort study data with equal follow-up time per subject. Risk is then the proportion of subjects who become cases. Point estimates and confidence intervals for the risk difference, risk ratio, and (optionally) the odds ratio are calculated, along with attributable or prevented fractions for the exposed and total population.
Stata’s cc command is used with case–control and cross-sectional data. Point estimates and confidence intervals for the odds ratio are calculated, along with attributable or prevented fractions for the exposed and total population.
mcc is used with matched case–control data. It calculates McNemar’s chi-squared, point estimates, and confidence intervals for the difference, ratio, and relative difference of the proportion with the factor, along with the odds ratio.
All these commands come in two flavors: their normal forms and an “immediate” form. In their normal forms, the commands form counts by summing the dataset in use. In their immediate forms, the data are specified on the command line.
For instance, Boice and Monson (1977; reprinted in Rothman, Greenland, and Lash 2008, 244) reported breast cancer cases and person-years of observation for women with tuberculosis who were repeatedly exposed to multiple X-ray fluoroscopies and for those not exposed:
                        Exposed   Unexposed
Breast cancer cases          41          15
Person-years             28,010      19,017
Using the immediate form of ir, you specify the values in the table following the command:
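. iri 41 15 28010 19017

                 |   Exposed   Unexposed  |      Total
-----------------+------------------------+-----------
           Cases |        41          15  |         56
     Person-time |     28010       19017  |      47027
-----------------+------------------------+-----------
  Incidence rate |  .0014638    .0007888  |   .0011908
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
 Inc. rate diff. |         .000675        |    .0000749    .0012751
 Inc. rate ratio |         1.855759       |    1.005684      3.6093 (exact)
 Attr. frac. ex. |         .4611368       |    .0056519     .722938 (exact)
 Attr. frac. pop |          .337618       |
                 +-------------------------------------------------
                    (midp)   Pr(k>=41) =                    0.0177 (exact)
                    (midp) 2*Pr(k>=41) =                    0.0355 (exact)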
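The point estimates in this output follow directly from the four numbers given on the command line. As a hand check outside Stata, here is a minimal Python sketch of that arithmetic. The variable names are illustrative, not Stata's, and the exact confidence intervals and mid-p values are not reproduced because they need the exact calculation that ir performs.

```python
# Hand check of the point estimates for the Boice and Monson data.
# Variable names are illustrative; they are not part of Stata.
cases_exposed, cases_unexposed = 41, 15
time_exposed, time_unexposed = 28010, 19017

# Incidence rates: cases per unit of person-time in each group
rate_exposed = cases_exposed / time_exposed
rate_unexposed = cases_unexposed / time_unexposed

rate_diff = rate_exposed - rate_unexposed
rate_ratio = rate_exposed / rate_unexposed

# Attributable fraction among the exposed: the share of the exposed
# rate that is in excess of the unexposed rate
attr_frac_exposed = (rate_exposed - rate_unexposed) / rate_exposed

# Attributable fraction in the population: scale by the share of all
# cases that occurred among the exposed
total_cases = cases_exposed + cases_unexposed
attr_frac_pop = attr_frac_exposed * cases_exposed / total_cases

print(f"Inc. rate diff.  {rate_diff:.7f}")
print(f"Inc. rate ratio  {rate_ratio:.6f}")
print(f"Attr. frac. ex.  {attr_frac_exposed:.7f}")
print(f"Attr. frac. pop  {attr_frac_pop:.6f}")
```

The printed values should agree with the point-estimate column of the output to the displayed precision.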
The ir command itself is more general: it works with individual-level or aggregate data and also handles stratified data. Rothman, Greenland, and Lash (2008, 264) report results from Doll and Hill (1966) on age-specific coronary disease deaths among British male doctors in relation to cigarette smoking:
                Smokers                Nonsmokers
Age       Deaths   Person-years   Deaths   Person-years
35–44         32         52,407        2         18,790
45–54        104         43,248       12         10,673
55–64        206         28,612       28          5,710
65–74        186         12,663       28          2,585
75–84        102          5,317       31          1,462
We have entered these data into Stata:
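     +-----------------------------------+
     |   age   smokes   deaths    pyears |
     |-----------------------------------|
  1. | 35-44        1       32    52,407 |
  2. | 35-44        0        2    18,790 |
  3. | 45-54        1      104    43,248 |
  4. | 45-54        0       12    10,673 |
  5. | 55-64        1      206    28,612 |
  6. | 55-64        0       28     5,710 |
  7. | 65-74        1      186    12,663 |
  8. | 65-74        0       28     2,585 |
  9. | 75-84        1      102     5,317 |
 10. | 75-84        0       31     1,462 |
     +-----------------------------------+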
We can obtain the Mantel–Haenszel combined estimate of the incidence-rate ratio, along with 90% confidence intervals, by typing
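. ir deaths smokes pyears, by(age) level(90)

             age |       IRR   [90% Conf. Interval]   M-H Weight
-----------------+----------------------------------------------
           35-44 |  5.736638   1.704271    33.61646     1.472169  (exact)
           45-54 |  2.138812   1.274552    3.813282     9.624747  (exact)
           55-64 |   1.46824   1.044915    2.110422     23.34176  (exact)
           65-74 |   1.35606   .9626026    1.953505     23.25315  (exact)
           75-84 |  .9047304   .6375194    1.305412     24.31435  (exact)
-----------------+----------------------------------------------
           Crude |  1.719823   1.437544      2.0688               (exact)
    M-H combined |  1.424682   1.194375    1.699399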
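To see where the weights and the combined estimate come from, the following Python sketch recomputes them by hand using the usual Mantel–Haenszel formula for person-time data: each stratum's weight is the unexposed deaths times the exposed person-years divided by the stratum's total person-years. This is a check on the point estimates only, under the assumption that this is the weighting Stata applies (the first stratum's weight, 1.472169, is consistent with it); function and variable names are illustrative, and confidence intervals are not computed.

```python
# Doll and Hill data, one tuple per age stratum:
# (age, deaths among smokers, smoker person-years,
#       deaths among nonsmokers, nonsmoker person-years)
strata = [
    ("35-44",  32, 52407,  2, 18790),
    ("45-54", 104, 43248, 12, 10673),
    ("55-64", 206, 28612, 28,  5710),
    ("65-74", 186, 12663, 28,  2585),
    ("75-84", 102,  5317, 31,  1462),
]


def mh_weight(b, t1, t0):
    """Mantel-Haenszel weight of one stratum (illustrative helper)."""
    return b * t1 / (t1 + t0)


def mantel_haenszel_irr(strata):
    """Weighted average of the stratum rate ratios using M-H weights."""
    numerator = sum(a * t0 / (t1 + t0) for _, a, t1, b, t0 in strata)
    denominator = sum(mh_weight(b, t1, t0) for _, a, t1, b, t0 in strata)
    return numerator / denominator


def crude_irr(strata):
    """Rate ratio after pooling all strata, ignoring age."""
    a = sum(s[1] for s in strata)
    t1 = sum(s[2] for s in strata)
    b = sum(s[3] for s in strata)
    t0 = sum(s[4] for s in strata)
    return (a / t1) / (b / t0)


mh_weights = [mh_weight(b, t1, t0) for _, a, t1, b, t0 in strata]
mh_irr = mantel_haenszel_irr(strata)
crude = crude_irr(strata)

for (age, *_), w in zip(strata, mh_weights):
    print(f"{age}  weight {w:.6f}")
print(f"Crude        {crude:.6f}")
print(f"M-H combined {mh_irr:.6f}")
```

The combined estimate is pulled toward the older strata because their weights are much larger than the weight of the 35–44 stratum, which is why it sits well below the crude ratio.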
Rothman and Greenland (1998, 264) obtain the standardized incidence-rate ratio and 90% confidence intervals, weighting each age category by the population of the exposed group, thus producing the standardized mortality ratio (SMR). This calculation can be reproduced by specifying by(age) to indicate that the table is stratified, and istandard to specify that we want the internally standardized rate:
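. ir deaths smokes pyears, by(age) level(90) istandard

             age |       IRR   [90% Conf. Interval]       Weight
-----------------+----------------------------------------------
           35-44 |  5.736638   1.704271    33.61646        52407  (exact)
           45-54 |  2.138812   1.274552    3.813282        43248  (exact)
           55-64 |   1.46824   1.044915    2.110422        28612  (exact)
           65-74 |   1.35606   .9626026    1.953505        12663  (exact)
           75-84 |  .9047304   .6375194    1.305412         5317  (exact)
-----------------+----------------------------------------------
           Crude |  1.719823   1.437544      2.0688               (exact)
 I. Standardized |  1.417609   1.186541    1.693676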
If we want the externally standardized ratio (weights proportional to the population of the unexposed group), we can substitute estandard for istandard in the command above.
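The two standardizations differ only in which group's person-years serve as the stratum weights. The Python sketch below is a hand check under that assumption: it applies a weighted-rate formula with exposed person-years (internal) and with unexposed person-years (external). The function name and the weight callbacks are hypothetical, introduced only for illustration, and only point estimates are computed.

```python
# Doll and Hill data, one tuple per age stratum:
# (age, deaths among smokers, smoker person-years,
#       deaths among nonsmokers, nonsmoker person-years)
strata = [
    ("35-44",  32, 52407,  2, 18790),
    ("45-54", 104, 43248, 12, 10673),
    ("55-64", 206, 28612, 28,  5710),
    ("65-74", 186, 12663, 28,  2585),
    ("75-84", 102,  5317, 31,  1462),
]


def standardized_irr(strata, weight):
    """Ratio of weighted exposed rates to weighted unexposed rates.

    `weight` maps (exposed person-years, unexposed person-years)
    to the weight of a stratum.
    """
    exposed = sum(weight(t1, t0) * a / t1 for _, a, t1, b, t0 in strata)
    unexposed = sum(weight(t1, t0) * b / t0 for _, a, t1, b, t0 in strata)
    return exposed / unexposed


# Internal standardization: weight by the exposed group's person-years.
# The numerator collapses to observed deaths among smokers, and the
# denominator is the number expected at nonsmoker rates (an SMR).
internal = standardized_irr(strata, lambda t1, t0: t1)

# External standardization: weight by the unexposed group's person-years.
external = standardized_irr(strata, lambda t1, t0: t0)

print(f"I. Standardized {internal:.6f}")
print(f"E. Standardized {external:.6f}")
```

The internal value can be compared with the I. Standardized row reported above. The external value is this sketch's own calculation and should be checked against the output of the estandard option.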