**[R] epitab** -- Tables for epidemiologists (cs and csi)

__Syntax__

**cs** *var_case var_exposed* [*if*] [*in*] [*weight*] [**,** *cs_options*]

**csi** *#a #b #c #d* [**,** *csi_options*]

*cs_options* Description
-------------------------------------------------------------------------
Options
**by(***varlist* [**,** __mis__**sing**]**)** stratify on *varlist*
__es__**tandard** combine external weights with within-stratum statistics
__is__**tandard** combine internal weights with within-stratum statistics
__s__**tandard(***varname***)** combine user-specified weights with within-stratum statistics
__p__**ool** display pooled estimate
__noc__**rude** do not display crude estimate
__noh__**om** do not display homogeneity test
**rd** calculate standardized risk difference
__b__**inomial(***varname***)** number of subjects variable
**or** report odds ratio
__w__**oolf** use Woolf approximation to calculate SE and CI of the odds ratio
__e__**xact** calculate Fisher's exact p
__l__**evel(***#***)** set confidence level; default is **level(95)**
-------------------------------------------------------------------------

*csi_options* Description
-------------------------------------------------------------------------
**or** report odds ratio
__w__**oolf** use Woolf approximation to calculate SE and CI of the odds ratio
__e__**xact** calculate Fisher's exact p
__l__**evel(***#***)** set confidence level; default is **level(95)**
-------------------------------------------------------------------------
**fweight**s are allowed; see weight.

__Menu__

__cs__

**Statistics > Epidemiology and related > Tables for epidemiologists >**
**Cohort study risk-ratio etc.**

__csi__

**Statistics > Epidemiology and related > Tables for epidemiologists >**
**Cohort study risk-ratio etc. calculator**

__Description__

**cs** is used with cohort study data with equal follow-up time per subject
and sometimes with cross-sectional data. Risk is then the proportion of
subjects who become cases. **cs** calculates point estimates and confidence
intervals for the risk difference, risk ratio, and (optionally) the odds
ratio, along with attributable or prevented fractions for the exposed and
total population. **csi** is the immediate form of **cs**; see immed. Also see
**[R] logistic** for related commands.
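
As a rough illustration of the quantities described above, the point
estimates for an unstratified table can be reproduced by hand. The sketch
below reuses the counts from the **csi** example in this entry and assumes
the argument order a = exposed cases, b = unexposed cases, c = exposed
noncases, d = unexposed noncases. The local macro names are illustrative
only.

```stata
* Counts from the immediate-form example: csi 7 12 9 2
* a = exposed cases, b = unexposed cases
local a = 7
local b = 12
* c = exposed noncases, d = unexposed noncases
local c = 9
local d = 2

* Risk = proportion of subjects who become cases, by exposure group
local r1 = `a'/(`a' + `c')
local r0 = `b'/(`b' + `d')

display "risk, exposed    = " (`r1')
display "risk, unexposed  = " (`r0')
display "risk difference  = " (`r1' - `r0')
display "risk ratio       = " (`r1'/`r0')

* Compare with the command's own output
csi `a' `b' `c' `d'
```

With these counts the risks are 7/16 = 0.4375 and 12/14 (about 0.857), so
the risk difference is about -0.420 and the risk ratio about 0.510.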

__Options for cs__

+---------+
----+ Options +----------------------------------------------------------

**by(***varlist* [**,** **missing**]**)** specifies that the tables be stratified on
*varlist*. Missing categories in *varlist* are omitted from the
stratified analysis, unless option **missing** is specified within **by()**.
Within-stratum statistics are shown and then combined with
Mantel-Haenszel weights. If **estandard**, **istandard**, or **standard()** is
also specified (see below), the weights specified are used in place
of Mantel-Haenszel weights.

**estandard**, **istandard**, and **standard(***varname***)** request that within-stratum
statistics be combined with external, internal, or user-specified
weights to produce a standardized estimate. These options are
mutually exclusive and can be used only when **by()** is also specified.
(When **by()** is specified without one of these options, Mantel-Haenszel
weights are used.)

**estandard** external weights are the total number of unexposed.

**istandard** internal weights are the total number of exposed.
**istandard** can be used to produce, among other things, standardized
mortality ratios (SMRs).

**standard(***varname***)** allows user-specified weights. *varname* must
contain a constant within stratum and be nonnegative. The scale of
*varname* is irrelevant.

**pool** specifies that, in a stratified analysis, the directly pooled
estimate also be displayed. The pooled estimate is a weighted
average of the stratum-specific estimates using inverse-variance
weights, which are the inverse of the variance of the
stratum-specific estimate. **pool** is relevant only if **by()** is also
specified.
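
A minimal sketch of requesting the directly pooled estimate, assuming the
**ugdp** dataset used in the Examples section is available through
**webuse**:

```stata
webuse ugdp, clear

* Mantel-Haenszel combined risk ratio plus the directly pooled
* (inverse-variance weighted) estimate
cs case exposed [fw=pop], by(age) pool
```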

**nocrude** specifies that in a stratified analysis the crude estimate -- an
estimate obtained without regard to strata -- not be displayed.
**nocrude** is relevant only if **by()** is also specified.

**nohom** specifies that a chi-squared test of homogeneity not be included in
the output of a stratified analysis. This tests whether the exposure
effect is the same across strata and can be performed for any pooled
estimate -- directly pooled or Mantel-Haenszel. **nohom** is relevant
only if **by()** is also specified.
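
The two display options above can be combined. A short sketch, again
assuming the **ugdp** dataset from the Examples section:

```stata
webuse ugdp, clear

* Stratified analysis with the crude estimate and the test of
* homogeneity both suppressed
cs case exposed [fw=pop], by(age) nocrude nohom
```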

**rd** may be used only with **estandard**, **istandard**, or **standard()**. It
requests that **cs** calculate the standardized risk difference rather
than the default risk ratio.

**binomial(***varname***)** supplies the number of subjects (cases plus controls)
for binomial frequency records. For individual and simple frequency
records, this option is not used.
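
As a sketch of what binomial frequency records might look like, the block
below enters one record per exposure group, with the number of cases and
the total number of subjects. The variable names (**cases**, **n**) are
hypothetical, and the layout assumes that *var_case* holds the case count
when **binomial()** is specified. The counts are those of the **csi**
example in this entry (7 cases among 16 exposed, 12 cases among 14
unexposed).

```stata
clear
input exposed cases n
1  7 16
0 12 14
end

* cases = number of cases in the record, n = cases plus noncases
cs cases exposed, binomial(n)
```

If the layout assumption holds, the output should agree with
**csi 7 12 9 2**.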

**or** reports the calculation of the odds ratio in addition to the risk
ratio if **by()** is not specified. With **by()**, **or** specifies that a
Mantel-Haenszel estimate of the combined odds ratio be made rather
than the Mantel-Haenszel estimate of the risk ratio. In either case,
this is the same calculation that would be made by **cc** and **cci**.
Typically, **cc**, **cci**, or **tabodds** is preferred for calculating odds
ratios.

**woolf** requests that the Woolf (1955) approximation, also known as the
Taylor expansion, be used for calculating the standard error and
confidence interval for the odds ratio. By default, **cs** with the **or**
option reports the Cornfield (1956) interval.

**exact** requests that Fisher's exact p be calculated rather than the
chi-squared and its significance level. We recommend specifying
**exact** whenever samples are small. When the least-frequent cell
contains 1,000 cases or more, there will be no appreciable difference
between the exact significance level and the significance level based
on the chi-squared, but the exact significance level will take
considerably longer to calculate. **exact** does not affect whether
exact confidence intervals are calculated. Commands always calculate
exact confidence intervals where they can, unless **cornfield** or **woolf**
is specified.

**level(***#***)** specifies the confidence level, as a percentage, for confidence
intervals. The default is **level(95)** or as set by **set level**.

__Options for csi__

**or** reports the odds ratio in addition to the risk ratio. Because **csi**
does not accept **by()**, no stratified (Mantel-Haenszel) estimate is
involved. This is the same calculation that would be made by **cc** and
**cci**. Typically, **cc**, **cci**, or **tabodds** is preferred for
calculating odds ratios.

**woolf** requests that the Woolf (1955) approximation, also known as the
Taylor expansion, be used for calculating the standard error and
confidence interval for the odds ratio. By default, **csi** with the **or**
option reports the Cornfield (1956) interval.
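
To see the effect of **woolf**, the two intervals can be compared on the
same table. The hand check at the end is a sketch that assumes the usual
form of Woolf's method, with the standard error of the log odds ratio
taken as sqrt(1/a + 1/b + 1/c + 1/d), and a 95% confidence level. The local
macro names are illustrative.

```stata
* Odds ratio with the default interval
csi 7 12 9 2, or

* Same odds ratio with the Woolf standard error and interval
csi 7 12 9 2, or woolf

* Hand check of Woolf's formula on the log scale (95% level assumed)
local lnor = ln((7*2)/(12*9))
local se   = sqrt(1/7 + 1/12 + 1/9 + 1/2)
local z    = invnormal(0.975)
display "OR       = " exp(`lnor')
display "lower CI = " exp(`lnor' - `z'*`se')
display "upper CI = " exp(`lnor' + `z'*`se')
```

The odds ratio itself, (7*2)/(12*9), is about 0.13 and does not depend on
which interval is requested.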

**exact** requests that Fisher's exact p be calculated rather than the
chi-squared and its significance level. We recommend specifying
**exact** whenever samples are small. When the least-frequent cell
contains 1,000 cases or more, there will be no appreciable difference
between the exact significance level and the significance level based
on the chi-squared, but the exact significance level will take
considerably longer to calculate. **exact** does not affect whether
exact confidence intervals are calculated. Commands always calculate
exact confidence intervals where they can, unless **cornfield** or **woolf**
is specified.

**level(***#***)** specifies the confidence level, as a percentage, for confidence
intervals. The default is **level(95)** or as set by **set level**.
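
A brief sketch of the two ways to change the confidence level, per command
or for the session:

```stata
* 90% confidence intervals for this one command
csi 7 12 9 2, level(90)

* Change the session default instead, then restore it
set level 90
csi 7 12 9 2
set level 95
```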

__Examples__

---------------------------------------------------------------------------
Setup
**. webuse csxmpl**

List the data
**. list**

Calculate risk differences, risk ratios, etc.
**. cs case exp [fw=pop]**

Immediate form of above command
**. csi 7 12 9 2**

Same as above, but calculate Fisher's exact p rather than the chi-squared
**. csi 7 12 9 2, exact**

Calculate risk differences, risk ratios, etc., and report the odds ratio
**. cs case exp [fw=pop], or**

---------------------------------------------------------------------------
Setup
**. webuse ugdp**

List the data
**. list**

Perform stratified analysis of cumulative incidence data
**. cs case exposed [fw=pop], by(age)**

Same as above, but report the odds ratio, rather than the risk ratio
**. cs case exposed [fw=pop], by(age) or**

Perform stratified analysis using internally weighted standardized
estimates
**. cs case exposed [fw=pop], by(age) istandard**

Perform stratified analysis using externally weighted standardized
estimates
**. cs case exposed [fw=pop], by(age) estandard**

Create a variable that is always equal to 1
**. generate wgt = 1**

Perform stratified analysis of the standardized risk ratio, weighting
each age category equally
**. cs case exposed [fw=pop], by(age) standard(wgt)**

Perform stratified analysis of the standardized risk difference,
weighting each age category equally
**. cs case exposed [fw=pop], by(age) standard(wgt) rd**
---------------------------------------------------------------------------

__Video example__

Risk ratios calculator

__Stored results__

**cs** and **csi** store the following in **r()**:

Scalars
**r(p)** two-sided p-value
**r(rd)** risk difference
**r(lb_rd)** lower bound of CI for **rd**
**r(ub_rd)** upper bound of CI for **rd**
**r(rr)** risk ratio
**r(lb_rr)** lower bound of CI for **rr**
**r(ub_rr)** upper bound of CI for **rr**
**r(or)** odds ratio
**r(lb_or)** lower bound of CI for **or**
**r(ub_or)** upper bound of CI for **or**
**r(afe)** attributable (prev.) fraction among exposed
**r(lb_afe)** lower bound of CI for **afe**
**r(ub_afe)** upper bound of CI for **afe**
**r(afp)** attributable fraction for the population
**r(crude)** crude estimate (**cs** only)
**r(lb_crude)** lower bound of CI for **crude**
**r(ub_crude)** upper bound of CI for **crude**
**r(pooled)** pooled estimate (**cs** only)
**r(lb_pooled)** lower bound of CI for **pooled**
**r(ub_pooled)** upper bound of CI for **pooled**
**r(chi2_mh)** Mantel-Haenszel heterogeneity chi-squared (**cs** only)
**r(chi2_p)** pooled heterogeneity chi-squared
**r(df)** degrees of freedom (**cs** only)
**r(chi2)** chi-squared
**r(p_exact)** 2-sided Fisher's exact p (**exact** only)
**r(p1_exact)** 1-sided Fisher's exact p (**exact** only)
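
The stored results can be inspected or reused after either command. A
minimal sketch, using the counts from the **csi** example:

```stata
* Run quietly, then list everything left behind in r()
quietly csi 7 12 9 2, or exact
return list

* Use individual results in later calculations or displays
display "RR = " r(rr) "  (" r(lb_rr) ", " r(ub_rr) ")"
display "Fisher's exact p (2-sided) = " r(p_exact)
```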

__References__

Cornfield, J. 1956. A statistical problem arising from retrospective
studies. In Vol. 4 of *Proceedings of the Third Berkeley Symposium*,
ed. J. Neyman, 135-148. Berkeley, CA: University of California
Press.

Woolf, B. 1955. On estimating the relation between blood group and disease.
*Annals of Human Genetics* 19: 251-253. Reprinted in *Evolution of*
*Epidemiologic Ideas: Annotated Readings on Concepts and Methods*, ed.
S. Greenland, pp. 108-110. Newton Lower Falls, MA: Epidemiology
Resources.