**[R] expoisson** -- Exact Poisson regression

__Syntax__

**expoisson** *depvar* *indepvars* [*if*] [*in*] [*weight*] [**,** *options*]

*options* Description
-------------------------------------------------------------------------
Model
__cond__**vars(***varlist***)** condition on variables in *varlist*
__gr__**oup(***varname***)** groups/strata are defined by unique values of
*varname*
__exp__**osure(***varname_e***)** include ln(*varname_e*) in model with coefficient
constrained to 1
__off__**set(***varname_o***)** include *varname_o* in model with coefficient
constrained to 1

Options
__mem__**ory(***#*[**b**|**k**|**m**|**g**]**)** set limit on memory usage; default is **memory(25m)**
__sav__**ing(***filename***)** save the joint conditional distribution to *filename*

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
**irr** report incidence-rate ratios
__t__**est(***testopt***)** report p-value for observed sufficient statistic,
conditional scores test, or conditional
probabilities test
**mue(***varlist***)** compute the median unbiased estimates for *varlist*
**midp** use the mid-p-value rule
__nolo__**g** do not display the enumeration log
-------------------------------------------------------------------------
**by**, **statsby**, and **xi** are allowed; see prefix.
**fweight**s are allowed; see weight.
See **[R] expoisson postestimation** for features available after estimation.

__Menu__

**Statistics > Exact statistics > Exact Poisson regression**

__Description__

**expoisson** fits an exact Poisson regression model, which produces more
accurate inference in small samples than standard
maximum-likelihood-based Poisson regression. For stratified data,
**expoisson** conditions on the number of events in each stratum and is an
alternative to fixed-effects Poisson regression.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**condvars(***varlist***)** specifies variables whose parameter estimates are not
of interest to you. You can save substantial computer time and
memory by moving such variables from *indepvars* to **condvars()**.
You will get the same results for **x1** and **x3** whether
you type

**. expoisson y x1 x2 x3 x4**

or

**. expoisson y x1 x3, condvars(x2 x4)**

**group(***varname***)** specifies the variable defining the strata, if any. A
constant term is assumed for each stratum identified in *varname*, and
the sufficient statistics for *indepvars* are conditioned on the
observed number of events within each group (as well as on the other
variables in the model). The group variable must be integer valued.
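Why conditioning removes the stratum constants can be checked numerically. The Python sketch below assumes a single stratum whose counts are independent Poisson with means exp(a + b*x), where a is the stratum constant. The helpers `poisson_pmf` and `conditional_prob` are hypothetical illustrations, not part of Stata. Dividing the joint probability of the counts by the probability of their total leaves a multinomial probability that does not involve a.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing k events when the mean is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)


def conditional_prob(y, x, a, b):
    """P(y | sum(y)) for independent Poisson counts with means exp(a + b*x_i)."""
    mu = [math.exp(a + b * xi) for xi in x]
    joint = math.prod(poisson_pmf(yi, m) for yi, m in zip(y, mu))
    # The stratum total is itself Poisson with mean sum(mu)
    return joint / poisson_pmf(sum(y), sum(mu))


y, x = [2, 0, 3], [0, 1, 2]
p_low = conditional_prob(y, x, a=-1.0, b=0.4)
p_high = conditional_prob(y, x, a=2.5, b=0.4)
print(abs(p_low - p_high) < 1e-12)  # True: the stratum constant a cancels
```

Changing b does change the result, which is why the conditional distribution still carries information about the coefficients of *indepvars*.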

**exposure(***varname_e***)**, **offset(***varname_o***)**; see **[R] estimation options**.

+---------+
----+ Options +----------------------------------------------------------

**memory(***#*[**b**|**k**|**m**|**g**]**)** sets a limit on the amount of memory **expoisson** can use
when computing the conditional distribution of the parameter
sufficient statistics. The default is **memory(25m)**, where **m** stands
for megabyte, or 1,048,576 bytes. The following are also available:
**b** stands for byte; **k** stands for kilobyte, which is equal to 1,024
bytes; and **g** stands for gigabyte, which is equal to 1,024 megabytes.
The minimum setting allowed is **1m** and the maximum is **2048m** or **2g**, but
do not attempt to use more memory than is available on your computer.
Also see the technical note below on computing the conditional distribution.
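The unit arithmetic above can be sketched as follows. The helper `memory_bytes` is hypothetical and only mirrors the documented units and limits. It requires an explicit unit letter, because how a bare number is interpreted is not stated here.

```python
import re

# Bytes per unit letter, as documented for memory()
UNITS = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def memory_bytes(spec):
    """Convert a memory() argument such as '25m' or '2g' to a byte count.

    This sketch requires an explicit unit letter and rejects values outside
    the documented range of 1m to 2048m.
    """
    match = re.fullmatch(r"(\d+)([bkmg])", spec.strip())
    if match is None:
        raise ValueError(f"expected a number followed by b, k, m, or g: {spec!r}")
    number, unit = match.groups()
    nbytes = int(number) * UNITS[unit]
    if not UNITS["m"] <= nbytes <= 2048 * UNITS["m"]:
        raise ValueError(f"{spec!r} is outside the allowed range 1m to 2048m")
    return nbytes


print(memory_bytes("25m"))  # 26214400, the default limit
```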

**saving(***filename*[**,** **replace**]**)** saves the joint conditional distribution for
each independent variable specified in *indepvars*. There is one file
for each variable, and it is named using the prefix *filename* with the
variable name appended. For example, **saving(mydata)** with an
independent variable named **X** would generate a data file named
**mydata_X.dta**. Use **replace** to replace an existing file. Each file
contains the conditional distribution for one of the independent
variables specified in *indepvars* conditioned on all other *indepvars*
and those variables specified in **condvars()**. There are two variables
in each data file: the feasible sufficient statistics for the
variable's parameter and their associated weights. The weights
variable is named **_w_**.
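To make the contents of these files concrete, the sketch below enumerates a conditional distribution for the simplest case: a single integer-valued covariate x plus a constant, so the only conditioning statistic is the total count. It is a plain enumeration written for illustration, and the function `conditional_distribution` is hypothetical. Its weights are proportional to the conditional probabilities when the coefficient on x is 0. The scaling of the `_w_` variable that **expoisson** writes may differ.

```python
from collections import defaultdict
from math import factorial


def conditional_distribution(y, x, exposure=None):
    """Feasible values of sum(x*y) and their weights, conditional on sum(y).

    Assumes integer-valued x and a model with only a constant and x, so the
    only conditioning statistic is the total count.  A weight is proportional
    to the conditional probability of its statistic when the coefficient on x
    is 0; expoisson's own _w_ scaling may differ.
    """
    n = sum(y)
    t = exposure if exposure is not None else [1.0] * len(y)
    # Map (partial sum of y, partial sum of x*y) to an accumulated weight
    states = {(0, 0): 1.0}
    for xi, ti in zip(x, t):
        nxt = defaultdict(float)
        for (sy, sxy), w in states.items():
            for k in range(n - sy + 1):  # this count can be 0..(n - sy)
                nxt[(sy + k, sxy + xi * k)] += w * ti**k / factorial(k)
        states = nxt
    # Keep only configurations that reproduce the observed total
    table = defaultdict(float)
    for (sy, sxy), w in states.items():
        if sy == n:
            table[sxy] += w
    return sorted(table.items())  # rows of (sufficient statistic, weight)


for stat, weight in conditional_distribution([1, 2], [0, 1]):
    print(stat, round(6 * weight, 10))  # rows: 0 1.0 / 1 3.0 / 2 3.0 / 3 1.0
```

Multiplying the weights by 3! = 6 gives 1, 3, 3, 1. With equal exposure, each of the three events is equally likely to fall in either observation, so the statistic is binomial.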

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**. The **level(***#***)** option will not work
on replay because confidence intervals are based on
estimator-specific enumerations. To change the confidence level, you
must refit the model.
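One standard way to build an exact conditional interval is to invert the two tail probabilities of the sufficient statistic S. Its conditional distribution is proportional to c(s)*exp(beta*s) for enumeration weights c(s). The Python sketch below assumes that construction, which is not confirmed to be **expoisson**'s exact rule, and uses hypothetical weights and helper names. Each bound is a separate root-finding problem that depends on the level.

```python
import math


def tail(dist, s_obs, beta, upper):
    """P(S >= s_obs) if upper else P(S <= s_obs); P(S = s) is proportional
    to c(s)*exp(beta*s)."""
    m = max(beta * s for s in dist)  # shift exponents to avoid overflow
    w = {s: c * math.exp(beta * s - m) for s, c in dist.items()}
    hit = sum(v for s, v in w.items() if (s >= s_obs if upper else s <= s_obs))
    return hit / sum(w.values())


def bisect(f, lo=-50.0, hi=50.0, iters=200):
    """Root of an increasing function f, assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_ci(dist, s_obs, level=95):
    """Exact conditional confidence interval for beta by tail inversion."""
    alpha = 1 - level / 100
    lower, upper = -math.inf, math.inf
    if s_obs > min(dist):  # P(S >= s_obs) rises with beta
        lower = bisect(lambda b: tail(dist, s_obs, b, True) - alpha / 2)
    if s_obs < max(dist):  # P(S <= s_obs) falls with beta
        upper = bisect(lambda b: alpha / 2 - tail(dist, s_obs, b, False))
    return lower, upper


dist = {0: 1, 1: 3, 2: 3, 3: 1}  # hypothetical weights c(s)
print(exact_ci(dist, 1, level=95))
print(exact_ci(dist, 1, level=90))  # a different level means new roots
```

When the observed statistic is at an end of its support, one bound is infinite.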

**irr** reports estimated coefficients transformed to incidence-rate ratios,
that is, exp(b) rather than b. Standard errors and confidence
intervals are similarly transformed. This option affects how results
are displayed, not how they are estimated or stored. **irr** may be
specified at estimation or when replaying previously estimated
results.
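The transformation can be sketched in a few lines. The helper `to_irr` and its inputs are hypothetical. Exponentiating the point estimate and both interval endpoints follows directly from the text. Transforming the standard error with the delta method, exp(b)*se(b), is an assumption about what "similarly transformed" means.

```python
import math


def to_irr(b, se, ci):
    """Express a coefficient, its standard error, and its CI as an IRR.

    The estimate and CI endpoints are exponentiated; the standard error uses
    the delta method, se(exp(b)) = exp(b) * se(b), which is an assumption here.
    """
    lo, hi = ci
    return math.exp(b), math.exp(b) * se, (math.exp(lo), math.exp(hi))


irr, irr_se, irr_ci = to_irr(math.log(2), 0.1, (math.log(1.5), math.log(3)))
print(round(irr, 6), round(irr_se, 6))  # 2.0 0.2
```

Because exp() is increasing, the exponentiated endpoints keep their order.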

**test(**__suff__**icient**|**score**|__pr__**obability)** reports the p-value associated with
the observed sufficient statistic, the conditional scores test, or
the conditional probabilities test. The default is **test(sufficient)**.
All the statistics are computed at estimation time, and each
statistic may be displayed postestimation; see **[R] expoisson**
**postestimation**.
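Under the null hypothesis that a coefficient is 0, all three tests use the same enumerated conditional distribution of its sufficient statistic S. The Python sketch below follows common textbook definitions and is an assumption rather than **expoisson**'s documented algorithm:

- the sufficient-statistic p-value doubles the smaller tail;
- the scores test ranks outcomes by (S - E[S])^2/Var(S);
- the probabilities test sums the probabilities of outcomes no more likely than the one observed.

The weights and the function name are hypothetical.

```python
def conditional_tests(dist, s_obs, eps=1e-9):
    """p-values for H0: beta = 0, given weights c(s) for the sufficient statistic S."""
    total = sum(dist.values())
    pmf = {s: c / total for s, c in dist.items()}

    # Sufficient-statistic test: double the smaller tail (one common convention)
    upper = sum(p for s, p in pmf.items() if s >= s_obs)
    lower = sum(p for s, p in pmf.items() if s <= s_obs)
    p_sufficient = min(1.0, 2 * min(upper, lower))

    # Conditional scores test: values of (S - E[S])^2 / Var(S) at least as large
    mean = sum(s * p for s, p in pmf.items())
    var = sum((s - mean) ** 2 * p for s, p in pmf.items())
    obs_score = (s_obs - mean) ** 2 / var
    p_score = sum(p for s, p in pmf.items()
                  if (s - mean) ** 2 / var >= obs_score - eps)

    # Conditional probabilities test: outcomes no more probable than the observed
    p_probability = sum(p for p in pmf.values() if p <= pmf[s_obs] + eps)

    return {"sufficient": p_sufficient, "score": p_score,
            "probability": p_probability}


dist = {0: 1, 1: 6, 2: 12, 3: 8}  # hypothetical weights c(s)
print(conditional_tests(dist, 1))  # about 0.519, 0.556, and 0.259
```

The small tolerance `eps` keeps outcomes whose statistic ties the observed one from being dropped because of rounding.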

**mue(***varlist***)** specifies that median unbiased estimates (MUEs) be reported
for the variables in *varlist*. By default, the conditional maximum
likelihood estimates (CMLEs) are reported, except for parameters whose
CMLEs are infinite; MUEs are reported for those. Specify **mue(_all)** if
you want MUEs for all the *indepvars*.
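A median unbiased estimate can be sketched by solving for the coefficient at which the observed statistic sits at the median of its conditional distribution. The Python sketch below assumes the common definition: average the roots of P(S >= s_obs) = 1/2 and P(S <= s_obs) = 1/2. When s_obs is at an end of the support, only one root is finite and it is used alone. That is exactly the case in which the CMLE is infinite. The helpers and weights are hypothetical.

```python
import math


def tail(dist, s_obs, beta, upper):
    """P(S >= s_obs) if upper else P(S <= s_obs); P(S = s) is proportional
    to c(s)*exp(beta*s)."""
    m = max(beta * s for s in dist)  # shift exponents to avoid overflow
    w = {s: c * math.exp(beta * s - m) for s, c in dist.items()}
    hit = sum(v for s, v in w.items() if (s >= s_obs if upper else s <= s_obs))
    return hit / sum(w.values())


def bisect(f, lo=-50.0, hi=50.0, iters=200):
    """Root of an increasing function f, assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mue(dist, s_obs):
    """Median unbiased estimate of beta (dist needs at least two support points)."""
    roots = []
    if s_obs > min(dist):  # solve P(S >= s_obs; beta) = 1/2
        roots.append(bisect(lambda b: tail(dist, s_obs, b, True) - 0.5))
    if s_obs < max(dist):  # solve P(S <= s_obs; beta) = 1/2
        roots.append(bisect(lambda b: 0.5 - tail(dist, s_obs, b, False)))
    # Average both roots; at an end of the support only one root is finite
    return sum(roots) / len(roots)


dist = {0: 1, 1: 3, 2: 3, 3: 1}  # hypothetical weights c(s)
print(round(mue(dist, 3), 3))  # 1.347, although the CMLE here is infinite
```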

**midp** instructs **expoisson** to use the mid-p-value rule when computing the
MUEs, p-values, and confidence intervals. This adjustment accounts for the
discreteness of the distribution by halving the discrete probability of the
observed statistic before adding it to the p-value. The mid-p-value rule
cannot be applied to MUEs whose corresponding parameter CMLE is infinite.
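The halving step can be sketched as follows. The sketch assumes hypothetical weights for the conditional distribution of a sufficient statistic S and evaluates the tails at a coefficient of 0. The function `tail_pvalues` is illustrative only.

```python
def tail_pvalues(dist, s_obs, midp=False):
    """Upper and lower tail probabilities of S at s_obs under beta = 0."""
    total = sum(dist.values())
    pmf = {s: c / total for s, c in dist.items()}
    # The mid-p rule counts only half of the observed outcome's probability
    half = 0.5 * pmf[s_obs] if midp else 0.0
    upper = sum(p for s, p in pmf.items() if s >= s_obs) - half
    lower = sum(p for s, p in pmf.items() if s <= s_obs) - half
    return upper, lower


dist = {0: 1, 1: 3, 2: 3, 3: 1}  # hypothetical weights c(s)
print(tail_pvalues(dist, 3))             # (0.125, 1.0)
print(tail_pvalues(dist, 3, midp=True))  # (0.0625, 0.9375)
```

Under one common two-sided convention, which doubles the smaller tail, the p-value would fall from 0.25 to 0.125.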

**nolog** prevents the display of the enumeration log. By default, the
enumeration log is displayed, showing the progress of computing the
conditional distribution of the sufficient statistics.

__Technical note__

The **memory(***#***)** option limits the amount of memory that **expoisson** will
consume when computing the conditional distribution of the parameter
sufficient statistics. **memory()** is independent of the data maximum
memory setting (see **set max_memory** in **[D] memory**), and it is possible for
**expoisson** to exceed the memory limit specified in **set max_memory** without
terminating. By default, a log is provided that displays the number of
enumerations (the size of the conditional distribution) after processing
each observation. Typically, you will see the number of enumerations
increase, and then at some point they will decrease as the multivariate
shift algorithm (Hirji, Mehta, and Patel 1987) determines that some of
the enumerations cannot achieve the observed sufficient statistics of the
conditioning variables. When the algorithm is complete, however, it is
necessary to store the conditional distribution of the parameter
sufficient statistics as a dataset. It is possible, therefore, to get a
memory error when the algorithm has completed if there is not enough
memory to store the conditional distribution.
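The growth-then-shrinkage pattern in the enumeration log can be imitated with a toy enumeration. The sketch below is not the multivariate shift algorithm. It is a hypothetical breadth-first enumeration that conditions only on the total count n and, at the last observation, keeps only the values that reach n exactly. The function name and inputs are illustrative.

```python
def enumeration_log(x, n):
    """Count distinct (partial sum of y, partial sum of x*y) states after each
    observation, conditioning only on the observed total n = sum(y)."""
    states = {(0, 0)}
    sizes = []
    last = len(x) - 1
    for i, xi in enumerate(x):
        new_states = set()
        for sy, sxy in states:
            # Earlier counts may take any value that keeps the total <= n;
            # the last count must bring the total to exactly n
            choices = [n - sy] if i == last else range(n - sy + 1)
            for k in choices:
                new_states.add((sy + k, sxy + xi * k))
        states = new_states
        sizes.append(len(states))
    return sizes


print(enumeration_log([0, 1, 1, 2], 4))  # [5, 15, 15, 9]
```

Here the count rises from 5 to 15 states and then falls to 9 once the last observation forces the total to equal 4.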

__Examples__

Setup
**. webuse smokes**

Perform exact Poisson regression of **cases** on **smokes** using exposure **peryrs**
**. expoisson cases smokes, exposure(peryrs) irr**

Replay results and report conditional scores test
**. expoisson, test(score) irr**

__Stored results__

**expoisson** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(k_groups)** number of groups
**e(relative_weight)** relative weight for the observed **e(sufficient)**
and **e(condvars)**
**e(sum_y)** sum of *depvar*
**e(k_indvars)** number of independent variables
**e(k_condvars)** number of conditioning variables
**e(midp)** mid-p-value rule indicator
**e(eps)** relative difference tolerance

Macros
**e(cmd)** **expoisson**
**e(cmdline)** command as typed
**e(title)** title in estimation output
**e(depvar)** name of dependent variable
**e(indvars)** independent variables
**e(condvars)** conditioning variables
**e(groupvar)** group variable
**e(exposure)** exposure variable
**e(offset)** linear offset variable
**e(level)** confidence level
**e(wtype)** weight type
**e(wexp)** weight expression
**e(datasignature)** the checksum
**e(datasignaturevars)** variables used in calculation of checksum
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(b)** coefficient vector
**e(mue_indicators)** indicator for elements of **e(b)** estimated using
MUE instead of CMLE
**e(se)** **e(b)** standard errors (CMLEs only)
**e(ci)** matrix of **e(level)** confidence intervals for **e(b)**
**e(sum_y_groups)** sum of **e(depvar)** for each group
**e(N_g)** number of observations in each group
**e(sufficient)** sufficient statistics for **e(b)**
**e(p_sufficient)** p-value for **e(sufficient)**
**e(scoretest)** conditional scores tests for *indepvars*
**e(p_scoretest)** p-value for **e(scoretest)**
**e(probtest)** conditional probability tests for *indepvars*
**e(p_probtest)** p-value for **e(probtest)**

Function
**e(sample)** marks estimation sample

__Reference__

Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing
distributions for exact logistic regression. *Journal of the American*
*Statistical Association* 82: 1110-1117.