**[R] exlogistic** -- Exact logistic regression

__Syntax__

**exlogistic** *depvar* *indepvars* [*if*] [*in*] [*weight*] [**,** *options*]

*depvar* can be specified in two ways: as a zero/nonzero variable, where
zero indicates failure and nonzero indicates success, or as the number of
successes (positive outcomes) among the binomial trials recorded in each
observation. To specify *depvar* as the number of successes, you must
also specify **binomial(***varname*|*#***)**.

*options* Description
-------------------------------------------------------------------------
Model
__cond__**vars(***varlist***)** condition on variables in *varlist*
__gr__**oup(***varname***)** groups/strata are defined by the unique values of *varname*
__bin__**omial(***varname*|*#***)** data are in binomial form and the number of trials is contained in *varname* or in *#*
__estc__**onstant** estimate constant term; do not condition on the number of successes
__nocons__**tant** suppress constant term

Terms
__term__**s(***termsdef***)** terms definition

Options
__mem__**ory(***#*[**b**|**k**|**m**|**g**]**)** set limit on memory usage; default is **memory(10m)**
__sav__**ing(***filename***)** save the joint conditional distribution to *filename*

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
**coef** report estimated coefficients
__t__**est(***testopt***)** report p-value for observed sufficient statistic, conditional scores test, or conditional probabilities test
**mue(***varlist***)** compute the median unbiased estimates for *varlist*
**midp** use the mid-p-value rule
__nolo__**g** do not display the enumeration log
-------------------------------------------------------------------------
**by**, **statsby**, and **xi** are allowed; see prefix.
**fweight**s are allowed; see weight.
See **[R] exlogistic postestimation** for features available after
estimation.

__Menu__

**Statistics > Exact statistics > Exact logistic regression**

__Description__

**exlogistic** fits an exact logistic regression model. Because it does not
rely on large-sample (asymptotic) approximations, it produces more accurate
inference in small samples than the standard maximum-likelihood-based
logistic regression estimator. It also deals better with completely
determined outcomes, such as when every observation sharing a given
covariate value has the same outcome. With the **group(***varname***)**
option, **exlogistic** conditions on the number of positive outcomes within
each stratum and is an alternative to the conditional (fixed-effects)
logistic regression estimator.

Unlike Stata's other estimation commands, **exlogistic** performs its
hypothesis tests during estimation; they cannot be carried out afterward
with the standard postestimation commands.

__Options__

**Model**

**condvars(***varlist***)** specifies variables whose parameter estimates are not
of interest to you. You can save substantial computer time and
memory by moving such variables from *indepvars* to **condvars()**.
You will get the same results for **x1** and **x3** whether
you type

**. exlogistic y x1 x2 x3 x4**

or

**. exlogistic y x1 x3, condvars(x2 x4)**
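The reason the results do not change can be checked numerically. The Python sketch below is an illustration of the statistics, not Stata's implementation. It uses a small hypothetical dataset with covariates x1 and x2, enumerates every possible outcome vector, and computes the distribution of the sufficient statistic for x1 conditional on the observed sufficient statistics for the constant and for x2 (the variable playing the role of one moved to **condvars()**). The coefficient names b0, b1, and b2 are hypothetical.

```python
import itertools
import math

# Hypothetical data: x2 plays the role of a variable moved to condvars()
x1 = [0, 1, 2, 1, 0, 2]
x2 = [1, 0, 1, 1, 0, 0]
y_obs = [0, 1, 1, 0, 0, 1]

def suff_stats(y):
    """Sufficient statistics (constant, x1, x2) for a 0/1 outcome vector."""
    return (sum(y),
            sum(yi * a for yi, a in zip(y, x1)),
            sum(yi * b for yi, b in zip(y, x2)))

def conditional_dist_t1(b0, b1, b2):
    """Distribution of T1 given the observed sufficient statistics of the constant and x2."""
    s_obs, _, t2_obs = suff_stats(y_obs)
    weights = {}
    for y in itertools.product((0, 1), repeat=len(y_obs)):
        s, t1, t2 = suff_stats(y)
        if (s, t2) != (s_obs, t2_obs):
            continue
        # Logit likelihood of y up to the factor prod(1 + exp(eta_i)),
        # which is the same for every outcome vector and cancels
        w = math.exp(b0 * s + b1 * t1 + b2 * t2)
        weights[t1] = weights.get(t1, 0.0) + w
    total = sum(weights.values())
    return {t: w / total for t, w in sorted(weights.items())}
```

Changing b0 or b2 leaves the conditional distribution unchanged, while changing b1 does not; conditioning on the sufficient statistics of the nuisance terms removes their parameters from the problem.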

**group(***varname***)** specifies the variable defining the strata, if any. A
constant term is assumed for each stratum identified in *varname*, and
the sufficient statistics for *indepvars* are conditioned on the
observed number of successes within each group. This makes the
estimated model equivalent to the one fit by **clogit**, Stata's conditional
logistic regression command (see **[R] clogit**). **group()** may not be
specified with **noconstant** or **estconstant**.
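The equivalence with **clogit** can be checked on a toy example. The Python sketch below uses hypothetical data and is not Stata's code. For each stratum, it computes the probability of the observed outcomes given the stratum's number of successes from an ordinary logit model with an arbitrary stratum intercept, and compares the product over strata with the conditional-likelihood form used by conditional logistic regression. The stratum intercepts cancel, so the two agree.

```python
import itertools
import math

def clogit_term(x, y, beta):
    """Conditional likelihood of one stratum given its number of successes (clogit form)."""
    s = sum(y)
    t_obs = sum(xi * yi for xi, yi in zip(x, y))
    # Sum over every way of placing s successes among the stratum's observations
    denom = sum(math.exp(beta * sum(x[i] for i in idx))
                for idx in itertools.combinations(range(len(x)), s))
    return math.exp(beta * t_obs) / denom

def conditional_prob_full_model(x, y, alpha, beta):
    """P(y | sum(y)) from an ordinary logit model with stratum intercept alpha."""
    def prob(v):
        p = 1.0
        for xi, vi in zip(x, v):
            pi = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            p *= pi if vi else 1.0 - pi
        return p
    s = sum(y)
    total = sum(prob(v) for v in itertools.product((0, 1), repeat=len(x))
                if sum(v) == s)
    return prob(y) / total

# Hypothetical strata: (covariate values, outcomes)
strata = [([0.0, 1.0, 2.0], [0, 0, 1]), ([1.0, 1.0, 0.0, 2.0], [1, 0, 0, 1])]
beta = 0.7
clogit_lik = math.prod(clogit_term(x, y, beta) for x, y in strata)
# The stratum intercepts (here -2.0 and 1.5) drop out after conditioning
exact_lik = math.prod(conditional_prob_full_model(x, y, a, beta)
                      for (x, y), a in zip(strata, (-2.0, 1.5)))
```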

**binomial(***varname*|*#***)** indicates that the data are in binomial form and
*depvar* contains the number of successes. *varname* contains the number
of trials for each observation. If all observations have the same
number of trials, you can instead specify the number as an integer.
The number of trials must be a positive integer at least as great as
the number of successes. If **binomial()** is not specified, the data
are assumed to be Bernoulli: each observation records a single trial,
with *depvar* equal to zero for a failure and nonzero for a success.
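In binomial form, each row contributes *y* successes out of *n* trials, so the sufficient statistics are the same as for the equivalent Bernoulli data obtained by expanding each row into *n* zero/one records. The Python sketch below shows this for hypothetical data with one covariate and also applies the rule that the number of trials must be a positive integer at least as great as the number of successes. The helper names are hypothetical.

```python
def check_binomial_rows(rows):
    """Each row is (x, successes, trials); trials must be a positive integer >= successes."""
    for x, y, n in rows:
        if not (isinstance(n, int) and n >= 1 and 0 <= y <= n):
            raise ValueError(f"invalid binomial row: {(x, y, n)}")

def suff_stats_binomial(rows):
    """(sum of successes, sum of x * successes) computed directly from binomial rows."""
    check_binomial_rows(rows)
    return sum(y for _, y, _ in rows), sum(x * y for x, y, _ in rows)

def expand_to_bernoulli(rows):
    """Expand each binomial row into one (x, 0/1) record per trial."""
    check_binomial_rows(rows)
    records = []
    for x, y, n in rows:
        records += [(x, 1)] * y + [(x, 0)] * (n - y)
    return records

def suff_stats_bernoulli(records):
    return sum(y for _, y in records), sum(x * y for x, y in records)

binomial_rows = [(0, 1, 4), (1, 2, 3), (2, 3, 3)]   # hypothetical data
bernoulli_records = expand_to_bernoulli(binomial_rows)
```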

**estconstant** estimates the constant term. By default, the models are
assumed to have an intercept (constant), but the value of the
intercept is not calculated. That is, the conditional distribution
of the sufficient statistics for the *indepvars* is computed given the
number of successes in *depvar*, thus conditioning out the constant
term of the model. Use **estconstant** if you want the estimate of the
intercept reported. **estconstant** may not be specified with **group()**.

**noconstant**; see **[R] estimation options**. **noconstant** may not be specified
with **group()**.

**Terms**

**terms(***termname* **=** *variable* ... *variable*[**,** *termname* **=** *variable* ... *variable*
...]**)** defines additional terms of the model on which you want
**exlogistic** to perform joint-significance hypothesis tests. By
default, **exlogistic** reports tests individually on each variable in
*indepvars*. For instance, if variables **x1** and **x3** are in *indepvars*
and you want to test their joint significance, specify **terms(t1=x1 x3)**.
To also test the joint significance of **x2** and **x4**, specify
**terms(t1=x1 x3, t2=x2 x4)**. Each variable can be assigned to only one
term.

Joint tests are computed only for the conditional scores tests and
the conditional probabilities tests. See **test()** below.
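For a term such as **t1=x1 x3**, the joint test works with the joint conditional distribution of the term's sufficient statistics. The Python sketch below follows the usual definition of the conditional probabilities test (the exact convention Stata uses is assumed, and this is not Stata's code): under the null hypothesis that the term's coefficients are zero, each feasible point has probability proportional to its frequency, and the p-value sums the probabilities of all points no more probable than the observed one. The frequency table is hypothetical.

```python
from fractions import Fraction

def conditional_probabilities_test(freq, t_obs):
    """Exact joint conditional probabilities test for one term.

    freq maps each feasible value of the term's sufficient statistics to its
    frequency in the conditional distribution under the null hypothesis.
    """
    total = sum(freq.values())
    f_obs = freq[t_obs]
    # Sum the probabilities of all points no more likely than the observed one
    return Fraction(sum(f for f in freq.values() if f <= f_obs), total)

# Hypothetical joint null distribution of (T_x1, T_x3)
freq = {(0, 0): 1, (0, 1): 3, (1, 0): 2, (1, 1): 6, (2, 1): 3, (2, 2): 1}
p_value = conditional_probabilities_test(freq, (2, 1))
```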

**Options**

**memory(***#*[**b**|**k**|**m**|**g**]**)** sets a limit on the amount of memory **exlogistic** can
use when computing the conditional distribution of the parameter
sufficient statistics. The default is **memory(10m)**, where **m** stands
for megabyte, or 1,048,576 bytes. The following are also available:
**b** stands for byte; **k** stands for kilobyte, which is equal to 1,024
bytes; and **g** stands for gigabyte, which is equal to 1,024 megabytes.
The minimum setting allowed is **1m** and the maximum is **2048m** or **2g**, but
do not attempt to use more memory than is available on your computer.
Also see the technical note below on memory use when enumerating the conditional distribution.
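The unit arithmetic above can be made concrete with a small Python sketch. The helper `parse_memory` is hypothetical (it is not part of Stata); it requires a unit suffix, accepts only whole numbers, and enforces the stated range of **1m** to **2048m**.

```python
# Multipliers for the memory() units described above
UNITS = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
MIN_BYTES = 1 * UNITS["m"]      # 1m
MAX_BYTES = 2048 * UNITS["m"]   # 2048m, the same as 2g

def parse_memory(spec: str) -> int:
    """Convert a memory() argument such as '10m' or '2g' to bytes.

    Hypothetical helper: a unit suffix is required here, and only whole
    numbers are accepted.
    """
    spec = spec.strip().lower()
    if len(spec) < 2 or spec[-1] not in UNITS or not spec[:-1].isdigit():
        raise ValueError(f"cannot parse memory setting {spec!r}")
    nbytes = int(spec[:-1]) * UNITS[spec[-1]]
    if not MIN_BYTES <= nbytes <= MAX_BYTES:
        raise ValueError(f"{spec!r} is outside the allowed range 1m to 2048m")
    return nbytes
```

For example, the default **10m** corresponds to 10,485,760 bytes.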

**saving(***filename*[**,** **replace**]**)** saves the joint conditional distribution to
*filename*. This distribution is conditioned on those variables
specified in **condvars()**. Specify **replace** to overwrite *filename* if
it already exists. A Stata data file is created containing all the
feasible values of the parameter sufficient statistics. Its variables
have the same names as those in *indepvars*, plus a variable named **_f_**
containing the frequency of each feasible value (sometimes referred to
as the condition numbers).
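The layout of the saved dataset can be mimicked with a brute-force Python sketch. For hypothetical data, it enumerates every outcome vector with the observed number of successes (conditioning on the constant, as **exlogistic** does by default), tallies each feasible value of the sufficient statistics, and writes one row per feasible value with its frequency in **_f_**. CSV text stands in for a Stata dataset; this is an illustration, not how **saving()** is implemented.

```python
import csv
import io
import itertools
from collections import Counter

# Hypothetical Bernoulli data with two covariates of interest
x1 = [0, 1, 2, 1]
x2 = [1, 1, 0, 0]
y_obs = [1, 0, 1, 0]

def joint_conditional_distribution(x1, x2, y_obs):
    """Frequencies of (T_x1, T_x2) over outcome vectors with the observed number of successes."""
    s_obs = sum(y_obs)
    freq = Counter()
    for idx in itertools.combinations(range(len(y_obs)), s_obs):
        freq[(sum(x1[i] for i in idx), sum(x2[i] for i in idx))] += 1
    return freq

freq = joint_conditional_distribution(x1, x2, y_obs)

# One row per feasible value, with its frequency in _f_
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["x1", "x2", "_f_"])
for (t1, t2), f in sorted(freq.items()):
    writer.writerow([t1, t2, f])
table_text = buf.getvalue()
```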

**Reporting**

**level(***#***)**; see **[R] estimation options**. The **level(***#***)** option will not work
on replay because confidence intervals are based on
estimator-specific enumerations. To change the confidence level, you
must refit the model.

**coef** reports the estimated coefficients rather than odds ratios
(exponentiated coefficients). **coef** may be specified when the model
is fit or upon replay. **coef** affects only how results are displayed
and not how they are estimated.

**test(**__suff__**icient**|**score**|__p__**robability)** reports the p-value associated with
the observed sufficient statistics, the conditional scores tests, or
the conditional probabilities tests, respectively. The default is
**test(sufficient)**. If **terms()** is included in the specification, the
conditional scores test and the conditional probabilities test are
applied to each term, providing conditional inference for several
parameters simultaneously. All the statistics are computed at
estimation time regardless of which is specified. Each statistic may
thus also be displayed postestimation without having to refit the
model; see **[R] exlogistic postestimation**.
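The scores and probabilities tests can be illustrated for a single parameter. The Python sketch below takes a hypothetical table of null (coefficient equal to zero) conditional frequencies of a sufficient statistic and computes the conditional scores test, which orders feasible points by (t - mean)^2 / variance, and the conditional probabilities test, which sums the probabilities of points no more likely than the observed one. It also returns the two one-sided tail probabilities of the sufficient statistic; how Stata combines these into the **test(sufficient)** p-value is not shown here. This is a sketch of the definitions, not Stata's code.

```python
from fractions import Fraction

def exact_tests(freq, t_obs):
    """Exact tests of beta = 0 for one parameter from its null conditional frequencies."""
    total = sum(freq.values())
    prob = {t: Fraction(f, total) for t, f in freq.items()}
    mean = sum(t * p for t, p in prob.items())
    var = sum((t - mean) ** 2 * p for t, p in prob.items())
    if var == 0:
        raise ValueError("only one feasible value; the scores test is undefined")
    score = {t: (t - mean) ** 2 / var for t in prob}
    return {
        "upper_tail": sum(p for t, p in prob.items() if t >= t_obs),
        "lower_tail": sum(p for t, p in prob.items() if t <= t_obs),
        "score": sum(p for t, p in prob.items() if score[t] >= score[t_obs]),
        "probability": sum(p for t, p in prob.items() if prob[t] <= prob[t_obs]),
    }

freq = {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}   # hypothetical null frequencies
results = exact_tests(freq, 4)
```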

**mue(***varlist***)** specifies that median unbiased estimates (MUEs) be reported
for the variables in *varlist*. By default, the conditional maximum
likelihood estimates (CMLEs) are reported, except for parameters whose
CMLEs are infinite; MUEs are reported for those. Specify **mue(_all)**
if you want MUEs for all the *indepvars*.
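The difference between CMLEs and MUEs shows up when the observed sufficient statistic lies at the edge of its conditional distribution. In the Python sketch below (hypothetical frequencies c(t), not Stata's code), the conditional distribution at a coefficient value beta is proportional to c(t) * exp(beta * t). The CMLE solves E(T; beta) = t_obs and is infinite when t_obs is the smallest or largest feasible value. For the largest value, the sketch computes the MUE as the beta solving Pr(T >= t_obs; beta) = 0.5; that boundary convention, and the search interval [-50, 50], are assumptions made here.

```python
import math

# Hypothetical conditional frequencies c(t) of the sufficient statistic T
counts = {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}

def dist(beta):
    """Pr(T = t; beta), proportional to c(t) * exp(beta * t)."""
    t_max = max(counts)
    # Shifting the exponent by t_max leaves the probabilities unchanged
    # and avoids overflow for large positive beta
    w = {t: c * math.exp(beta * (t - t_max)) for t, c in counts.items()}
    total = sum(w.values())
    return {t: v / total for t, v in w.items()}

def mean_t(beta):
    return sum(t * p for t, p in dist(beta).items())

def solve_increasing(f, target, lo=-50.0, hi=50.0, iters=200):
    """Bisection for f(beta) = target when f is increasing in beta."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cmle(t_obs):
    """Conditional MLE: solves E(T; beta) = t_obs; infinite at the boundary."""
    if t_obs == max(counts):
        return math.inf
    if t_obs == min(counts):
        return -math.inf
    return solve_increasing(mean_t, t_obs)

def mue_at_max():
    """MUE when t_obs is the largest feasible value: Pr(T >= t_obs; beta) = 0.5."""
    t_max = max(counts)
    return solve_increasing(lambda b: dist(b)[t_max], 0.5)
```

For this table, `cmle(4)` is infinite while `mue_at_max()` is finite, about 1.66.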

**midp** instructs **exlogistic** to use the mid-p-value rule when computing the
MUEs, p-values, and confidence intervals. This adjustment accounts for
the discreteness of the distribution: it halves the probability of the
observed statistic before adding it to the p-value. The mid-p-value
rule cannot be applied to MUEs whose
corresponding parameter CMLE is infinite.
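A minimal Python sketch of the rule for a one-sided p-value on a hypothetical conditional distribution: the exact p-value is Pr(T >= t_obs), and the mid-p-value replaces it with Pr(T > t_obs) + 0.5 Pr(T = t_obs). Exact fractions avoid rounding; how **exlogistic** forms two-sided p-values and confidence intervals from such tails is not shown.

```python
from fractions import Fraction

def upper_tail_p(dist, t_obs, midp=False):
    """One-sided p-value Pr(T >= t_obs); with midp=True only half of Pr(T = t_obs) is counted."""
    above = sum((p for t, p in dist.items() if t > t_obs), Fraction(0))
    at = dist.get(t_obs, Fraction(0))
    return above + (at / 2 if midp else at)

# Hypothetical conditional distribution of T
dist = {t: Fraction(c, 16) for t, c in {0: 1, 1: 4, 2: 6, 3: 4, 4: 1}.items()}
exact_p = upper_tail_p(dist, 3)            # 1/16 + 4/16 = 5/16
mid_p = upper_tail_p(dist, 3, midp=True)   # 1/16 + 2/16 = 3/16
```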

**nolog** prevents the display of the enumeration log. By default, the
enumeration log is displayed, showing the progress of computing the
conditional distribution of the sufficient statistics.

__Technical note__

The **memory(***#***)** option limits the amount of memory that **exlogistic** will
consume when computing the conditional distribution of the parameter
sufficient statistics. **memory()** is independent of the data maximum
memory setting (see **set max_memory** in **[D] memory**), and it is possible for
**exlogistic** to exceed the memory limit specified in **set max_memory** without
terminating. By default, a log is provided that displays the number of
enumerations (the size of the conditional distribution) after processing
each observation. Typically, you will see the number of enumerations
increase, and then at some point they will decrease as the multivariate
shift algorithm (Hirji, Mehta, and Patel 1987) determines that some of
the enumerations cannot achieve the observed sufficient statistics of the
conditioning variables. When the algorithm is complete, however, it is
necessary to store the conditional distribution of the parameter
sufficient statistics as a dataset. It is possible, therefore, to get a
memory error when the algorithm has completed if there is not enough
memory to store the conditional distribution.
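The rise and fall of the enumeration count can be reproduced with a simplified Python sketch of the shift idea: process one observation at a time, shift each partial (successes, sufficient statistic) state by that observation's contribution, and drop states that can no longer reach the observed number of successes. The sketch handles a single covariate and conditions only on the constant; it is a simplification for illustration, not the multivariate algorithm of Hirji, Mehta, and Patel (1987) as implemented in **exlogistic**. The data are hypothetical, and a brute-force function is included as a cross-check.

```python
import itertools
from collections import Counter

def shift_enumerate(x, s_obs, log=None):
    """Frequencies of T = sum(x_i * y_i) over 0/1 vectors y with sum(y) == s_obs."""
    n = len(x)
    states = {(0, 0): 1}  # (successes so far, partial statistic) -> count
    for k, xk in enumerate(x, start=1):
        shifted = Counter()
        for (s, t), c in states.items():
            shifted[(s, t)] += c             # observation k is a failure
            shifted[(s + 1, t + xk)] += c    # observation k is a success
        remaining = n - k
        # Prune states that can no longer end with exactly s_obs successes
        states = {(s, t): c for (s, t), c in shifted.items()
                  if s <= s_obs and s + remaining >= s_obs}
        if log is not None:
            log.append(len(states))          # analogue of the enumeration log
    return {t: c for (_, t), c in sorted(states.items())}

def brute_force(x, s_obs):
    """Same distribution by listing every placement of s_obs successes."""
    freq = Counter(sum(x[i] for i in idx)
                   for idx in itertools.combinations(range(len(x)), s_obs))
    return dict(sorted(freq.items()))

x = [0, 1, 2, 1, 0, 2, 3]   # hypothetical covariate values
sizes = []
dist = shift_enumerate(x, s_obs=3, log=sizes)
```

For these data the state counts after each observation are 2, 4, 8, 10, 11, 10, and 7: they grow and then shrink as pruning removes states that cannot reach three successes.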

__Examples__

Setup
**. webuse hiv1**

Perform exact logistic regression of **hiv** on **cd4** and **cd8**
**. exlogistic hiv cd4 cd8**

Replay results, but report estimated coefficients rather than odds ratios
**. exlogistic, coef**

Replay results and report conditional scores test
**. exlogistic, test(score)**

__Stored results__

**exlogistic** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(k_groups)** number of groups
**e(n_possible)** number of distinct possible outcomes where **sum(sufficient)** equals observed **e(sufficient)**
**e(n_trials)** binomial number-of-trials parameter
**e(sum_y)** sum of *depvar*
**e(k_indvars)** number of independent variables
**e(k_terms)** number of model terms
**e(k_condvars)** number of conditioning variables
**e(condcons)** conditioned on the constant(s) indicator
**e(midp)** mid-p-value rule indicator
**e(eps)** relative difference tolerance

Macros
**e(cmd)** **exlogistic**
**e(cmdline)** command as typed
**e(title)** title in estimation output
**e(depvar)** name of dependent variable
**e(indvars)** independent variables
**e(condvars)** conditioning variables
**e(groupvar)** group variable
**e(binomial)** binomial number-of-trials variable
**e(terms)** term names set in option **terms()**
**e(level)** confidence level
**e(wtype)** weight type
**e(wexp)** weight expression
**e(datasignature)** the checksum
**e(datasignaturevars)** variables used in calculation of checksum
**e(properties)** **b**
**e(estat_cmd)** program used to implement **estat**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(b)** coefficient vector
**e(mue_indicators)** indicator for elements of **e(b)** estimated using MUE instead of CMLE
**e(se)** **e(b)** standard errors (CMLEs only)
**e(ci)** matrix of **e(level)** confidence intervals for **e(b)**
**e(sum_y_groups)** sum of **e(depvar)** for each group
**e(N_g)** number of observations in each group
**e(sufficient)** sufficient statistics for **e(b)**
**e(p_sufficient)** p-value for **e(sufficient)**
**e(scoretest)** conditional scores tests for *indepvars*
**e(p_scoretest)** p-value for **e(scoretest)**
**e(probtest)** conditional probabilities tests for *indepvars*
**e(p_probtest)** p-value for **e(probtest)**
**e(scoretest_m)** conditional scores tests for model terms
**e(p_scoretest_m)** p-value for **e(scoretest_m)**
**e(probtest_m)** conditional probabilities tests for model terms
**e(p_probtest_m)** p-value for **e(probtest_m)**

Function
**e(sample)** marks estimation sample

__Reference__

Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing
distributions for exact logistic regression. *Journal of the American*
*Statistical Association* 82: 1110-1117.