help exlogistic dialog: exlogistic
also see: exlogistic postestimation
-------------------------------------------------------------------------------
Title
[R] exlogistic -- Exact logistic regression
Syntax
exlogistic depvar indepvars [if] [in] [weight] [, options]
  options              description
-------------------------------------------------------------------------
Model
  condvars(varlist)    condition on variables in varlist
  group(varname)       groups/strata are stratified by unique values of
                         varname
  binomial(varname|#)  data are in binomial form and the number of trials
                         is contained in varname or in #
  estconstant          estimate constant term; do not condition on the
                         number of successes
  noconstant           suppress constant term
Terms
  terms(termsdef)      terms definition
Options
  memory(#[b|k|m|g])   set limit on memory usage; default is memory(10m)
  saving(filename)     save the joint conditional distribution to filename
Reporting
  level(#)             set confidence level; default is level(95)
  coef                 report estimated coefficients
  test(testopt)        report significance of observed sufficient
                         statistic, conditional scores test, or
                         conditional probabilities test
  mue(varlist)         compute the median unbiased estimates for varlist
  midp                 use the mid-p-value rule
  nolog                do not display the enumeration log
-------------------------------------------------------------------------
by, statsby, and xi are allowed; see prefix.
fweights are allowed; see weight.
See [R] exlogistic postestimation for features available after
estimation.
Menu
Statistics > Exact statistics > Exact logistic regression
Description
exlogistic fits an exact logistic regression model of depvar on
indepvars.
exlogistic is an alternative to logistic, the standard
maximum-likelihood-based logistic regression estimator; see [R] logistic.
exlogistic produces more-accurate inference in small samples because it
does not depend on asymptotic results. It can also better deal with
one-way causation, such as the case where all females are observed to
have a positive outcome.
exlogistic with the group(varname) option is an alternative to clogit,
the conditional logistic regression estimator; see [R] clogit. Like
clogit, exlogistic conditions on the number of positive outcomes within
each stratum.
depvar can be specified in two ways. It can be zero/nonzero, with zero
indicating failure and nonzero representing positive outcomes
(successes), or if you specify the binomial(varname|#) option, depvar may
contain the number of positive outcomes among the trials recorded for
each observation.
exlogistic is computationally intensive. Unlike most estimators, rather
than calculating coefficients for all independent variables at once,
results for each independent variable are calculated separately with the
other independent variables temporarily conditioned out. You can save
considerable computer time by skipping the parameter calculations for
variables that are not of direct interest. Specify such variables in the
condvars() option rather than among the indepvars; see condvars() below.
Unlike Stata's other estimation commands, you may not use test, lincom,
or other postestimation commands after exlogistic. Given the method used
to calculate estimates, hypothesis tests must be performed during
estimation by using exlogistic's terms() option; see terms() below.
Options
+-------+
----+ Model +------------------------------------------------------------
condvars(varlist) specifies variables whose parameter estimates are not
of interest to you. You can save substantial computer time and
memory by moving such variables from indepvars to condvars().
Understand that you will get the same results for x1 and x3 whether
you type
. exlogistic y x1 x2 x3 x4
or
. exlogistic y x1 x3, condvars(x2 x4)
group(varname) specifies the variable defining the strata, if any. A
constant term is assumed for each stratum identified in varname, and
the sufficient statistics for indepvars are conditioned on the
observed number of successes within each group. This makes the model
estimated equivalent to that estimated by clogit, Stata's conditional
logistic regression command (see [R] clogit). group() may not be
specified with noconstant or estconstant.
binomial(varname|#) indicates that the data are in binomial form and
depvar contains the number of successes. varname contains the number
of trials for each observation. If all observations have the same
number of trials, you can instead specify the number as an integer.
The number of trials must be a positive integer at least as great as
the number of successes. If binomial() is not specified, the data
are assumed to be Bernoulli, meaning that depvar equaling zero or
nonzero records one failure or success.
estconstant estimates the constant term. By default, the models are
assumed to have an intercept (constant), but the value of the
intercept is not calculated. That is, the conditional distribution
of the sufficient statistics for the indepvars is computed given the
number of successes in depvar, thus conditioning out the constant
term of the model. Use estconstant if you want the estimate of the
intercept reported. estconstant may not be specified with group().
noconstant; see [R] estimation options. noconstant may not be specified
with group().
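As a minimal sketch of the binomial() form described above, the following
enters a small made-up dataset in which each observation records a number
of successes out of a number of trials. The variable names successes,
trials, and dose and all data values are hypothetical and serve only to
illustrate the syntax.

```stata
* Hypothetical grouped data: one observation per dose level
clear
input successes trials dose
0 5 0
1 5 1
3 5 2
5 5 3
end

* Number of trials taken from a variable
exlogistic successes dose, binomial(trials) nolog

* Every observation has 5 trials, so an integer may be given instead
exlogistic successes dose, binomial(5) nolog
```

Because both calls describe the same data, they should report the same
results.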
+-------+
----+ Terms +------------------------------------------------------------
terms(termname = variable ... variable[, termname = variable ... variable
...]) defines additional terms of the model on which you want
exlogistic to perform joint-significance hypothesis tests. By
default, exlogistic reports tests individually on each variable in
indepvars. For instance, if variables x1 and x3 are in indepvars,
and you want to jointly test their significance, specify terms(t1=x1
x3). To also test the joint significance of x2 and x4, specify
terms(t1=x1 x3, t2=x2 x4). Each variable can be assigned to only one
term.
Joint tests are computed only for the conditional scores tests and
the conditional probabilities tests. See test() below.
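A short sketch of a joint test, assuming the hiv1 dataset used in the
Examples section is available through webuse. The term name t1 is
arbitrary.

```stata
* Setup, as in the Examples section
webuse hiv1, clear

* Treat cd4 and cd8 as one term, t1, and report conditional scores tests
exlogistic hiv cd4 cd8, terms(t1=cd4 cd8) test(score)

* Replay the same fit, now reporting the conditional probabilities tests
exlogistic, test(probability)
```

The replay does not refit the model, because all the test statistics are
computed at estimation time.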
+---------+
----+ Options +----------------------------------------------------------
memory(#[b|k|m|g]) sets a limit on the amount of memory exlogistic can
use when computing the conditional distribution of the parameter
sufficient statistics. The default is memory(10m), where m stands
for megabyte, or 1,048,576 bytes. The following are also available:
b stands for byte; k stands for kilobyte, which is equal to 1,024
bytes; and g stands for gigabyte, which is equal to 1,024 megabytes.
The minimum setting allowed is 1m and the maximum is 2048m or 2g, but
do not attempt to use more memory than is available on your computer.
Also see Counting the conditional distribution under Remarks.
saving(filename[, replace]) saves the joint conditional distribution to
filename. This distribution is conditioned on those variables
specified in condvars(). Use replace to replace an existing file
with filename. A Stata data file is created containing all the
feasible values of the parameter sufficient statistics. The variable
names are the same as those in indepvars, in addition to a variable
named _f_ containing the feasible value frequencies (sometimes
referred to as the condition numbers).
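The two options above might be combined as in the following sketch. The
filename hivdist is hypothetical, the file is written to the current
working directory, and the 50-megabyte limit is an arbitrary illustration
rather than a recommended setting.

```stata
webuse hiv1, clear

* Raise the memory limit to 50 megabytes and save the joint conditional
* distribution to hivdist.dta, overwriting any existing copy
exlogistic hiv cd4 cd8, memory(50m) saving(hivdist, replace)

* Load the saved distribution; this replaces the data in memory
use hivdist, clear

* One variable per indepvar plus _f_, the feasible value frequencies
describe
summarize _f_
```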
+-----------+
----+ Reporting +--------------------------------------------------------
level(#); see [R] estimation options. The level(#) option will not work
on replay because confidence intervals are based on
estimator-specific enumerations. To change the confidence level, you
must refit the model.
coef reports the estimated coefficients rather than odds ratios
(exponentiated coefficients). coef may be specified when the model
is fit or upon replay. coef affects only how results are displayed
and not how they are estimated.
test(sufficient|score|probability) reports the significance level of the
observed sufficient statistics, the conditional scores tests, or the
conditional probabilities tests, respectively. The default is
test(sufficient). If terms() is included in the specification, the
conditional scores test and the conditional probabilities test are
applied to each term, providing conditional inference for several
parameters simultaneously. All the statistics are computed at
estimation time regardless of which is specified. Each statistic may
thus also be displayed postestimation without having to refit the
model; see [R] exlogistic postestimation.
mue(varlist) specifies that median unbiased estimates (MUEs) be reported
for the variables in varlist. By default, the conditional maximum
likelihood estimates (CMLEs) are reported, except for those
parameters for which the CMLEs are infinite. Specify mue(_all) if
you want MUEs for all the indepvars.
midp instructs exlogistic to use the mid-p-value rule when computing the
MUEs, significance levels, and confidence intervals. This adjustment
is for the discreteness of the distribution and halves the value of
the discrete probability of the observed statistic before adding it
to the p-value. The mid-p-value rule cannot be applied to MUEs whose
corresponding parameter CMLE is infinite.
nolog prevents the display of the enumeration log. By default, the
enumeration log is displayed, showing the progress of computing the
conditional distribution of the sufficient statistics.
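The reporting options mue(), midp, and nolog can be tried as in the sketch
below, again assuming the hiv1 dataset from the Examples section. The
model is refit for the second call rather than replayed, on the
assumption that midp changes the computed p-values and intervals.

```stata
webuse hiv1, clear

* Report the MUE for cd4 and the default CMLE for cd8, without the log
exlogistic hiv cd4 cd8, mue(cd4) nolog

* Refit with MUEs for all indepvars and the mid-p-value rule
exlogistic hiv cd4 cd8, mue(_all) midp nolog

* Show which elements of e(b) were estimated by MUE instead of CMLE
matrix list e(mue_indicators)
```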
Remarks
Counting the conditional distribution
The option memory(#) places a limit on the amount of memory that
exlogistic will consume when computing the conditional distribution of
the parameter sufficient statistics. memory() is independent of the
system setting c(memory) (see set memory in [D] memory), and it is
possible for exlogistic to exceed the memory limit specified in c(memory)
without terminating. By default, a log is provided that displays the
number of enumerations (the size of the conditional distribution) after
processing each observation. Typically, you will see the number of
enumerations increase, and then at some point they will decrease as the
multivariate shift algorithm determines that some of the enumerations
cannot achieve the observed sufficient statistics of the conditioning
variables. When the algorithm is complete, however, it is necessary to
store the conditional distribution of the parameter sufficient statistics
as a dataset. It is possible, therefore, to get a memory error when the
algorithm has completed and c(memory) is not large enough to store the
conditional distribution.
Examples
Setup
. webuse hiv1
Perform exact logistic regression of hiv on cd4 and cd8
. exlogistic hiv cd4 cd8
Replay results, but report estimated coefficients rather than odds ratios
. exlogistic, coef
Replay results and report conditional scores test
. exlogistic, test(score)
Saved results
exlogistic saves the following in e():
Scalars
  e(N)                  number of observations
  e(k_groups)           number of groups
  e(n_possible)         number of distinct possible outcomes where
                          sum(sufficient) equals observed e(sufficient)
  e(n_trials)           binomial number-of-trials parameter
  e(sum_y)              sum of depvar
  e(k_indvars)          number of independent variables
  e(k_terms)            number of model terms
  e(k_condvars)         number of conditioning variables
  e(condcons)           conditioned on the constant(s) indicator
  e(midp)               mid-p-value rule indicator
  e(eps)                relative difference tolerance
Macros
  e(cmd)                exlogistic
  e(cmdline)            command as typed
  e(title)              Exact logistic regression
  e(depvar)             dependent variable
  e(indvars)            independent variables
  e(condvars)           conditional variables
  e(groupvar)           group variable
  e(binomial)           binomial number-of-trials variable
  e(level)              confidence level
  e(wtype)              weight type
  e(wexp)               weight expression
  e(datasignature)      the checksum
  e(datasignaturevars)  variables used in calculation of checksum
  e(properties)         b
  e(estat_cmd)          program used to implement estat
  e(predict)            program used to implement predict
  e(marginsnotok)       predictions disallowed by margins
Matrices
  e(b)                  coefficient vector
  e(mue_indicators)     indicator for elements of e(b) estimated using
                          MUE instead of CMLE
  e(se)                 e(b) standard errors (CMLEs only)
  e(ci)                 matrix of e(level) confidence intervals for e(b)
  e(sum_y_groups)       sum of e(depvar) for each group
  e(N_g)                number of observations in each group
  e(sufficient)         sufficient statistics for e(b)
  e(p_sufficient)       p-value for e(sufficient)
  e(scoretest)          conditional scores tests for indepvars
  e(p_scoretest)        p-value for e(scoretest)
  e(probtest)           conditional probabilities tests for indepvars
  e(p_probtest)         p-value for e(probtest)
  e(scoretest_m)        conditional scores tests for model terms
  e(p_scoretest_m)      p-value for e(scoretest_m)
  e(probtest_m)         conditional probabilities tests for model terms
  e(p_probtest_m)       p-value for e(probtest_m)
Function
  e(sample)             marks estimation sample
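The saved results listed above can be inspected after estimation with
standard Stata commands. The following is a minimal sketch using the hiv1
dataset from the Examples section. The matrix name b is a hypothetical
working copy.

```stata
webuse hiv1, clear
exlogistic hiv cd4 cd8, nolog

* Everything exlogistic left behind in e()
ereturn list

* Individual scalars and macros
display "observations:      " e(N)
display "possible outcomes: " e(n_possible)
display "command line:      `e(cmdline)'"

* Coefficients and their confidence limits
matrix list e(b)
matrix list e(ci)

* Copy an e() matrix before indexing into it
matrix b = e(b)
display "first coefficient: " b[1,1]
```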
Also see
Manual: [R] exlogistic
Help: [R] exlogistic postestimation;
[R] binreg, [R] clogit, [R] expoisson, [R] logistic, [R] logit