Stata 15 help for exlogistic

[R] exlogistic -- Exact logistic regression


exlogistic depvar indepvars [if] [in] [weight] [, options]

depvar can be specified as a zero or nonzero variable or the number of positive outcomes within each trial. For a zero or nonzero variable, zero indicates failure and nonzero indicates success. To specify depvar as the number of positive outcomes, you must also specify binomial(varname|#).

options                  Description
-------------------------------------------------------------------------
Model
  condvars(varlist)      condition on variables in varlist
  group(varname)         groups/strata are stratified by unique values of
                           varname
  binomial(varname|#)    data are in binomial form and the number of trials
                           is contained in varname or in #
  estconstant            estimate constant term; do not condition on the
                           number of successes
  noconstant             suppress constant term

Terms
  terms(termsdef)        terms definition

Options
  memory(#[b|k|m|g])     set limit on memory usage; default is memory(10m)
  saving(filename)       save the joint conditional distribution to filename

Reporting
  level(#)               set confidence level; default is level(95)
  coef                   report estimated coefficients
  test(testopt)          report p-value for observed sufficient statistic,
                           conditional scores test, or conditional
                           probabilities test
  mue(varlist)           compute the median unbiased estimates for varlist
  midp                   use the mid-p-value rule
  nolog                  do not display the enumeration log
-------------------------------------------------------------------------
by, statsby, and xi are allowed; see prefix.
fweights are allowed; see weight.
See [R] exlogistic postestimation for features available after estimation.


Statistics > Exact statistics > Exact logistic regression


exlogistic fits an exact logistic regression model, which produces more accurate inference in small samples than the standard maximum-likelihood-based logistic regression estimator. It can also better deal with completely determined outcomes. exlogistic with the group(varname) option conditions on the number of positive outcomes within stratum and is an alternative to the conditional (fixed-effects) logistic regression estimator.

Unlike Stata's other estimation commands, exlogistic must perform hypothesis tests during estimation rather than during postestimation with standard postestimation commands.


        +-------+
    ----+ Model +------------------------------------------------------------

condvars(varlist) specifies variables whose parameter estimates are not of interest to you. You can save substantial computer time and memory by moving such variables from indepvars to condvars(). You will get the same results for x1 and x3 whether you type

. exlogistic y x1 x2 x3 x4

or

. exlogistic y x1 x3, condvars(x2 x4)
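As a concrete sketch of the same idea, the commands below use the hiv1 dataset from the examples in this help file and assume that only cd4 is of interest, so cd8 is moved to condvars(). The choice of which variable to condition on is illustrative only.

```stata
* Load the example dataset used elsewhere in this help file
webuse hiv1, clear

* Full model: estimates reported for both cd4 and cd8
exlogistic hiv cd4 cd8, nolog

* Same model, but cd8 is conditioned out; only cd4 is reported,
* and its results should match those from the full model
exlogistic hiv cd4, condvars(cd8) nolog
```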

group(varname) specifies the variable defining the strata, if any. A constant term is assumed for each stratum identified in varname, and the sufficient statistics for indepvars are conditioned on the observed number of successes within each group. This makes the model estimated equivalent to that estimated by clogit, Stata's conditional logistic regression command (see [R] clogit). group() may not be specified with noconstant or estconstant.
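A minimal sketch of a stratified fit follows. The small matched-pairs dataset is made up for illustration (the variable names id, y, and x are assumptions, not part of any shipped dataset); each value of id is one stratum with one success and one failure.

```stata
* Hypothetical 1:1 matched-pairs data: id is the stratum,
* y the outcome, x a binary exposure
clear
input id y x
1 1 1
1 0 0
2 1 1
2 0 0
3 1 0
3 0 1
4 1 1
4 0 1
5 1 1
5 0 0
end

* Exact logistic regression conditioning on the number of
* successes within each stratum
exlogistic y x, group(id) nolog

* Conditional (fixed-effects) logistic regression on the same data,
* for comparison with the large-sample estimator
clogit y x, group(id)
```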

binomial(varname|#) indicates that the data are in binomial form and depvar contains the number of successes. varname contains the number of trials for each observation. If all observations have the same number of trials, you can instead specify the number as an integer. The number of trials must be a positive integer at least as great as the number of successes. If binomial() is not specified, the data are assumed to be Bernoulli, meaning that depvar equaling zero or nonzero records one failure or success.
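The two forms of binomial() can be sketched with a small made-up dataset. The variable names dose, successes, and trials are hypothetical; every observation has 10 trials, so the variable form and the integer form describe the same data.

```stata
* Hypothetical binomial-form data: one row per dose level
clear
input dose successes trials
0 1 10
1 4 10
2 8 10
end

* Number of trials taken from a variable
exlogistic successes dose, binomial(trials) nolog

* Equivalent here, because every observation has 10 trials
exlogistic successes dose, binomial(10) nolog
```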

estconstant estimates the constant term. By default, the models are assumed to have an intercept (constant), but the value of the intercept is not calculated. That is, the conditional distribution of the sufficient statistics for the indepvars is computed given the number of successes in depvar, thus conditioning out the constant term of the model. Use estconstant if you want the estimate of the intercept reported. estconstant may not be specified with group().
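For instance, assuming the hiv1 example dataset, the intercept could be requested as sketched below. Adding coef is a display choice made for this sketch, so that the constant is shown on the coefficient scale rather than exponentiated.

```stata
webuse hiv1, clear

* Default: the constant is conditioned out and not reported
exlogistic hiv cd4 cd8, nolog

* Request an estimate of the constant term as well
exlogistic hiv cd4 cd8, estconstant coef nolog
```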

noconstant; see [R] estimation options. noconstant may not be specified with group().

        +-------+
    ----+ Terms +------------------------------------------------------------

terms(termname = variable ... variable[, termname = variable ... variable ...]) defines additional terms of the model on which you want exlogistic to perform joint-significance hypothesis tests. By default, exlogistic reports tests individually on each variable in indepvars. For instance, if variables x1 and x3 are in indepvars, and you want to jointly test their significance, specify terms(t1=x1 x3). To also test the joint significance of x2 and x4, specify terms(t1=x1 x3, t2=x2 x4). Each variable can be assigned to only one term.

Joint tests are computed only for the conditional scores tests and the conditional probabilities tests. See test() below.
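A joint test could be requested along the following lines, assuming the hiv1 example dataset. The term name immune is an arbitrary label chosen for this sketch.

```stata
webuse hiv1, clear

* Define one term containing both covariates and report the
* conditional scores test, which supports joint tests
exlogistic hiv cd4 cd8, terms(immune = cd4 cd8) test(score) nolog

* The conditional probabilities test can be displayed on replay
exlogistic, test(probability)
```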

        +---------+
    ----+ Options +----------------------------------------------------------

memory(#[b|k|m|g]) sets a limit on the amount of memory exlogistic can use when computing the conditional distribution of the parameter sufficient statistics. The default is memory(10m), where m stands for megabyte, or 1,048,576 bytes. The following are also available: b stands for byte; k stands for kilobyte, which is equal to 1,024 bytes; and g stands for gigabyte, which is equal to 1,024 megabytes. The minimum setting allowed is 1m and the maximum is 2048m or 2g, but do not attempt to use more memory than is available on your computer. Also see the technical note on counting the conditional distribution.
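As a sketch of the unit suffixes, the commands below raise the limit above the default for the hiv1 example model. The specific values are illustrative; this small model is not expected to need them.

```stata
webuse hiv1, clear

* Allow up to 100 megabytes for the enumeration
exlogistic hiv cd4 cd8, memory(100m) nolog

* Per the definitions above, 1g is 1,024 megabytes,
* so these two limits are the same
exlogistic hiv cd4 cd8, memory(1g) nolog
exlogistic hiv cd4 cd8, memory(1024m) nolog
```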

saving(filename[, replace]) saves the joint conditional distribution to filename. This distribution is conditioned on those variables specified in condvars(). Use replace to replace an existing file with filename. A Stata data file is created containing all the feasible values of the parameter sufficient statistics. The variable names are the same as those in indepvars, in addition to a variable named _f_ containing the feasible value frequencies (sometimes referred to as the condition numbers).
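One way to inspect the saved distribution is sketched below, assuming the hiv1 example dataset. The filename dist is a placeholder chosen for this sketch.

```stata
webuse hiv1, clear

* Fit the model and save the joint conditional distribution
exlogistic hiv cd4 cd8, saving(dist, replace) nolog

* Load the saved file; it should contain one variable per
* indepvar (cd4, cd8) plus the frequency variable _f_
use dist, clear
describe
list in 1/5
```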

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options. The level(#) option will not work on replay because confidence intervals are based on estimator-specific enumerations. To change the confidence level, you must refit the model.

coef reports the estimated coefficients rather than odds ratios (exponentiated coefficients). coef may be specified when the model is fit or upon replay. coef affects only how results are displayed and not how they are estimated.

test(sufficient|score|probability) reports the p-value associated with the observed sufficient statistics, the conditional scores tests, or the conditional probabilities tests, respectively. The default is test(sufficient). If terms() is included in the specification, the conditional scores test and the conditional probabilities test are applied to each term providing conditional inference for several parameters simultaneously. All the statistics are computed at estimation time regardless of which is specified. Each statistic may thus also be displayed postestimation without having to refit the model; see [R] exlogistic postestimation.

mue(varlist) specifies that median unbiased estimates (MUEs) be reported for the variables in varlist. By default, the conditional maximum likelihood estimates (CMLEs) are reported, except for those parameters for which the CMLEs are infinite. Specify mue(_all) if you want MUEs for all the indepvars.
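A short sketch, assuming the hiv1 example dataset, of requesting MUEs and then checking which elements of e(b) were estimated that way:

```stata
webuse hiv1, clear

* Report MUEs for cd4 only; cd8 keeps its CMLE if it is finite
exlogistic hiv cd4 cd8, mue(cd4) nolog

* Report MUEs for all indepvars
exlogistic hiv cd4 cd8, mue(_all) nolog

* Indicator matrix: which coefficients are MUEs rather than CMLEs
matrix list e(mue_indicators)
```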

midp instructs exlogistic to use the mid-p-value rule when computing the MUEs, p-values, and confidence intervals. This adjustment is for the discreteness of the distribution and halves the value of the discrete probability of the observed statistic before adding it to the p-value. The mid-p-value rule cannot be applied to MUEs whose corresponding parameter CMLE is infinite.
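To see the effect of the adjustment, the same model can be fit with and without midp, as in this sketch using the hiv1 example dataset. Storing the p-values in matrices named p_exact and p_midp is a convenience introduced here.

```stata
webuse hiv1, clear

* Standard exact p-values
exlogistic hiv cd4 cd8, nolog
matrix p_exact = e(p_sufficient)

* Mid-p-value rule applied to p-values, MUEs, and confidence intervals
exlogistic hiv cd4 cd8, midp nolog
matrix p_midp = e(p_sufficient)

* Compare the two sets of p-values side by side
matrix list p_exact
matrix list p_midp
```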

nolog prevents the display of the enumeration log. By default, the enumeration log is displayed, showing the progress of computing the conditional distribution of the sufficient statistics.

Technical note

The memory(#) option limits the amount of memory that exlogistic will consume when computing the conditional distribution of the parameter sufficient statistics. memory() is independent of the data maximum memory setting (see set max_memory in [D] memory), and it is possible for exlogistic to exceed the memory limit specified in set max_memory without terminating. By default, a log is provided that displays the number of enumerations (the size of the conditional distribution) after processing each observation. Typically, you will see the number of enumerations increase, and then at some point they will decrease as the multivariate shift algorithm (Hirji, Mehta, and Patel 1987) determines that some of the enumerations cannot achieve the observed sufficient statistics of the conditioning variables. When the algorithm is complete, however, it is necessary to store the conditional distribution of the parameter sufficient statistics as a dataset. It is possible, therefore, to get a memory error when the algorithm has completed if there is not enough memory to store the conditional distribution.


Setup
    . webuse hiv1

Perform exact logistic regression of hiv on cd4 and cd8
    . exlogistic hiv cd4 cd8

Replay results, but report estimated coefficients rather than odds ratios
    . exlogistic, coef

Replay results and report conditional scores test
    . exlogistic, test(score)

Stored results

exlogistic stores the following in e():

Scalars
  e(N)                  number of observations
  e(k_groups)           number of groups
  e(n_possible)         number of distinct possible outcomes where
                          sum(sufficient) equals observed e(sufficient)
  e(n_trials)           binomial number-of-trials parameter
  e(sum_y)              sum of depvar
  e(k_indvars)          number of independent variables
  e(k_terms)            number of model terms
  e(k_condvars)         number of conditioning variables
  e(condcons)           conditioned on the constant(s) indicator
  e(midp)               mid-p-value rule indicator
  e(eps)                relative difference tolerance

Macros
  e(cmd)                exlogistic
  e(cmdline)            command as typed
  e(title)              title in estimation output
  e(depvar)             name of dependent variable
  e(indvars)            independent variables
  e(condvars)           conditional variables
  e(groupvar)           group variable
  e(binomial)           binomial number-of-trials variable
  e(terms)              term names set in option terms()
  e(level)              confidence level
  e(wtype)              weight type
  e(wexp)               weight expression
  e(datasignature)      the checksum
  e(datasignaturevars)  variables used in calculation of checksum
  e(properties)         b
  e(estat_cmd)          program used to implement estat
  e(marginsnotok)       predictions disallowed by margins

Matrices
  e(b)                  coefficient vector
  e(mue_indicators)     indicator for elements of e(b) estimated using MUE
                          instead of CMLE
  e(se)                 e(b) standard errors (CMLEs only)
  e(ci)                 matrix of e(level) confidence intervals for e(b)
  e(sum_y_groups)       sum of e(depvar) for each group
  e(N_g)                number of observations in each group
  e(sufficient)         sufficient statistics for e(b)
  e(p_sufficient)       p-value for e(sufficient)
  e(scoretest)          conditional scores tests for indepvars
  e(p_scoretest)        p-value for e(scoretest)
  e(probtest)           conditional probabilities tests for indepvars
  e(p_probtest)         p-value for e(probtest)
  e(scoretest_m)        conditional scores tests for model terms
  e(p_scoretest_m)      p-value for e(scoretest_m)
  e(probtest_m)         conditional probabilities tests for model terms
  e(p_probtest_m)       p-value for e(probtest_m)

Function
  e(sample)             marks estimation sample
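The stored results can be inspected after estimation with standard commands. The sketch below assumes the hiv1 example dataset and picks a few of the results listed above.

```stata
webuse hiv1, clear
exlogistic hiv cd4 cd8, nolog

* Show everything stored in e()
ereturn list

* Size of the conditional distribution matching the observed
* sufficient statistics of the conditioning variables
display "distinct possible outcomes: " e(n_possible)

* Coefficients, confidence intervals, and exact p-values
matrix list e(b)
matrix list e(ci)
matrix list e(p_sufficient)

* Count the observations in the estimation sample
count if e(sample)
```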


Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing distributions for exact logistic regression. Journal of the American Statistical Association 82: 1110-1117.
