Stata 15 help for expoisson

[R] expoisson -- Exact Poisson regression


Syntax

    expoisson depvar indepvars [if] [in] [weight] [, options]

    options               Description
    -------------------------------------------------------------------------
    Model
      condvars(varlist)   condition on variables in varlist
      group(varname)      groups/strata are stratified by unique values of
                            varname
      exposure(varname_e) include ln(varname_e) in model with coefficient
                            constrained to 1
      offset(varname_o)   include varname_o in model with coefficient
                            constrained to 1

    Options
      memory(#[b|k|m|g])  set limit on memory usage; default is memory(25m)
      saving(filename)    save the joint conditional distribution to filename

    Reporting
      level(#)            set confidence level; default is level(95)
      irr                 report incidence-rate ratios
      test(testopt)       report p-value for observed sufficient statistic,
                            conditional scores test, or conditional
                            probabilities test
      mue(varlist)        compute the median unbiased estimates for varlist
      midp                use the mid-p-value rule
      nolog               do not display the enumeration log
    -------------------------------------------------------------------------
    by, statsby, and xi are allowed; see prefix.
    fweights are allowed; see weight.
    See [R] expoisson postestimation for features available after estimation.


Menu

    Statistics > Exact statistics > Exact Poisson regression


Description

expoisson fits an exact Poisson regression model, which produces more accurate inference in small samples than standard maximum-likelihood-based Poisson regression. For stratified data, expoisson conditions on the number of events in each stratum and is an alternative to fixed-effects Poisson regression.
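To make "conditioning on the number of events" concrete, consider the simplest case: two groups with known exposures and a single binary covariate. Conditional on the total event count, the count in the second group follows a binomial distribution that no longer involves the intercept, so an exact p-value can be obtained by enumeration. The following is a minimal Python sketch of that special case only. It is not the algorithm expoisson uses, and the function names, counts, and exposures are hypothetical.

```python
from math import comb, exp

def conditional_pmf(total, t0, t1, beta=0.0):
    """Return P(Y1 = k | Y0 + Y1 = total) for k = 0..total.

    Assumes Y0 ~ Poisson(t0 * exp(a)) and Y1 ~ Poisson(t1 * exp(a + beta)).
    The nuisance intercept a cancels once we condition on the total,
    leaving a binomial with success probability p below.
    """
    p = t1 * exp(beta) / (t0 + t1 * exp(beta))
    return [comb(total, k) * p**k * (1 - p)**(total - k)
            for k in range(total + 1)]

def exact_upper_p(y1, total, t0, t1):
    """One-sided exact p-value P(Y1 >= y1 | total) under beta = 0."""
    pmf = conditional_pmf(total, t0, t1, beta=0.0)
    return sum(pmf[y1:])

# Hypothetical data: 4 events in total, all in group 1, equal exposures.
p_value = exact_upper_p(y1=4, total=4, t0=100.0, t1=100.0)
```

With more covariates or strata the conditional distribution is no longer a simple binomial, which is why the enumeration described in the technical note is needed.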


Options

        +-------+
    ----+ Model +------------------------------------------------------------

condvars(varlist) specifies variables whose parameter estimates are not of interest to you. You can save substantial computer time and memory by moving such variables from indepvars to condvars(). Understand that you will get the same results for x1 and x3 whether you type

    . expoisson y x1 x2 x3 x4

or

    . expoisson y x1 x3, condvars(x2 x4)

group(varname) specifies the variable defining the strata, if any. A constant term is assumed for each stratum identified in varname, and the sufficient statistics for indepvars are conditioned on the observed number of events within each group (as well as other variables in the model). The group variable must be integer valued.

exposure(varname_e), offset(varname_o); see [R] estimation options.

        +---------+
    ----+ Options +----------------------------------------------------------

memory(#[b|k|m|g]) sets a limit on the amount of memory expoisson can use when computing the conditional distribution of the parameter sufficient statistics. The default is memory(25m), where m stands for megabyte, or 1,048,576 bytes. The following are also available: b stands for byte; k stands for kilobyte, which is equal to 1,024 bytes; and g stands for gigabyte, which is equal to 1,024 megabytes. The minimum setting allowed is 1m and the maximum is 2048m or 2g, but do not attempt to use more memory than is available on your computer. Also see the technical note on counting the conditional distribution.
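The unit arithmetic above can be checked with a short Python sketch. The helper memory_to_bytes is hypothetical and is not part of Stata. It requires an explicit unit suffix because this help file does not say which unit applies when the suffix is omitted.

```python
import re

# Unit sizes as stated in the option description.
UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
MIN_BYTES = 1 * UNIT_BYTES["m"]   # minimum allowed setting: 1m
MAX_BYTES = 2 * UNIT_BYTES["g"]   # maximum allowed setting: 2048m, or 2g

def memory_to_bytes(spec):
    """Convert a memory() argument such as '25m' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([bkmg])", spec.strip().lower())
    if match is None:
        raise ValueError(f"expected #b, #k, #m, or #g, got {spec!r}")
    size = int(match.group(1)) * UNIT_BYTES[match.group(2)]
    if not MIN_BYTES <= size <= MAX_BYTES:
        raise ValueError(f"{spec!r} is outside the allowed range 1m to 2g")
    return size

default_limit = memory_to_bytes("25m")   # 25 * 1,048,576 bytes
```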

saving(filename[, replace]) saves the joint conditional distribution for each independent variable specified in indepvars. There is one file for each variable, and it is named using the prefix filename with the variable name appended. For example, saving(mydata) with an independent variable named X would generate a data file named mydata_X.dta. Use replace to replace an existing file. Each file contains the conditional distribution for one of the independent variables specified in indepvars conditioned on all other indepvars and those variables specified in condvars(). There are two variables in each data file: the feasible sufficient statistics for the variable's parameter and their associated weights. The weights variable is named _w_.

        +-----------+
    ----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options. The level(#) option will not work on replay because confidence intervals are based on estimator-specific enumerations. To change the confidence level, you must refit the model.

irr reports estimated coefficients transformed to incidence-rate ratios, that is, exp(b) rather than b. Standard errors and confidence intervals are similarly transformed. This option affects how results are displayed, not how they are estimated or stored. irr may be specified at estimation or when replaying previously estimated results.
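As a small worked illustration of the exp(b) transformation, the Python sketch below exponentiates a coefficient and the endpoints of its confidence interval. The function to_irr and the numeric values are hypothetical; they are not output from expoisson.

```python
from math import exp

def to_irr(b, ci_low, ci_high):
    """Map a log-rate coefficient and its CI endpoints to the IRR scale."""
    return exp(b), exp(ci_low), exp(ci_high)

# Hypothetical coefficient 0.5 with confidence interval [-0.2, 1.2].
irr, irr_low, irr_high = to_irr(0.5, -0.2, 1.2)
```

Because exp() is monotone increasing, the transformed interval keeps the same coverage and still contains the transformed point estimate. A coefficient of 0 corresponds to an incidence-rate ratio of 1.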

test(sufficient|score|probability) reports the p-value associated with the observed sufficient statistic, the conditional scores test, or the conditional probabilities test. The default is test(sufficient). All the statistics are computed at estimation time, and each statistic may be displayed postestimation; see [R] expoisson postestimation.

mue(varlist) specifies that median unbiased estimates (MUEs) be reported for the variables in varlist. By default, the conditional maximum likelihood estimates (CMLEs) are reported, except for those parameters for which the CMLEs are infinite. Specify mue(_all) if you want MUEs for all the indepvars.

midp instructs expoisson to use the mid-p-value rule when computing the MUEs, p-values, and confidence intervals. The rule adjusts for the discreteness of the distribution by halving the probability of the observed statistic before adding it to the p-value. The mid-p-value rule cannot be applied to MUEs whose corresponding parameter CMLE is infinite.
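The halving step can be illustrated with a minimal Python sketch. It assumes a one-sided (upper-tail) p-value and a made-up conditional distribution; the function upper_tail_p is hypothetical, and this help file does not specify exactly how expoisson combines the tails.

```python
def upper_tail_p(pmf, observed, midp=False):
    """P(T >= observed) from a discrete distribution given as {value: prob}.

    With midp=True, only half of the probability at the observed value
    is counted, which is the mid-p-value rule.
    """
    beyond = sum(p for t, p in pmf.items() if t > observed)
    at_observed = pmf.get(observed, 0.0)
    return beyond + (0.5 * at_observed if midp else at_observed)

# Made-up conditional distribution of a sufficient statistic T.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

p_exact = upper_tail_p(pmf, observed=1)             # 0.5 + 0.25
p_mid = upper_tail_p(pmf, observed=1, midp=True)    # 0.25 + 0.25
```

The mid-p-value is never larger than the ordinary exact p-value, so it is less conservative.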

nolog prevents the display of the enumeration log. By default, the enumeration log is displayed, showing the progress of computing the conditional distribution of the sufficient statistics.

Technical note

The memory(#) option limits the amount of memory that expoisson will consume when computing the conditional distribution of the parameter sufficient statistics. memory() is independent of the data maximum memory setting (see set max_memory in [D] memory), and it is possible for expoisson to exceed the memory limit specified in set max_memory without terminating. By default, a log is provided that displays the number of enumerations (the size of the conditional distribution) after processing each observation. Typically, you will see the number of enumerations increase, and then at some point they will decrease as the multivariate shift algorithm (Hirji, Mehta, and Patel 1987) determines that some of the enumerations cannot achieve the observed sufficient statistics of the conditioning variables. When the algorithm is complete, however, it is necessary to store the conditional distribution of the parameter sufficient statistics as a dataset. It is possible, therefore, to get a memory error when the algorithm has completed if there is not enough memory to store the conditional distribution.


Examples

    Setup
        . webuse smokes

    Perform exact Poisson regression of cases on smokes using exposure peryrs
        . expoisson cases smokes, exposure(peryrs) irr

    Replay results and report conditional scores test
        . expoisson, test(score) irr

Stored results

expoisson stores the following in e():

    Scalars
      e(N)                  number of observations
      e(k_groups)           number of groups
      e(relative_weight)    relative weight for the observed e(sufficient)
                              and e(condvars)
      e(sum_y)              sum of depvar
      e(k_indvars)          number of independent variables
      e(k_condvars)         number of conditioning variables
      e(midp)               mid-p-value rule indicator
      e(eps)                relative difference tolerance

    Macros
      e(cmd)                expoisson
      e(cmdline)            command as typed
      e(title)              title in estimation output
      e(depvar)             name of dependent variable
      e(indvars)            independent variables
      e(condvars)           conditional variables
      e(groupvar)           group variable
      e(exposure)           exposure variable
      e(offset)             linear offset variable
      e(level)              confidence level
      e(wtype)              weight type
      e(wexp)               weight expression
      e(datasignature)      the checksum
      e(datasignaturevars)  variables used in calculation of checksum
      e(properties)         b V
      e(estat_cmd)          program used to implement estat
      e(marginsnotok)       predictions disallowed by margins

    Matrices
      e(b)                  coefficient vector
      e(mue_indicators)     indicator for elements of e(b) estimated using
                              MUE instead of CMLE
      e(se)                 e(b) standard errors (CMLEs only)
      e(ci)                 matrix of e(level) confidence intervals for e(b)
      e(sum_y_groups)       sum of e(depvar) for each group
      e(N_g)                number of observations in each group
      e(sufficient)         sufficient statistics for e(b)
      e(p_sufficient)       p-value for e(sufficient)
      e(scoretest)          conditional scores tests for indepvars
      e(p_scoretest)        p-value for e(scoretest)
      e(probtest)           conditional probability tests for indepvars
      e(p_probtest)         p-value for e(probtest)

    Function
      e(sample)             marks estimation sample


Reference

Hirji, K. F., C. R. Mehta, and N. R. Patel. 1987. Computing distributions for exact logistic regression. Journal of the American Statistical Association 82: 1110-1117.
