Logistic regression
Stata supports all aspects of logistic regression through the
following commands:
| asclogit | Alternative-specific conditional logit regression |
| asmprobit | Alternative-specific multinomial probit regression |
| asroprobit | Alternative-specific rank-ordered probit regression |
| binreg | GLM models for the binomial family |
| biprobit | Bivariate probit regression |
| blogit | Logit regression for grouped data |
| bprobit | Probit regression for grouped data |
| clogit | Conditional (fixed-effects) logistic regression |
| cloglog | Complementary log-log regression |
| exlogistic | Exact logistic regression |
| glm | Generalized linear models |
| glogit | Weighted least-squares logistic regression for grouped data |
| gprobit | Weighted least-squares probit regression for grouped data |
| heckprob | Probit model with selection |
| hetprob | Heteroskedastic probit model |
| ivprobit | Probit model with endogenous regressors |
| logit | Logistic regression, reporting coefficients |
| mlogit | Multinomial (polytomous) logistic regression |
| mprobit | Multinomial probit regression |
| nlogit | Nested logit regression |
| ologit | Ordered logistic regression |
| oprobit | Ordered probit regression |
| probit | Probit regression |
| rologit | Rank-ordered logistic regression |
| scobit | Skewed-logistic regression |
| slogit | Stereotype logistic regression |
| svy: heckprob | Survey version of heckprob |
| svy: logistic | Survey version of logistic |
| svy: logit | Survey version of logit |
| svy: mlogit | Survey version of mlogit |
| svy: ologit | Survey version of ologit |
| svy: oprobit | Survey version of oprobit |
| svy: probit | Survey version of probit |
| xtcloglog | Random-effects and population-averaged cloglog models |
| xtgee | GEE population-averaged generalized linear models |
| xtlogit | Fixed-effects, random-effects, and population-averaged logit models |
| xtprobit | Random-effects and population-averaged probit models |
Stata’s logistic command fits maximum-likelihood dichotomous
logistic models:
. webuse lbw
(Hosmer & Lemeshow data)

. logistic low age lwt i.race smoke ptl ht ui

Logistic regression                               Number of obs   =        189
                                                  LR chi2(8)      =      33.22
                                                  Prob > chi2     =     0.0001
Log likelihood =   -100.724                       Pseudo R2       =     0.1416

------------------------------------------------------------------------------
         low | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .9732636   .0354759    -0.74   0.457     .9061578    1.045339
         lwt |   .9849634   .0068217    -2.19   0.029     .9716834    .9984249
             |
        race |
          2  |   3.534767   1.860737     2.40   0.016     1.259736    9.918406
          3  |   2.368079   1.039949     1.96   0.050     1.001356    5.600207
             |
       smoke |   2.517698    1.00916     2.30   0.021     1.147676    5.523162
         ptl |   1.719161   .5952579     1.56   0.118     .8721455    3.388787
          ht |   6.249602   4.322408     2.65   0.008     1.611152    24.24199
          ui |     2.1351   .9808153     1.65   0.099     .8677528      5.2534
       _cons |   1.586014   1.910496     0.38   0.702     .1496092     16.8134
------------------------------------------------------------------------------

The syntax of all estimation commands is the same: the name of the
dependent variable is followed by the names of the independent variables.
In this case, the dependent variable low (containing 1 if a newborn had a
birthweight of less than 2500 grams and 0 otherwise) was modeled as a
function of a number of explanatory variables. By default, logistic
reports odds ratios; if you prefer coefficients, use the logit command
instead.
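
As a minimal sketch of the relationship between the two commands (written as a do-file and reusing the lbw example data from above), the following fits the same model with logit and exponentiates one coefficient. The choice of smoke for the check is arbitrary.

```stata
* Same model as above, reported as coefficients rather than odds ratios
webuse lbw, clear
logit low age lwt i.race smoke ptl ht ui

* An odds ratio is the exponentiated coefficient; this should reproduce
* the odds ratio that logistic reports for smoke
display exp(_b[smoke])

* Either command can report in the other metric on request
logit low age lwt i.race smoke ptl ht ui, or
logistic low age lwt i.race smoke ptl ht ui, coef
```

Both commands fit the same model; only the way the results are displayed differs.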
Once a model has been fitted, you can use Stata's predict command to
obtain the predicted probabilities of a positive outcome, the value of the
logit index, or the standard error of the logit index. You can also obtain
Pearson residuals, standardized Pearson residuals, leverage (the diagonal
elements of the hat matrix), Delta chi-square, Delta D, and Pregibon's Delta
beta influence measures by typing a single command. All statistics are
adjusted for the number of covariate patterns in the data—m-asymptotic
rather than n-asymptotic in Hosmer and Lemeshow (2000) jargon. Every
diagnostic graph suggested by Hosmer and Lemeshow can be drawn by typing one
or two commands.
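
The statistics listed above might be requested as in the following sketch, written as a do-file. The new variable names (p, index, se, and so on) are arbitrary choices for illustration, and the final two scatterplots are only one example of the kind of diagnostic graph described, not a complete set.

```stata
webuse lbw, clear
logistic low age lwt i.race smoke ptl ht ui

predict p                  // predicted probability of a positive outcome (the default)
predict index, xb          // value of the logit index
predict se, stdp           // standard error of the logit index
predict r, residuals       // Pearson residuals
predict rs, rstandard      // standardized Pearson residuals
predict h, hat             // leverage (diagonal of the hat matrix)
predict dx2, dx2           // Delta chi-square
predict dd, ddeviance      // Delta D
predict db, dbeta          // Pregibon's Delta beta

* Delta chi-square against the predicted probability
scatter dx2 p

* The same plot with marker size proportional to Delta beta
scatter dx2 p [aweight = db], msymbol(oh)
```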
Also available are the goodness-of-fit test, using either cells defined by
the covariate patterns or grouping, as suggested by Hosmer and Lemeshow;
classification statistics and the classification table; and a graph of the
ROC curve together with the area under it.
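
A short sketch of these postestimation commands, again as a do-file using the lbw example. The choice of 10 groups is a common convention assumed here for illustration, not something this page specifies.

```stata
webuse lbw, clear
logistic low age lwt i.race smoke ptl ht ui

estat gof                  // goodness-of-fit test with cells defined by covariate patterns
estat gof, group(10)       // Hosmer-Lemeshow test using 10 groups
estat classification       // classification table and classification statistics
lroc                       // graph of the ROC curve; also reports the area under it
```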
Stata’s mlogit command performs maximum likelihood
estimation of models with discrete dependent variables. It is intended for
use when the dependent variable takes on more than two outcomes and the
outcomes have no natural ordering. Uniquely, linear constraints on the
coefficients can be specified both within and across equations using
algebraic syntax. Much thought has gone into making mlogit truly
usable. For instance, there are no artificial constraints placed on the
nature of the dependent variable. The dependent variable is not required to
take on integral, contiguous values such as 1, 2, and 3, although such a
coding would be acceptable. Equally acceptable would be 1, 3, and 4, or
even 1.2, 3.7, and 4.8.
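
The following sketch illustrates the constraint syntax. It assumes the sysdsn1 example dataset from Stata's documentation, in which the outcome insure carries the value labels Indemnity, Prepaid, and Uninsure, and it assumes that Indemnity is chosen as the base outcome so that the equations are named Prepaid and Uninsure. The constraints themselves are arbitrary and serve only to show the notation.

```stata
webuse sysdsn1, clear

* Unconstrained multinomial logit
mlogit insure age male nonwhite i.site

* Across equations: force the age coefficient to be equal in both equations
constraint 1 [Prepaid]age = [Uninsure]age

* Within an equation: force two coefficients to be equal
constraint 2 [Prepaid]male = [Prepaid]nonwhite

* Refit with both constraints applied
mlogit insure age male nonwhite i.site, constraints(1 2)
```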
Stata’s clogit command performs maximum likelihood estimation
with a dichotomous dependent variable; conditional logistic analysis differs
from regular logistic regression in that the data are stratified and the
likelihoods are computed relative to each stratum. The form of the
likelihood function is similar but not identical to that of multinomial
logistic regression. Conditional logistic analysis is known in epidemiology
circles as the matched case–control model and in econometrics as
McFadden's choice model. The form of the data, as well as the nature of the
sampling, differs across the two settings, but clogit handles both.
clogit allows both 1:1 and 1:k matching, and there may even be more
than one positive outcome per stratum (which is handled using the exact
solution).
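
As a sketch of the matched case-control use, the following assumes the lowbirth2 example dataset from Stata's documentation, in which pairid identifies each matched set. The covariate list is illustrative.

```stata
webuse lowbirth2, clear

* Conditional logistic regression, stratified on the matched pairs
clogit low lwt smoke ptd ht ui i.race, group(pairid)

* The same model, reported as odds ratios
clogit low lwt smoke ptd ht ui i.race, group(pairid) or
```

The group() option names the variable that defines the strata, so the same syntax covers 1:1 and 1:k matching.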
Stata’s ologit command performs maximum likelihood estimation
to fit models with an ordinal dependent variable, meaning a variable that is
categorical and in which the categories can be ordered from low to high,
such as “poor”, “good”, and “excellent”.
Unlike mlogit, ologit can exploit the ordering in the
estimation process. (Stata also provides an oprobit command for
fitting ordered probit models.) As with mlogit, the categorical
dependent variable may take on any values whatsoever.
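
A minimal sketch of both ordered models, assuming the fullauto example dataset from Stata's documentation, in which rep77 is an ordinal repair-record rating. The covariates are chosen only for illustration.

```stata
webuse fullauto, clear

* Ordered logit: uses the ordering of the rep77 categories
ologit rep77 foreign length mpg

* Ordered probit fit to the same outcome and covariates
oprobit rep77 foreign length mpg
```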
See Greene (2012) for a straightforward description of the models fitted by
clogit, mlogit, ologit, and oprobit.
See New in Stata 12 for more about what was added in Stata Release 12.
References
- Breslow, N. E. 1974. Covariance analysis of censored survival data. Biometrics 30: 89–99.
- Greene, W. H. 2012. Econometric Analysis. 7th ed. Upper Saddle River, NJ: Prentice Hall.
- Hosmer, D. W. Jr. and S. Lemeshow. 2000. Applied Logistic Regression. 2d ed. New York: Wiley.
- McFadden, D. 1974. Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics, ed. P. Zarembka, 105–142. New York: Academic Press.