
This page contains only historical information and is not about the current release of Stata. Please see our Stata 18 page for information on the current version of Stata.


More statistics

In addition to mixed models, survey statistics, multivariate statistics, and multinomial probit, many other new estimators and a host of statistical features have been added in Stata 9.


Also see the separate sections on linear mixed models, survey statistics, multivariate statistics, and multinomial probit.

General-purpose statistics

  • New estimation command slogit fits the stereotype logistic regression model for categorical dependent variables. The model can be viewed either as a generalization of the multinomial logistic regression model (mlogit) or as a generalization of the ordered logistic regression model (ologit) that relaxes the proportional-odds assumption. See [R] slogit.

    Predicted statistics after slogit include the linear predictor, the probability of any or all outcomes, and the standard error of the linear predictor. See [R] slogit postestimation.

  • New estimation command ivprobit fits probit regression models of binary outcomes with endogenous regressors. Estimation can be performed by maximum likelihood (ML) or by Newey’s minimum chi-squared two-step estimator. Note, however, that some postestimation facilities, such as computing marginal effects with mfx, are available only after ML estimation, because the two-step estimator imposes a transformation that invalidates many postestimation results. See [R] ivprobit.

  • New estimation command ivtobit fits tobit models (linear regression models with censored dependent variables) with endogenous regressors, by maximum likelihood or by Newey’s minimum chi-squared two-step estimator (but see the note about the two-step estimator above). See [R] ivtobit.

  • New estimation command ztp fits a zero-truncated Poisson regression model for event counts in which zero counts are never observed.

    Predicted statistics after ztp include the linear predictor and its standard error, the predicted number of events, the incidence rate, the conditional mean, and the likelihood score. See [R] ztp and [R] ztp postestimation.
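    The following Python sketch (not Stata code) shows what the conditional mean above refers to. It evaluates the zero-truncated Poisson probabilities and the mean of y given y > 0 for a Poisson rate lam. The function names are hypothetical. The sketch assumes the standard truncated-Poisson formulas and does not reproduce Stata's implementation.

```python
import math

def ztp_pmf(k, lam):
    """P(y = k | y > 0) for a Poisson(lam) count truncated at zero."""
    if k < 1:
        return 0.0  # zero is never observed under truncation at zero
    # Untruncated Poisson probability, rescaled by P(y > 0) = 1 - exp(-lam).
    return math.exp(-lam) * lam ** k / math.factorial(k) / (1.0 - math.exp(-lam))

def ztp_conditional_mean(lam):
    """E[y | y > 0]: the untruncated mean divided by P(y > 0)."""
    return lam / (1.0 - math.exp(-lam))

print(ztp_conditional_mean(1.0))  # about 1.582, versus an untruncated mean of 1
```

    Because the zeros are removed, the conditional mean always exceeds lam. The gap is largest when lam is small and zeros would be common.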

  • New estimation command ztnb fits a zero-truncated negative binomial regression model for event counts in which zero counts are never observed and the counts are overdispersed relative to the Poisson.

    Predicted statistics after ztnb include the linear predictor and its standard error, the predicted number of events, the incidence rate, the conditional mean, and the likelihood scores. See [R] ztnb and [R] ztnb postestimation.

  • New estimation commands mean, ratio, proportion, and total estimate means, ratios, proportions, and totals over the entire sample or over groups within the sample. When estimating over groups, the full covariance matrix (VCE) across all groups is estimated. These are full estimation commands and support a range of postestimation facilities. These include linear and nonlinear tests among the groups (test and testnl) and linear and nonlinear combinations of group-level statistics (lincom and nlcom). All four commands support several SE and VCE estimates: robust, cluster-robust, bootstrap, jackknife, and observed information matrix (the default).

    mean, ratio, and proportion also support direct standardization across strata (groups) using the stdize() and stdweight() options.

    See [R] mean, [R] ratio, [R] proportion, and [R] total.

  • To avoid conflict with the new mean command, existing command means has been renamed ameans, with synonyms gmeans and hmeans.

  • Existing command nl has a new syntax that makes estimating nonlinear least-squares regressions easier. For most models, estimation is now as easy as typing the nonlinear expression. Full programmability has been retained for complex models, and the old syntax continues to work.

    nl also now supports robust (White/sandwich) and cluster-robust SE and VCE estimates, including two popular adjustments that can dramatically improve the small-sample performance of robust SE and VCE estimates.

    A number of new reporting and estimation options have also been added. See [R] nl.

  • New option vce() selects how most estimation commands estimate the standard errors (SEs) and covariance matrix of the estimated parameters. Choices are vce(oim), vce(opg), vce(robust), vce(jackknife), and vce(bootstrap), although the available choices vary from estimator to estimator. vce(robust) is a synonym for option robust, and you can use either. The new choices are vce(jackknife) and vce(bootstrap).

    vce(bootstrap) specifies that the standard errors, significance tests, and confidence intervals be normal-based bootstrap estimates, rather than the default analytic estimates based on the observed information matrix. You can also produce percentile-based or bias-corrected confidence intervals after estimation using estat bootstrap; see [R] bootstrap postestimation.

    vce(jackknife) specifies that the standard errors, significance tests, and confidence intervals be jackknife estimates.

    Both vce(bootstrap) and vce(jackknife) will automatically perform either observation or cluster sampling, whichever is appropriate for the estimator.

    Notably, both vce(bootstrap) and vce(jackknife) compute bootstrapped or jackknifed estimates of the complete VCE matrix. This means that many of Stata’s postestimation commands are available. You can form linear and nonlinear combinations or functions of the parameters and obtain jackknife or normal-based bootstrap standard errors and confidence intervals for the combinations using [R] lincom and [R] nlcom. Similarly, you can perform linear and nonlinear tests using [R] test and [R] testnl.
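    The following minimal Python sketch makes the jackknife VCE concrete. It computes the delete-one jackknife standard error for a single statistic, using the usual (n - 1)/n scaling. The helper names are hypothetical and this is not Stata's code. For the sample mean, the result is exactly s/sqrt(n). For a vector of coefficients, applying the same recipe to outer products of the deviations gives a full covariance matrix.

```python
def jackknife_se(data, stat):
    """Delete-one jackknife standard error of stat(data)."""
    n = len(data)
    # Recompute the statistic n times, each time leaving one observation out.
    reps = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    rbar = sum(reps) / n
    # Jackknife variance: (n - 1)/n times the sum of squared deviations.
    var = (n - 1) / n * sum((r - rbar) ** 2 for r in reps)
    return var ** 0.5

def sample_mean(xs):
    return sum(xs) / len(xs)

earnings = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
print(jackknife_se(earnings, sample_mean))  # equals s/sqrt(n) = sqrt(4/7)
```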

  • New command estat centralizes the computation and reporting of additional statistics after estimation, just as predict does for predictions. estat takes subcommands. estat summarize, for instance, reports summary statistics for the estimation sample and can be used after any estimator. Other estat subcommands are specific to the estimation command. To find out what is available after a command, see the corresponding postestimation entry. For example, after [R] regress, see [R] regress postestimation, or after [XT] xtmixed, see [XT] xtmixed postestimation.

    Existing postestimation commands have been brought into the estat framework:
    Estimation command       Old command   New estat command
    regress                  ovtest        estat ovtest
                             hettest       estat hettest
                             szroeter      estat szroeter
                             vif           estat vif
                             imtest        estat imtest
    regress (time series)    dwstat        estat dwatson
                             durbina       estat durbinalt
                             bgodfrey      estat bgodfrey
                             archlm        estat archlm
    anova                    ovtest        estat ovtest
                             hettest       estat hettest
    logit and logistic       lstat         estat classification (*)
                             lfit          estat gof (*)
    poisson                  poisgof       estat gof
    stcox                    stphtest      estat phtest
    xtgee                    xtcorr        estat wcorrelation
    (*) The new command works after probit, as well as after logit and logistic; the old command worked after logit and logistic only.
    The original commands continue to work, but are undocumented.

    Three estat subcommands are available after almost all estimators:

    • estat ic reports Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC).

    • estat summarize reports summary statistics on the variables in the estimation model for the estimation sample.

    • estat vce reports the covariance (VCE) or correlation matrix estimates. estat vce replaces the old vce command and has more features.
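    The information criteria reported by estat ic follow the standard definitions. The following minimal Python sketch assumes that k is the number of estimated parameters and n is the estimation-sample size. Stata's choice of n may differ for some estimators, and the function name is hypothetical.

```python
import math

def information_criteria(ll, k, n):
    """AIC and BIC from log likelihood ll, k parameters, and n observations."""
    aic = -2.0 * ll + 2.0 * k           # penalty of 2 per parameter
    bic = -2.0 * ll + k * math.log(n)   # penalty grows with sample size
    return aic, bic

print(information_criteria(-100.0, 3, 50))  # AIC = 206, BIC = 200 + 3 ln(50)
```

    Smaller values are preferred. For any sample with n > e^2 (about 7.4), BIC penalizes extra parameters more heavily than AIC does.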

  • Stata has many new prefix commands (commands that behave like by: and xi:), including statsby:, bootstrap:, jackknife:, permute:, simulate:, stepwise:, svy:, and rolling:. For instance, to obtain jackknife estimates of the standard error and confidence interval of the mean, you might type
    		. jackknife: mean earnings
    or to obtain survey-adjusted estimates you might type
    		. svy:  mean earnings
    after declaring your survey design with svyset.

    See [R] bootstrap, [R] jackknife, [R] permute, [TS] rolling, [R] simulate, [R] stepwise, [D] statsby, and [SVY] svy.

  • New prefix commands bootstrap: and jackknife: replace the old commands bs and jknife. In addition to having better syntax, they provide new features:

    • They have enhanced handling and reporting of expressions.

    • They post their results as estimation results with a complete VCE. Most postestimation facilities may now be used after them and will be based on the bootstrap or jackknife VCE. These include

      adjust       adjusted predictions
      estimates    cataloging estimation results
      lincom       linear combinations with SEs, tests, and CIs
      nlcom        nonlinear combinations with SEs, tests, and CIs
      mfx          computing marginal effects and elasticities
      predict      predictions, residuals, probabilities, etc.
      predictnl    generalized nonlinear predictions with SEs and CIs
      test         Wald tests of simple and composite linear hypotheses
      testnl       Wald tests of nonlinear hypotheses

    • They produce a model test when applied to the coefficients of estimation commands.

    • They allow option seed(#) to set the random-number seed.

    • They allow option reject(exp) to discard replicates for which the expression exp is true.

    • bootstrap: uses the normal distribution instead of the Student’s t distribution to compute the normal approximation confidence intervals.

    • jackknife: now allows fweights to be specified.

    See [R] bootstrap and [R] jackknife.
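    The following Python sketch contrasts the normal-approximation and percentile intervals. It illustrates the idea under stated assumptions and is not Stata's algorithm. The helper names are hypothetical. The normal interval uses the standard normal critical value, as bootstrap: now does. The percentile interval uses simple linear interpolation between order statistics, which is only one of several common definitions.

```python
import math
import random
import statistics

def percentile(sorted_vals, p):
    """Linear-interpolation percentile of already sorted values, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def bootstrap_cis(data, stat, reps=1000, seed=12345, level=0.95):
    """Normal-based and percentile bootstrap CIs for stat(data)."""
    rng = random.Random(seed)  # fixed seed, so results are reproducible
    n = len(data)
    theta = stat(data)
    # Resample n observations with replacement and recompute the statistic.
    boot = [stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(reps)]
    se = statistics.stdev(boot)  # bootstrap standard error
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # normal, not Student's t
    normal_ci = (theta - z * se, theta + z * se)
    srt = sorted(boot)
    alpha = 1.0 - level
    percentile_ci = (percentile(srt, alpha / 2), percentile(srt, 1 - alpha / 2))
    return theta, se, normal_ci, percentile_ci

print(bootstrap_cis(list(range(1, 21)), lambda xs: sum(xs) / len(xs)))
```

    The normal interval is symmetric around the point estimate by construction. The percentile interval follows the shape of the bootstrap distribution and so can be asymmetric.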

  • New prefix command statsby: replaces old command statsby (not a prefix) and provides enhanced handling and reporting of expressions, allows weights, and allows string variables in the option by(). See [D] statsby.

  • New prefix command stepwise: replaces old command sw and, in addition to working with all the previous estimators, also works with [R] intreg and [R] scobit.

  • Existing prefix command xi: has new option noomit that prevents it from omitting a category when generating category indicators for group variables. See [R] xi.

  • New command tetrachoric computes a tetrachoric correlation matrix for a set of binary variables. See [R] tetrachoric.

  • Existing command suest, which combines estimation results for subsequent testing, is easier to use and has new features:

    • Scores are now computed for the models you combine; you no longer need to save scores when estimating.

    • suest, used after svy: estimation, now accounts for your survey design.

    • suest now works more smoothly with certain estimation commands that previously required special treatment, including regress, ologit, and oprobit.

    • suest now works with all models estimated by clogit, rather than only those with a single positive outcome per group.

    See [R] suest.

  • Existing command clogit has new features:

    • Robust and cluster-robust SE and VCE estimates are now supported via options robust and cluster().

    • Linear constraints on the parameters are now implemented via option constraints().

    • New option vce() allows SE and VCE estimates to be computed using OIM (the default), OPG, bootstrap, and jackknife.

    See [R] clogit.

  • Option level() now allows noninteger confidence levels to be specified. See [R] level.

  • Existing command predict now generates equation-level scores after most maximum likelihood estimation commands; see the documentation of predict in the postestimation entry for each estimation command.

  • Existing command cumul has new option equal, which gives tied observations the same cumulative value. See [R] cumul.
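    The effect of equal can be sketched in Python. The helper cumul here is hypothetical. It assumes that the default assigns successive values i/N in sort order. It also assumes that equal gives every tied observation the fraction of observations less than or equal to its value, which is the usual empirical-CDF convention. Check [R] cumul for Stata's exact rule.

```python
def cumul(values, equal=False):
    """Cumulative proportion for each observation, in the original order."""
    n = len(values)
    if equal:
        # Every tie gets the same value: the share of observations <= it.
        # (Quadratic in n; fine for a sketch.)
        return [sum(v <= x for v in values) / n for x in values]
    # Default: successive values 1/n, 2/n, ... in sort order.
    # Python's sort is stable, so ties keep their original order.
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = rank / n
    return out

print(cumul([3, 1, 3, 2]))              # [0.75, 0.25, 1.0, 0.5]
print(cumul([3, 1, 3, 2], equal=True))  # [1.0, 0.25, 1.0, 0.5]
```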

  • Existing command estimates table now allows you to specify more models, and the command wraps the table if necessary. Also allowed are new options

    • equations(), which matches equations by number rather than by name.

    • coded, which displays the table in a compact, symbolic format.

    • modelwidth(), which sets the number of characters for displaying model names.

    See [R] estimates.

  • test after anova and manova has two new options for performing Wald tests:

    • mtest() implements three methods to adjust for multiple tests: Bonferroni, Holm, and Šidák.

    • test() makes specifying contrasts easier by accepting a matrix containing the contrast.

    See [R] anova postestimation.
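    The three adjustments offered by mtest() are standard. The following Python sketch computes adjusted p-values for m tests. The function name is hypothetical, and Stata's output format differs.

```python
def adjust_pvalues(pvals, method):
    """Adjusted p-values for m tests: 'bonferroni', 'holm', or 'sidak'."""
    m = len(pvals)
    if method == "bonferroni":
        return [min(1.0, m * p) for p in pvals]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in pvals]
    if method == "holm":
        order = sorted(range(m), key=lambda i: pvals[i])
        adj = [0.0] * m
        running = 0.0
        for step, i in enumerate(order):
            # Multiply the (step+1)-th smallest p-value by (m - step) and
            # keep the adjusted sequence nondecreasing.
            running = max(running, min(1.0, (m - step) * pvals[i]))
            adj[i] = running
        return adj
    raise ValueError(f"unknown method: {method}")

print(adjust_pvalues([0.01, 0.04, 0.03], "holm"))  # about [0.03, 0.06, 0.06]
```

    Holm is never more conservative than Bonferroni. Šidák is slightly less conservative than Bonferroni when the tests are independent.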

  • Commands ci and cii have new options exact, wilson, agresti, jeffreys, and wald for computing different types of binomial confidence intervals. See [R] ci.
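    The following Python sketch shows why several interval types exist. It computes three of them (Wald, Wilson, and Agresti-Coull) from k successes in n trials, using textbook formulas. It assumes that the agresti option refers to the Agresti-Coull interval. The exact and jeffreys intervals need beta quantiles and are omitted. Function names are hypothetical.

```python
from statistics import NormalDist

def binomial_cis(k, n, level=0.95):
    """Wald, Wilson, and Agresti-Coull intervals for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = k / n
    # Wald: the textbook normal approximation around the sample proportion.
    half = z * (p * (1 - p) / n) ** 0.5
    wald = (p - half, p + half)
    # Wilson: invert the score test; the center is pulled toward 1/2.
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    hw = z * (p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5 / denom
    wilson = (center - hw, center + hw)
    # Agresti-Coull: a Wald interval after adding z^2/2 successes and failures.
    n_t = n + z * z
    p_t = (k + z * z / 2) / n_t
    ha = z * (p_t * (1 - p_t) / n_t) ** 0.5
    agresti = (p_t - ha, p_t + ha)
    return {"wald": wald, "wilson": wilson, "agresti": agresti}

print(binomial_cis(0, 10))  # the Wald interval collapses to (0, 0)
```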

  • Command hausman has new option df() for controlling the degrees of freedom. See [R] hausman.

  • Command predict has new option score for returning equation-level scores. See [R] predict.

  • Command mfx is now faster and has new option varlist() for computing effects of specific variables. See [R] mfx.

  • Command mfp has the new option aic for selecting models using the Akaike information criterion (AIC). See [R] mfp.

  • Commands tabulate and tabi with the exact option are now significantly faster.

  • In existing command mlogit, option basecat has been renamed baseoutcome() for better consistency with the terminology of choice models. See [R] mlogit.

  • Existing commands spearman and ktau now allow more than two variables to be specified and have more flexible output. See [R] spearman.
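    Spearman's rank correlation is Pearson's correlation computed on ranks, with tied values given their average rank. The following Python sketch accepts any number of variables and returns the full matrix, in the spirit of the new multivariable syntax. The helpers are hypothetical and this is not Stata's code.

```python
def average_ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def spearman_matrix(variables):
    """Spearman correlation matrix for a list of equal-length variables."""
    ranked = [average_ranks(v) for v in variables]
    return [[pearson(r1, r2) for r2 in ranked] for r1 in ranked]

print(spearman_matrix([[1, 2, 3, 4, 5], [1, 4, 9, 16, 25], [5, 4, 3, 2, 1]]))
```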

  • Existing command bsample for sampling with replacement (bootstrap sampling) now supports weighted bootstrap resampling using the new weight() option. See [R] bsample.

  • Existing command bstat for reporting bootstrap results has a number of new reporting options. Percentile and other confidence intervals, previously computed by bstat itself, are now reported by estat bootstrap. estat bootstrap can be used after any bootstrap estimation, including bstat. See [R] bstat and [R] bootstrap postestimation.

  • Most maximum likelihood estimators now test for convergence using the Hessian-scaled gradient, g*inv(H)*g'. This criterion ensures that the gradient is close to zero when scaled by the Hessian (the curvature of the likelihood or pseudolikelihood surface at the optimum) and provides greater assurance of convergence for models whose likelihoods tend to be difficult to optimize, such as those for arch, asmprobit, and scobit. You can set the tolerance level for this test with new option nrtolerance(), show the Hessian-scaled gradient in the iteration log with option shownrtol, and turn the test off with option nonrtolerance. See [R] maximize.
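    The convergence check can be sketched in Python for a two-parameter problem. This is an illustration, not Stata's implementation, and the helper names are hypothetical. H is the Hessian of the log likelihood. Because H is negative definite near a maximum, the quadratic form g*inv(H)*g' is negative, so the sketch compares its absolute value with the tolerance. It uses 1e-5, the default nrtolerance() value given in the ml section of this page. The example shows a small raw gradient failing the check when the likelihood is flat in one direction.

```python
def hessian_scaled_gradient(g, H):
    """|g * inv(H) * g'| for a 2-parameter gradient g and 2x2 Hessian H."""
    (a, b), (c, d) = H
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # x = inv(H) * g'
    x0 = inv[0][0] * g[0] + inv[0][1] * g[1]
    x1 = inv[1][0] * g[0] + inv[1][1] * g[1]
    return abs(g[0] * x0 + g[1] * x1)

def converged(g, H, nrtolerance=1e-5):
    return hessian_scaled_gradient(g, H) < nrtolerance

g = (1e-3, 0.0)  # the same small raw gradient in both cases
print(converged(g, ((-1e-2, 0.0), (0.0, -1.0))))  # flat surface: False
print(converged(g, ((-10.0, 0.0), (0.0, -1.0))))  # sharp curvature: True
```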

  • Existing command set has a new setting, maxiter (default 16000), that specifies the maximum number of iterations performed by all estimation commands. You change this setting by typing set maxiter #, and you may add option permanently to retain the setting in future Stata sessions.

Time-series statistics

  • Existing command arima can now estimate multiplicative seasonal ARIMA (SARIMA) models; see new options sarima(), mar(), and mma() in [TS] arima.
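    "Multiplicative" means that the seasonal and nonseasonal lag polynomials are multiplied together. The following Python sketch expands (1 - phi L)(1 - Phi L^s). The helper names are hypothetical. With monthly data (s = 12), the expansion shows that the model implies a coefficient at lag 13 equal to phi*Phi rather than a free parameter.

```python
def poly_mul(p, q):
    """Multiply two lag polynomials given as coefficient lists (index = lag)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def seasonal_ar_polynomial(phi, Phi, s):
    """Expand (1 - phi L)(1 - Phi L^s) for seasonal period s >= 2."""
    nonseasonal = [1.0, -phi]
    seasonal = [1.0] + [0.0] * (s - 1) + [-Phi]
    return poly_mul(nonseasonal, seasonal)

coefs = seasonal_ar_polynomial(0.5, 0.8, 12)
print(coefs[1], coefs[12], coefs[13])  # -0.5, -0.8, and about 0.4 (= 0.5 * 0.8)
```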

  • New command rolling performs rolling-window or recursive estimations, including regressions, and collects statistics from the estimation on each window; see [TS] rolling.
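    The following Python sketch contrasts rolling-window and recursive estimation using a simple-regression slope. The helpers are hypothetical. rolling itself works with Stata estimation commands and collects whatever statistics you request. With a fixed window, both the start and the end move forward. With recursive estimation, the start stays at the first observation and the window grows.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def rolling_estimates(x, y, window, recursive=False):
    """(start, end, slope) for each estimation window."""
    results = []
    for end in range(window, len(x) + 1):
        start = 0 if recursive else end - window  # recursive: anchored start
        results.append((start, end - 1, ols_slope(x[start:end], y[start:end])))
    return results

x = list(range(10))
y = [2 * v + 1 for v in x]
print(rolling_estimates(x, y, 4)[:2])
print(rolling_estimates(x, y, 4, recursive=True)[-1])
```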

  • The [TS] manual now has a glossary that defines commonly used terms in time-series analysis and explains how we use them in the manual; see the glossary of [TS].

  • Many existing commands that previously did not allow time-series operators now do. These commands include areg, binreg, biprobit, boxcox, cloglog, cnsreg, glm, heckman, heckprob, hetprob, impute, intreg, logistic, logit, lowess, mvreg, nbreg, orthog, pcorr, poisson, probit, pwcorr, rreg, testparm, treatreg, truncreg, xtcloglog, xtgls, xtintreg, xtlogit, xtpoisson, xtprobit, xtgee, xtreg, xtsum, and xttobit.

  • Many commands requiring time-series data now work on a single panel from a panel dataset when that panel is selected using an if expression or an in qualifier. Those commands include ac, corrgram, cumsp, dfgls, dfuller, pac, pergram, pperron, wntestb, wntestq, and xcorr. New commands estat archlm, estat bgodfrey, estat dwatson, and estat durbinalt, which replace commands archlm, bgodfrey, dwstat, and durbina, also work on a single panel from a panel dataset.

  • The dialogs for analyzing IRF results are much improved. The dialogs now populate lists of models and variables from the current IRF results that may be chosen for producing tables and graphs. The improved dialogs include irf cgraph, irf ctable, irf graph, irf ograph, and irf table.

  • Existing command dfuller has new option drift for testing the null hypothesis of a random walk with drift. The algorithm for calculating MacKinnon’s approximate p-values is also now more accurate in cases where the p-value is relatively large; see [TS] dfuller.

  • Existing commands corrgram and pac have new option yw that computes partial autocorrelations using the Yule–Walker equations instead of the default regression-based method; see [TS] corrgram.
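    Yule-Walker partial autocorrelations can be computed from the autocorrelations with the Durbin-Levinson recursion. The following Python sketch assumes that rho[0] = 1 and rho[k] is the lag-k autocorrelation. The function name is hypothetical, and Stata's computation may differ in details. For an AR(1) process with rho[k] = phi^k, the PACF should be phi at lag 1 and zero afterward.

```python
def pacf_yule_walker(rho, max_lag):
    """Partial autocorrelations at lags 1..max_lag via Durbin-Levinson."""
    pacf = []
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            # Update the lower-order coefficients for the order-k fit.
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

rho_ar1 = [0.6 ** k for k in range(6)]
print(pacf_yule_walker(rho_ar1, 5))  # about [0.6, 0, 0, 0, 0]
```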

  • Time-series operators are now better displayed in estimation and other result tables.

  • New estat subcommands for use after regress with time-series data replace the old commands dwstat, durbina, bgodfrey, and archlm. The new commands are estat dwatson, estat durbinalt, estat bgodfrey, and estat archlm. See [R] regress postestimation time series.
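    The Durbin-Watson d statistic reported by estat dwatson is computed from the regression residuals. The following one-function Python sketch shows the calculation; the function name is hypothetical. Values near 2 suggest no first-order autocorrelation. Values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.

```python
def durbin_watson(resid):
    """Sum of squared successive differences over the residual sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: alternating signs
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: perfectly persistent
```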

  • The ability of arima and arch to estimate standard errors using either the observed information matrix (OIM) or the outer product of gradients (OPG) has been consolidated under the new vce() option.

    (What follows was first released in Stata 8.2.)

  • New command vec fits cointegrated vector error-correction models (VECMs) using Johansen’s method; see [TS] vec.

  • New command vecrank produces statistics used to determine the number of cointegrating vectors in a VECM, including Johansen’s trace and maximum-eigenvalue tests for cointegration; see [TS] vecrank.

  • New command fcast—which replaces old command varfcast—produces and graphs dynamic forecasts of the dependent variables after fitting a VAR, SVAR, or VECM; see [TS] fcast.

  • New command irf—which replaces the old command varirf—does everything the old command did and more. irf estimates the impulse–response functions, cumulative impulse–response functions, orthogonalized impulse–response functions, structural impulse–response functions, and forecast error-variance decompositions after fitting a VAR, SVAR, or VECM. irf can also make graphs and tables of the results. See [TS] irf.

    varirf continues to work but is no longer documented. irf accepts .vrf result files created by varirf.

  • Existing command varsoc can now be used to obtain lag-order selection statistics for VECMs, as well as VARs; see [TS] varsoc.

  • New command veclmar computes Lagrange-multiplier statistics for autocorrelation after fitting a VECM; see [TS] veclmar.

  • New command vecnorm tests whether the disturbances in a VECM are normally distributed. For each equation and for all equations jointly, three statistics are computed: a skewness statistic, a kurtosis statistic, and the Jarque–Bera statistic. See [TS] vecnorm.
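    The following Python sketch computes the three statistics for a single series of residuals. It assumes the usual moment-based definitions: population moments, and a kurtosis of 3 under normality. Under these definitions, the Jarque-Bera statistic is asymptotically chi-squared with 2 degrees of freedom. The joint, all-equations version that vecnorm also reports is not shown, and the function name is hypothetical.

```python
def jarque_bera(x):
    """Skewness, kurtosis, and Jarque-Bera statistic for one series."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    # JB combines squared skewness and squared excess kurtosis.
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

print(jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0]))  # symmetric: skewness 0
```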

  • New command vecstable checks the eigenvalue stability condition after fitting a VECM; see [TS] vecstable.

  • Both the new command vecstable and the existing command varstable offer a new graph option for presenting the stability results. See [TS] vecstable and [TS] varstable.
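    The eigenvalue stability condition requires every eigenvalue of the companion matrix to lie strictly inside the unit circle. The following Python sketch covers the simplest case, a single AR(2) equation with companion matrix [[a1, a2], [1, 0]]. The helper names are hypothetical. VARs and VECMs stack their coefficient matrices into a larger companion matrix. A VECM with K variables and r cointegrating equations has K - r eigenvalues equal to one by construction.

```python
import cmath

def ar2_companion_moduli(a1, a2):
    """Moduli of the eigenvalues of the AR(2) companion matrix [[a1, a2], [1, 0]]."""
    # Characteristic equation: lambda^2 - a1*lambda - a2 = 0.
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    roots = ((a1 + disc) / 2, (a1 - disc) / 2)
    return sorted((abs(r) for r in roots), reverse=True)

def is_stable(a1, a2):
    return all(m < 1 for m in ar2_companion_moduli(a1, a2))

print(is_stable(0.5, 0.3))              # True: both roots inside the unit circle
print(is_stable(1.3, -0.2))             # False: one root is about 1.12
print(ar2_companion_moduli(0.0, -0.81)) # complex pair with modulus 0.9
```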

  • The output of the following commands has been standardized to improve formatting: var, svar, vargranger, varlmar, varnorm, varsoc, varstable, and varwle.

  • New command haver makes it easy to load and analyze economic and financial databases available from Haver Analytics; see [TS] haver.

Longitudinal/panel statistics

  • The big news is the new command xtmixed: Stata now fits linear mixed models. See the section on linear mixed models.

  • New features have been added to the maximum likelihood estimators whose likelihoods have no closed form and must be evaluated numerically. These estimators include xtlogit, xtprobit, xtpoisson, xtcloglog, xtintreg, and xttobit.

    • The likelihood may now be approximated using adaptive Gauss–Hermite quadrature (the new default) or nonadaptive quadrature (the previous default). Adaptive quadrature substantially increases the accuracy of the approximation, particularly on difficult problems such as data with large panel sizes or data with a large variance for the random effects.

    • Linear constraints may now be imposed using the new option constraints(). Constraints are specified the standard way; see [R] constraint.

    • New option intpoints() replaces old option quad(), although quad() continues to work. The new name is more meaningful, especially when used with estimators that integrate likelihoods using methods other than quadrature.
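    The following Python sketch shows the idea behind adaptive quadrature using a 3-point Gauss-Hermite rule. The helper names are hypothetical and this is not Stata's code. Nonadaptive quadrature places the nodes around zero, scaled by the random-effect standard deviation. Adaptive quadrature moves and rescales the nodes to match where the integrand has its mass. When a panel's integrand is concentrated far from zero, the nonadaptive nodes miss it almost entirely.

```python
import math

# 3-point Gauss-Hermite rule for integrals of the form  int f(x) exp(-x^2) dx.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6, 2 * math.sqrt(math.pi) / 3, math.sqrt(math.pi) / 6)

def normal_pdf(u, mu, sigma):
    return math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gh_integral(g, mu, sigma):
    """Approximate int g(u) du with nodes placed at mu + sqrt(2)*sigma*x."""
    total = 0.0
    for x, w in zip(GH_NODES, GH_WEIGHTS):
        u = mu + math.sqrt(2) * sigma * x
        # Write int g du as the expectation of g/phi under N(mu, sigma^2).
        total += w / math.sqrt(math.pi) * g(u) / normal_pdf(u, mu, sigma)
    return total

# Integrand concentrated near u = 5, as for a panel with a large random effect.
g = lambda u: math.exp(-0.5 * ((u - 5.0) / 0.5) ** 2)
exact = 0.5 * math.sqrt(2 * math.pi)
print(gh_integral(g, 0.0, 1.0), gh_integral(g, 5.0, 0.5), exact)
```

    With the nodes recentered at 5 and rescaled to 0.5, the ratio g/phi is constant, so the adaptive rule is exact here. The nonadaptive rule returns a value near zero.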

  • Existing command xtreg now allows options robust and cluster() when estimating fixed-effects (FE) and random-effects (RE) models; see [XT] xtreg.

  • Most [XT] commands that previously did not allow time-series operators now support them. These commands include xtgls, xtreg, xtsum, xtcloglog, xtintreg, xtlogit, xtpoisson, xtprobit, xttobit, and xtgee.

  • New command xtrc is the old command xtrchh, renamed and with new features. New option beta reports the best linear unbiased predictors (BLUPs) for the group-specific coefficients, along with their standard errors and confidence intervals. For details, see [XT] xtrc.

  • predict after xtrc has the new option group() to compute the BLUPs of the dependent variable using the BLUPs of the coefficients.

  • New command xtline plots panel data and allows either overlaid or separate graphs for each panel; see [XT] xtline.

  • New section [XT] glossary defines commonly used terms and explains how we use them in the manual.

Survival analysis

  • The [ST] manual now has a glossary that defines commonly used terms in survival (or duration) analysis and often explains how these terms are used in the manual; see the glossary of [ST].

  • New command estat can be used after stcox and streg. In addition to the standard estat statistics—information criteria, estimation sample summary, and formatted variance–covariance matrix (VCE)—statistics specific to the proportional-hazards estimator are available after stcox. These include

    • estat concordance computes Harrell’s C and Somers' D statistics, which measure concordance: the agreement between predictions and the observed order of failures.

    • estat phtest replaces the existing stphtest for computing tests and graphs of the proportional hazards assumption. stphtest continues to work.

    See [ST] stcox postestimation and [ST] streg postestimation.
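    Harrell's C is the share of usable pairs that the predicted risk orders correctly, and Somers' D = 2C - 1. The following Python sketch rests on several assumptions. Higher risk means earlier expected failure. A pair is usable when the subject with the shorter time failed. Pairs with tied times are skipped, and ties in predicted risk count one-half. Stata's treatment of ties may differ, and the function name is hypothetical.

```python
def harrell_c(time, failed, risk):
    """Harrell's C and Somers' D from times, failure indicators, and risk scores."""
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # Usable pair: subject i failed strictly before subject j's time.
            if failed[i] and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    c = concordant / usable
    return c, 2.0 * c - 1.0  # Somers' D = 2C - 1

print(harrell_c([1, 2, 3, 4], [True] * 4, [4, 3, 2, 1]))  # (1.0, 1.0)
```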

  • Existing command sts graph has new options cihazard and per(#). cihazard draws pointwise confidence bands around the smoothed hazard function, and per() specifies the units used to report the survival or failure rate. See [ST] sts.

  • Existing command stcurve now plots over an evenly spaced grid, producing smooth curves, even in small samples; see [ST] stcurve.

  • Existing command sts graph has new options atriskopts() and lostopts() that let you control how the labels for at-risk and lost observations look (their color, font size, etc.); see [ST] sts.

  • Existing command stci has new options for controlling how the plotted survival line looks (color, thickness, etc.) and for adding titles, controlling legends, and all other characteristics of the graph; see [ST] stci.

New ML features

Command ml, for implementing user-written maximum likelihood estimators, has many new features:
  • New option technique() sets the optimization technique. BHHH, DFP, and BFGS optimization techniques are now available; the default technique remains modified Newton–Raphson.

  • New option vce() sets the type of covariance matrix calculations that will be made.

    vce(oim) specifies the observed information matrix (OIM), also called the Hessian-based estimator; this is (and always has been) the default.

    vce(opg) specifies the outer product of the gradients (OPG). This is new.

    vce(robust) specifies Taylor-series linearization, also known as the Huber or White estimator and, in Stata, as simply robust.
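    The three choices can be compared on the simplest possible model: a Poisson mean with no covariates, where the MLE is the sample mean. The following Python sketch computes the inverse of the observed information, the inverse of the outer product of per-observation scores, and the robust sandwich. The names are hypothetical and this is not ml's code. For overdispersed data, the robust variance exceeds the OIM variance.

```python
def poisson_mean_vces(y):
    """Variance of the MLE of a Poisson mean three ways: OIM, OPG, robust."""
    n = len(y)
    lam = sum(y) / n                                 # MLE: the sample mean
    scores = [yi / lam - 1.0 for yi in y]            # d lnL_i / d lam at the MLE
    hessian = -sum(yi / lam ** 2 for yi in y)        # d^2 lnL / d lam^2
    opg = sum(s * s for s in scores)                 # outer product of gradients
    v_oim = -1.0 / hessian                           # inverse observed information
    v_opg = 1.0 / opg                                # inverse OPG
    v_robust = (1.0 / hessian) ** 2 * opg            # sandwich: H^-1 B H^-1
    return lam, v_oim, v_opg, v_robust

print(poisson_mean_vces([0, 0, 0, 4, 6]))  # about (2.0, 0.4, 0.125, 1.28)
```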

  • Most estimators written with ml now support estimation with survey data and correlated data with no additional programming. This support includes correct treatment of multistage designs, weighting, stratification, poststratification, and finite-population corrections, as well as access to linearization, jackknife, and bootstrap variance estimators. For a discussion, see [P] program properties.

  • ml has always allowed linear constraints to be applied using the option constraints() with no additional programming. It now handles irrelevant constraints more elegantly. Irrelevant constraints are those that have no impact on the model. Previously, irrelevant constraints caused an error message. Now they are flagged and ignored.

  • When linear constraints are imposed, ml now applies a Wald test for the overall fit of the model, rather than attempting a likelihood-ratio (LR) test, which is often inappropriate.

  • ml has new subcommand score for generating scores after fitting a model.

  • ml has new option diparm_options() that automatically performs transformations of ancillary parameters.

  • ml now saves the gradient vector in e(gradient).

  • ml has new option search(norescale) that prevents rescaling when searching for starting values.

  • ml honors the new setting for maximum iterations, set maxiter #, and stops after at most # iterations, even if convergence has not been achieved.

  • ml now displays a prominent message in the footer of the estimation results when convergence is not achieved. This message continues to be shown on redisplay of estimation results.

  • ml has new option nofootnote to suppress the new warning message displayed when convergence is not achieved.

  • ml tests for convergence using the Hessian-scaled gradient—g*inv(H)*g'. This is a true convergence criterion that ensures that the gradient is close to zero when scaled by the Hessian (the curvature of the likelihood or pseudolikelihood surface at the optimum). This new criterion is particularly important when maximizing difficult likelihoods to prevent stopping the maximization too soon.

  • New option nrtolerance() lets you change the tolerance for the Hessian-scaled gradient convergence criterion; the default is nrtolerance(1e-5).

  • New option shownrtolerance displays the criterion value of the Hessian-scaled gradient at each iteration.

  • New undocumented command mlmatbysum helps you compute the Hessian of panel-data likelihoods. It is of interest to programmers seeking the speed that comes from coding their own second-derivative calculations; see mlmatbysum.

  • ml has two new undocumented subcommands—ml hold and ml unhold—to assist in solving nested optimization problems; see ml_hold.

    See [R] ml for more information on these features. Anyone programming estimators using ml should read the book Maximum Likelihood Estimation with Stata. Many of the features mentioned above are discussed and applied to real problems in the book.