Stata News and Announcements Statistics

This page contains only historical information and is not about the current release of Stata. Please see our features page for information on the current version of Stata.

Order Stata 7 Upgrade to Stata 7 Call to order or upgrade

Panel data/Cross-sectional time-series analysis (xt)
  xtabond produces the Arellano–Bond one-step, one-step robust, and two-step estimators for dynamic panel-data models, that is, models that include lagged dependent variables. xtabond can be used with exogenously unbalanced panels and, uniquely, handles embedded gaps in the time series as well as opening and closing gaps. xtabond allows for predetermined covariates. xtabond allows you to use either the full instrument matrix or a pared-down version. xtabond reports both the Sargan and autocorrelation tests derived by Arellano and Bond.


xtregar estimates cross-sectional time-series models in which the disturbance e_it is assumed to follow an AR(1) process. xtregar reports the within estimator and a GLS random-effects estimator. xtregar can handle unequally spaced observations and exogenously unbalanced panels. xtregar uniquely reports the modified Bhargava et al. Durbin–Watson statistic and the Baltagi–Wu locally best invariant test statistic for autocorrelation.


xtregiv estimates cross-sectional time-series regressions with (generalized) instrumental variables, or, said differently, estimates two-stage least squares time-series cross-sectional models. xtregiv can estimate such models using the between-2SLS estimator, the within-2SLS estimator, the first-differenced 2SLS estimator, the Balestra–Varadharajan–Krishnakumar G2SLS estimator, or the Baltagi EC2SLS estimator. All the estimators allow use of balanced or (exogenously) unbalanced panels.


xtpcse produces panel-corrected standard errors (PCSE) for linear cross-sectional time-series models where the parameters are estimated by OLS or Prais–Winsten regression. When computing the standard errors and the variance–covariance estimates, the disturbances are, by default, assumed to be heteroskedastic and contemporaneously correlated across panels.

Survival analysis (st)
  stcox will now estimate proportional hazard models with continuously time-varying covariates, and you do not need to modify your data to obtain the estimates.


streg can now estimate parametric survival models with individual-level frailty (unobserved heterogeneity). Two forms of the frailty distribution are allowed: gamma and inverse Gaussian. Frailty is allowed with all the parametric distributions currently available. (New commands weibullhet, ereghet, etc., allow users to estimate these models outside of the st system.)

streg now supports estimation of stratified models, meaning that the distributional parameters (the ancillary parameters and intercept) are allowed to differ across strata.

streg has also been modified to allow you to specify any linear-in-the-parameters equation for any of the distributional parameters, which allows you to create various forms of stratification, as well as allowing distributional parameters to be linear functions of other covariates.


stptime calculates person-time (person-years) and incidence rates and implements computation of the standardized mortality/morbidity ratios (SMR).
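The quantities stptime reports can be worked through by hand. The following is a minimal Python sketch, not Stata code and not a description of how stptime is implemented; the function names, strata, and reference rates are made up for illustration.

```python
def incidence_rate(events, person_time):
    """Events per unit of person-time."""
    return events / person_time


def smr(observed, stratum_person_time, reference_rates):
    """Standardized mortality ratio: observed events / expected events.

    Expected events are the sum over strata of the cohort's person-time
    in that stratum times the reference population's rate for it.
    """
    expected = sum(pt * r for pt, r in zip(stratum_person_time, reference_rates))
    return observed / expected


# Hypothetical cohort with three age strata
person_years = [1000.0, 2000.0, 500.0]
ref_rates = [0.001, 0.004, 0.010]  # deaths per person-year, reference population
observed_deaths = 21

rate = incidence_rate(observed_deaths, sum(person_years))
ratio = smr(observed_deaths, person_years, ref_rates)
print(f"rate = {rate:.4f} per person-year, SMR = {ratio:.2f}")
```

An SMR above 1 means the cohort experienced more events than the reference rates would predict for the same person-time.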


sts test has been modified to include additional tests for comparing survivor distributions, including the Tarone–Ware test, the Fleming–Harrington test, and the Peto–Peto–Prentice test. Also new is a test for trend.


stci calculates and reports the level and confidence intervals of the survivor function, as well as computing and reporting the mean survival time and confidence interval.


stsplit is now much faster and now allows for splitting on failure times, as well as providing some additional convenience options. But remember that stcox can now estimate with continuous time-varying covariates without you having to stsplit the data beforehand.


stcurve has a new outfile option.

Cluster analysis

  cluster performs partitioning and hierarchical cluster analysis using a variety of methods. Two partitioning cluster methods are provided—kmeans and kmedians—and three hierarchical-cluster methods are provided—single linkage, average linkage, and complete linkage. Included are 14 binary similarity measures and 7 different continuous measures (counting things such as the Minkowski distance # as one).

More than one result can be saved simultaneously, so that the results of different analyses may be compared. Cluster membership and other clustering characteristics can be added to the dataset. cluster allows adding notes to analyses and, of course, the dropping of analyses. cluster also provides post-clustering commands that can, for instance, display the dendrogram (clustering tree) from a hierarchical analysis or produce new grouping variables based on the analysis.

cluster has been designed to be extended. Users may program extensions for new cluster methods, new cluster management routines, and new post-analysis summary methods.

Marginal effects

  mfx reports marginal effects after estimation of any model. Marginal effects refers to df()/dx_i evaluated at x, where f() is any function of the data and the model's estimated parameters, x are the model's covariates, and x_i is one of the covariates. For instance, the model might be probit and f() the cumulative normal distribution, in which case df()/dx_i = the change in the probability of a positive outcome with respect to a change in one of the covariates. x might be specified as the mean, so that the change would be evaluated at the mean.

dprobit would already do that for the probit model, and there have been other commands published in the STB that would do this for other particular models, such as dtobit for performing tobit estimation.

mfx works after estimation of any model in Stata and is capable of producing marginal effects for anything predict can produce. For instance, after tobit, you could get the marginal effect on the probability of an outcome being uncensored, or on the expected value of the uncensored outcome, or on the expected value of the censored outcome.

mfx can compute results as derivatives or elasticities: df()/dx_i, dlnf()/dlnx_i, df()/dlnx_i, or dlnf()/dx_i.
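To make the four forms concrete, here is a minimal Python sketch for the probit case described above, where f() is the cumulative normal evaluated at the linear index. It is an illustration only, not mfx's algorithm; the coefficients and covariate values are hypothetical, and the derivative is taken analytically.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_effects(beta, x, i):
    """Derivative and elasticity forms of f = Phi(x'beta) for covariate i at x."""
    xb = sum(b * v for b, v in zip(beta, x))
    f = norm_cdf(xb)
    dfdx = norm_pdf(xb) * beta[i]  # chain rule: d Phi(xb) / d x_i
    return {
        "df/dx": dfdx,
        "dlnf/dlnx": dfdx * x[i] / f,
        "df/dlnx": dfdx * x[i],
        "dlnf/dx": dfdx / f,
    }


beta = [-1.0, 0.5]   # hypothetical estimates: intercept, slope
x_mean = [1.0, 2.0]  # constant term and the covariate's mean
effects = probit_effects(beta, x_mean, 1)
for name, value in effects.items():
    print(f"{name:10s} {value:.5f}")
```

The elasticity forms are rescalings of the same derivative, which is why all four can be produced from one evaluation at x.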

Estimation commands (exclusive of st and xt)

  nlogit estimates nested logit models. In a nested logit model, multiple outcomes are grouped into a nested tree structure, and nested logit has the advantage over multinomial and conditional logistic models of allowing you to parameterize away the assumption of independence of irrelevant alternatives (IIA).


glm has been rewritten and its facilities for estimating generalized linear models dramatically extended. It now offers an expanded choice of link functions and also allows user-specified link and variance functions. Newey–West or heteroskedasticity and autocorrelation consistent (HAC) covariance matrices and standard errors can be estimated. These include options for three standard HAC kernels as well as user-defined kernels. glm now reports maximum-likelihood based estimates of standard errors, IRLS based estimates, robust linearization based estimates, bootstrapped and jackknifed estimates, and several others.


Almost all of Stata's maximum likelihood estimators (including, for example, heckman, intreg, poisson, and streg) will now accept linear constraints. The constraints are defined using the constraint command and are specified to the estimator using the new constraint() option.


treatreg estimates the treatment effects model using either a two-step estimator or a full maximum-likelihood estimator. The treatment effects model considers the effect of an endogenously chosen binary treatment on another endogenous continuous variable, conditional on two sets of independent variables.


Heteroskedasticity and autocorrelation consistent (HAC) covariance matrices and standard error estimates can now be produced for probit, logit, Poisson, negative binomial, and many other cross-sectional maximum likelihood models. These variance estimates are accessed through the enhancements to glm discussed above.


boxcox has been rewritten. It now produces maximum likelihood estimates of the coefficients and the Box–Cox transform parameter(s). Box–Cox models may be estimated in various forms, with the transform on the left, on the right, or on both sides.
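For reference, the transform itself can be sketched in a few lines of Python. This shows only the standard Box–Cox transform applied to a single value; it is not the boxcox estimator, and the function name and tolerance are choices made for this illustration.

```python
import math


def box_cox(y, lam, tol=1e-12):
    """Box-Cox transform of a positive value y with parameter lam.

    (y**lam - 1) / lam for lam != 0, and ln(y) in the limit lam -> 0.
    """
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if abs(lam) < tol:
        return math.log(y)
    return (y ** lam - 1.0) / lam


for lam in (1.0, 0.5, 0.0, -1.0):
    print(f"lambda = {lam:4.1f}  transform of 4.0 = {box_cox(4.0, lam):.4f}")
```

With lambda = 1 the transform is a simple shift, and with lambda = 0 it is the natural log, which is why estimating lambda lets the data choose between a linear and a log specification.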


truncreg estimates truncated regression models. Truncated regression refers to regressions estimated on samples drawn based on the dependent variable, and therefore for which (sometimes) neither the dependent nor independent variables are observed (as opposed to tobit, which estimates regression models when the independent variables are observed in all cases).

Receiver Operating Characteristic (ROC) curves

  Five new commands are provided for the analysis of Receiver Operating Characteristic (ROC) curves.


roctab is used to perform nonparametric ROC analyses. By default, roctab calculates the area under the curve. Optionally, roctab can plot the ROC curve, display the data in tabular form, and produce Lorenz-like plots.
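The nonparametric area under the ROC curve has a simple interpretation that can be sketched in Python: it is the proportion of (abnormal, normal) pairs in which the abnormal case has the higher rating, with ties counted as one half. This is a hand illustration with made-up ratings, not roctab's code.

```python
def roc_auc(scores, labels):
    """Nonparametric (trapezoidal) ROC area via pairwise comparison.

    labels: 1 for abnormal (positive) cases, 0 for normal cases.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical 5-point ratings and true disease status
ratings = [1, 2, 2, 3, 4, 5, 3, 5]
disease = [0, 0, 0, 0, 1, 1, 1, 0]
auc = roc_auc(ratings, disease)
print(f"area under the ROC curve = {auc:.3f}")
```

An area of 0.5 corresponds to a rating that does not discriminate at all, and 1.0 to perfect separation.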


rocfit estimates maximum-likelihood ROC models assuming a binormal distribution of the latent variable. rocplot may be used after rocfit to plot the fitted ROC curve and simultaneous confidence bands.


roccomp tests the equality of two or more ROC areas obtained from applying two or more test modalities to the same sample or to independent samples.


rocgold independently tests the equality of the ROC area of each of several test modalities against a "gold" standard ROC curve. For each comparison, rocgold reports the raw and the Bonferroni adjusted significance probability. Optionally, Sidak's adjustment for multiple comparisons can be obtained.
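The two multiple-comparison adjustments mentioned above can be written down directly. The Python sketch below uses made-up raw significance levels and is not rocgold's code; it only shows how the adjusted values relate to the raw ones when m comparisons are made.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted significance level for one of m comparisons."""
    return min(1.0, m * p)


def sidak(p, m):
    """Sidak-adjusted significance level for one of m comparisons."""
    return 1.0 - (1.0 - p) ** m


raw = [0.01, 0.04, 0.20]  # hypothetical raw p-values, one per test modality
m = len(raw)
for p in raw:
    print(f"raw {p:.2f}  Bonferroni {bonferroni(p, m):.4f}  Sidak {sidak(p, m):.4f}")
```

The Sidak value is never larger than the Bonferroni value, so it is slightly less conservative.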

More commands for epidemiologists

  binreg estimates generalized linear models for the binomial family and various links. It may be used with either individual-level or grouped data. Each link function offers a distinct, epidemiological interpretation of the estimated parameters: odds ratios (ORs), risk ratios (RRs), health ratios (HRs), and risk differences (RDs).


cc and cci now, by default, compute exact confidence intervals for the odds ratio.


icd9 and icd9p assist when you are working with ICD-9-CM diagnostic and procedure codes. These commands allow the cleaning up, verification, labeling, and selection of ICD-9 values.

Pharmacokinetics

  There are four new estimation commands and two new utilities intended for the analysis of pharmacokinetic data.


pkexamine calculates pharmacokinetic measures from time-and-concentration subject-level data. pkexamine computes and displays the maximum measured concentration, the time at the maximum measured concentration, the time of the last measurement, the elimination rate, the half-life, and the area under the concentration-time curve (AUC).
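The measures listed above can be illustrated with a short Python sketch for one subject. The choices here are assumptions made for the example, not a statement of pkexamine's defaults: the AUC uses the linear trapezoidal rule up to the last measurement, and the elimination rate is the negative slope of a least-squares line through log concentration for the last three points. The data are made up.

```python
import math


def pk_measures(times, conc, n_fit=3):
    """Basic pharmacokinetic measures from one subject's time/concentration data."""
    cmax = max(conc)
    tmax = times[conc.index(cmax)]  # time of the first occurrence of cmax
    # Linear trapezoidal rule over consecutive measurement pairs
    auc = sum((t1 - t0) * (c0 + c1) / 2.0
              for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))
    # Log-linear least-squares fit over the last n_fit points
    ts = times[-n_fit:]
    ys = [math.log(c) for c in conc[-n_fit:]]
    tbar = sum(ts) / n_fit
    ybar = sum(ys) / n_fit
    slope = (sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
             / sum((t - tbar) ** 2 for t in ts))
    ke = -slope
    return {
        "cmax": cmax,
        "tmax": tmax,
        "tlast": times[-1],
        "elimination_rate": ke,
        "half_life": math.log(2.0) / ke,
        "auc": auc,
    }


hours = [0.0, 1.0, 2.0, 4.0, 8.0]
concentration = [0.0, 6.0, 8.0, 4.0, 1.0]  # hypothetical measurements
measures = pk_measures(hours, concentration)
for name, value in measures.items():
    print(f"{name:17s} {value:.4f}")
```

In this made-up series the concentration halves every two hours after the peak, so the fitted half-life comes out at two hours.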


pksumm obtains the first four moments from the empirical distribution of each pharmacokinetic measurement and tests the null hypothesis that the measurement is normally distributed.


pkcross analyzes data from a crossover design experiment. When analyzing pharmaceutical trial data, if the treatment, carryover, and sequence variables are known, the omnibus test for separability of the treatment and carryover effects is calculated.


pkequiv performs bioequivalence testing for two treatments. By default, pkequiv calculates a standard confidence interval symmetric about the difference between the two treatment means. Optionally, pkequiv calculates confidence intervals symmetric about zero and intervals based on Fieller's theorem. Additionally, pkequiv can perform interval hypothesis tests for bioequivalence.


pkshape and pkcollapse help in reshaping the data into the form that the above commands need.

Other statistical commands

  jknife performs jackknife estimation, which is (1) an alternative, first-order unbiased estimator for a statistic; (2) a data-dependent way to calculate the standard error of the statistic and to obtain significance levels and confidence intervals; and (3) a way of producing measures reflecting the observation's influence on the overall statistic.
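The three uses listed above all come from the same leave-one-out calculation, which can be sketched in Python. This is a generic textbook jackknife written for illustration, not jknife's implementation; the sample and the function names are hypothetical.

```python
import math


def jackknife(data, stat):
    """Jackknife estimate, standard error, and pseudovalues for stat(data)."""
    n = len(data)
    full = stat(data)
    # Statistic recomputed with each observation left out in turn
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    # Pseudovalues reflect each observation's influence on the statistic
    pseudo = [n * full - (n - 1) * s for s in leave_one_out]
    estimate = sum(pseudo) / n
    variance = sum((p - estimate) ** 2 for p in pseudo) / (n * (n - 1))
    return estimate, math.sqrt(variance), pseudo


def mean(xs):
    return sum(xs) / len(xs)


sample = [2.0, 4.0, 4.0, 5.0, 10.0]
est, se, pseudo = jackknife(sample, mean)
print(f"estimate = {est:.4f}, standard error = {se:.4f}")
print("pseudovalues:", pseudo)
```

For the mean, the pseudovalues are the observations themselves and the jackknife standard error equals the usual standard error of the mean, which makes it a convenient check on the arithmetic.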


ml can now perform estimation with linear constraints. All that is required is that you specify the constraint() option on the ml maximize command.


statsby creates a dataset of the results of a command executed by varlist:. The results can be any of the saved results of the specified command and, if it is an estimation command, the coefficients and the standard errors. Typing 'statsby "regress mpg weight" _b _se e(r2), by(foreign)', for instance, would create a two-observation dataset in which the first observation records the coefficients, standard errors, and R-squared for foreign=0, and the second records them for foreign=1.


lfit, lroc, lsens, and lstat now work after probit just as they do after logit or logistic.


drawnorm draws random samples from a multivariate normal distribution with specified means and covariance matrix.
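One common way to draw such samples is to multiply independent standard normal draws by a Cholesky factor of the covariance matrix. The Python sketch below illustrates that technique with the standard library only; it is not a description of how drawnorm works internally, and the means, covariance matrix, and seed are made up.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor low with low * low' = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(means, cov, n, rng):
    """Draw n observations from a multivariate normal with given means and covariance."""
    low = cholesky(cov)
    k = len(means)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent N(0,1) draws
        draws.append([means[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws


means = [1.0, -2.0]
cov = [[4.0, 2.0], [2.0, 3.0]]
rng = random.Random(12345)  # fixed seed so the draws are reproducible
draws = draw_mvn(means, cov, 5000, rng)
sample_means = [sum(row[i] for row in draws) / len(draws) for i in range(2)]
print("sample means:", sample_means)
```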


corr2data creates fictional datasets with the specified means and covariance matrix (correlation structure). Thus, you can take published results and duplicate and modify them if the estimator is solely a function of the first two moments of the data, such as regress, ivreg, anova, or factor.


median performs a nonparametric test that K samples were drawn from populations with the same median.


tabstat displays tables of summary statistics, possibly broken down (conditioned) on another variable.


The command avplot now works after estimation using the robust or cluster() options.

Distribution functions

  Stata's density and distribution functions have been renamed. First, all the old names continue to work, even when not documented in the manual, at least under version control. The new standard, however, is, if X is the name of a distribution, then

Xden() is its density
X() is its cumulative distribution
invX() is its inverse cumulative
Xtail() is its reverse cumulative
invXtail() is its inverse reverse cumulative

Not all functions necessarily exist and, if they do not, that is not solely due to laziness on our part. In particular, concerning the choice between X() and Xtail(), the functions that exist are the ones we have implemented accurately. In theory, you need only one because Xtail() = 1-X(), but in practice, the subtraction from one wipes out a great deal of accuracy. If one really wants an accurate right-tail or left-tail probability, one needs a separately written Xtail() or X() routine, written from the ground up.

Anyway, forget everything you ever knew about Stata's distribution functions. Here is the new set:

normden() same as old normd()
norm() same as old normprob()
invnorm() same as old invnorm()
chi2() related to old chiprob(); see below
invchi2() related to old invchi(); see below
chi2tail() equal to old chiprob()
invchi2tail() equal to old invchi()
F() related to old fprob()
invF() related to old invfprob()
Ftail() same as old fprob()
invFtail() equal to old invfprob()
ttail() related to old tprob(); see below
invttail() related to old invt(); see below
nchi2() equal to old nchi()
invnchi2() equal to old invnchi()
npnchi2() equal to old npnchi()


We want to emphasize that if a function exists, it is calculated accurately. To wit, F() accurately calculates left tails, and Ftail() accurately calculates right tails; Ftail() is far more accurate than 1 - F().

There is no normtail() function. The accurate way to calculate left-tail probabilities (z<0) is norm(z). The accurate way to calculate right-tail probabilities (z>0) is norm(-z).
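The loss of accuracy from the one-minus subtraction is easy to demonstrate in double precision. The Python sketch below defines its own norm() from the standard library's complementary error function purely for illustration; it is a stand-in, not Stata's norm().

```python
import math


def norm(z):
    """Standard normal cumulative distribution, computed via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


z = 10.0
naive_right_tail = 1.0 - norm(z)  # norm(10) rounds to exactly 1, so this is 0
accurate_right_tail = norm(-z)    # uses symmetry, keeps full relative accuracy

print("1 - norm(z):", naive_right_tail)
print("norm(-z):   ", accurate_right_tail)
```

The true right-tail probability at z = 10 is on the order of 1e-24, far smaller than the spacing of doubles near 1, so the subtraction returns zero while the direct evaluation keeps its significant digits.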


All the old functions still exist, but in two cases, they work only under version control: The old invt(), under the new naming logic, ought to be the inverse of the cumulative, but is not, so invt() goes into forced retirement for a release or two. It works if version is set to 6 or before; otherwise, you get the error "unknown function invt()". Similarly, the old invchi() goes into forced retirement because it is too close to the new name invchi2().




© Copyright 2001 Stata Corporation.