This page contains only historical information and is not about the current release of Stata. Please see our features page for information on the current version of Stata.

Statistical features in Stata 8

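New statistics available in Stata 8 are categorized under the following headings: time-series analysis, cross-sectional time-series analysis, survival analysis, survey analysis, cluster analysis, and statistics useful across fields.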

Time-series analysis

* Many economic time series are cointegrated and require specialized statistical methods for their analysis. Economic variables, such as consumption, investment, and income, tend to grow over time, while the differences between any two of those variables never deviate too far from a constant equilibrium value. Vector error-correction models (VECMs) are used to model such relationships.

Stata's VECM suite includes commands for testing for cointegration and determining the number of cointegrating relationships, choosing the lag order, and fitting the model. Additional commands facilitate post-estimation diagnostic analyses, including testing for stability, autocorrelated residuals, and normality.
 
* The new vec command fits cointegrated vector error-correction models, also known as VECMs.

* The new vecrank command produces statistics used to determine the number of cointegrating equations in a VECM.

* The new fcast command replaces the old command varfcast and produces dynamic forecasts of the dependent variables after fitting a VAR, SVAR, or VECM.
 
* The new irf command replaces the old command varirf and does everything the old command did and more. irf estimates the impulse–response functions, cumulative impulse–response functions, orthogonalized impulse–response functions, structural impulse–response functions, and forecast-error variance decompositions (FEVDs) after fitting a VAR, SVAR, or VECM. Results can be graphed and presented in tables.

The old varirf command continues to work but is not documented. If you have old .irf files, they will work with the old varirf command and the new irf command.

* The varsoc command can be used to obtain lag-order selection statistics for VECMs, as well as VARs.

* The new veclmar command computes Lagrange-multiplier test statistics for residual autocorrelation after fitting a VECM.

* The new vecnorm command computes a series of test statistics against the null hypothesis that the disturbances are normally distributed after fitting a VECM. For each equation, and for all equations jointly, three statistics are computed: a skewness statistic, a kurtosis statistic, and the Jarque–Bera statistic.

* The new vecstable command checks the eigenvalue stability condition after fitting a VECM.

* The new vecstable command and the command varstable now have a graph option that produces publication-quality graphs to facilitate interpreting and presenting the stability results.
 
* The new haver command makes it easy to load and analyze the economic and financial databases available from Haver Analytics.

  • Stata can now fit vector autoregression (VAR) and structural vector autoregression (SVAR) models.

    A suite of new commands allows you to estimate, tabulate, and graph impulse–response functions, cumulative impulse–response functions, orthogonalized impulse–response functions, structural impulse–response functions, and their confidence intervals, along with forecast-error variance decompositions and structural forecast-error variance decompositions. This suite also allows graphical comparisons of IRFs and variance decompositions across models and orderings.

    A full suite of diagnostic and testing tools is also provided, including Granger causality tests, Lagrange-multiplier (LM) tests for residual autocorrelation, tests for normality of the disturbances, lag-order selection statistics, eigenvalue stability checks, and Wald tests that the coefficients on the endogenous variables at a given lag are zero, both for each equation separately and for all equations jointly.

  • The new tssmooth command smooths and predicts univariate time series using a weighted or unweighted moving average, single-exponential smoothing, double-exponential smoothing, Holt–Winters nonseasonal smoothing, Holt–Winters seasonal smoothing, or nonlinear smoothing.

  • The new archlm command computes a Lagrange-multiplier test for autoregressive conditional heteroskedasticity (ARCH) effects in the residuals after regress.

  • The new bgodfrey command computes the Breusch–Godfrey Lagrange-multiplier (LM) test for serial correlation in the disturbances after regress.

  • The new durbina command computes the Durbin (1970) alternative statistic to test for serial correlation in the disturbances after regress when some of the regressors are not strictly exogenous.

  • The new dfgls command performs the modified Dickey–Fuller t test for a unit root proposed by Elliott, Rothenberg, and Stock (1996), using models with 1 to maxlags lags of the first-differenced variable in an augmented Dickey–Fuller regression.

  • The existing arima command may now be used with the by prefix command, and it now allows prediction in loops over panels.
* new in Stata 8 as of July 2004
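As a rough illustration of what single-exponential smoothing (one of the smoothers listed for tssmooth) computes, the following Python sketch applies the standard recursion S_t = alpha*x_t + (1 - alpha)*S_{t-1}. It is not Stata code and is not Stata's implementation: the function name exp_smooth, the parameter name alpha, and the choice to start the recursion at the first observation are assumptions made for this example.

```python
def exp_smooth(series, alpha):
    """Single-exponential smoothing: S_t = alpha*x_t + (1 - alpha)*S_{t-1}.

    Assumption for this sketch: the recursion starts at the first observation.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        return []
    smoothed = [float(series[0])]
    for x in series[1:]:
        # Blend the new observation with the previous smoothed value.
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed
```

Values of alpha near 1 track the data closely, while values near 0 give a smoother series that reacts slowly to new observations.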

Cross-sectional time-series analysis

  • The new xthtaylor command fits panel-data random-effects models using the Hausman–Taylor and the Amemiya–MaCurdy instrumental-variables estimators.

  • The new xtfrontier command fits stochastic production or cost frontier models for panel data allowing two different parameterizations for the inefficiency term: a time-invariant model and the Battese–Coelli (1992) parameterization of time effects.

  • The existing xtivreg command will now optionally report first-stage results of Baltagi's EC2SLS random-effects estimator.

  • After estimation, the existing xttobit and xtintreg commands can now predict the probability that the dependent variable is uncensored, the corresponding expected value E(y | #_a < y < #_b), and the expected value of the dependent variable truncated at the censoring point(s).

Survival analysis

  • Using stcox, you can now fit Cox semiparametric proportional-hazards models that allow for gamma-distributed frailty. In this model, frailty is assumed to be shared across groups of observations. Previously, if you wanted to analyze multivariate survival data using the Cox model, you would fit a standard model and account for the correlation within groups by adjusting the standard errors for clustering. Now, you can directly model the correlation by assuming a latent gamma-distributed random effect, or frailty; observations within a group are correlated because they share the same frailty. Estimation is done via penalized likelihood. You can estimate the frailty variance and obtain group-level frailty estimates.

    sts graph and stcurve (after stcox) can now plot estimated hazard functions, which are calculated as weighted kernel smooths of the estimated hazard contributions.

  • streg has new option shared(varname) for fitting parametric shared frailty models, which are analogous to random-effects models for panel data. streg can also fit frailty models in which the frailties are assumed to be randomly distributed at the observation level.

    After estimation, both predictions conditional on a frailty equal to 1 and unconditional predictions (predictions averaged over the frailty distribution) are available.

  • Stata's stepwise and fractional polynomial specification-search methods now work with stcox and streg.
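The difference between predictions conditional on a frailty of 1 and unconditional predictions can be sketched in Python as follows. This is an illustration only, not Stata code. It assumes a gamma frailty with mean 1 and variance theta, for which averaging the survivor function over the frailty gives S_theta(t) = (1 - theta*log S(t))^(-1/theta). The Weibull form of S(t) and all function and parameter names are assumptions chosen for the example.

```python
import math


def weibull_survival(t, lam, p):
    """Survival conditional on frailty equal to 1: S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)


def unconditional_survival(t, lam, p, theta):
    """Survival averaged over a gamma frailty with mean 1 and variance theta.

    S_theta(t) = (1 - theta * log S(t)) ** (-1 / theta)
    """
    cum_hazard = lam * t ** p  # equals -log S(t) for the Weibull form above
    return (1.0 + theta * cum_hazard) ** (-1.0 / theta)
```

As theta approaches 0, the unconditional curve approaches the conditional one. For larger theta, the population survivor function lies above the conditional curve because the frailest subjects tend to fail first.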

Survey analysis

  • Stata's programmable maximum likelihood estimation routine ml has new options that automatically handle the production of survey estimators, including stratification and estimation on a subpopulation.

  • Survey estimation is now available for the Heckman selection model and the Heckman selection model applied to probit.

  • Survey estimation is now available for negative-binomial regression and generalized negative-binomial regression.

  • Constraints may now be applied to equations using survey estimators, as with Stata's other estimators.

  • Point estimates, standard errors, and confidence intervals are now available for linear combinations of estimated parameters, as with Stata's other estimators.

  • Point estimates, standard errors, and confidence intervals are now available for nonlinear combinations of estimated parameters.

  • Estimators for nonlinear combinations and generalized predictions are available.

Cluster analysis

  • Ward's linkage hierarchical clustering, also known as Ward's method or minimum-variance clustering, is now available.

  • Weighted-average linkage hierarchical clustering, supplementing the previously available average linkage clustering, is now available.

  • Centroid linkage hierarchical clustering is now available.

  • Median linkage hierarchical clustering, also known as Gower's method, is now available.

  • Stopping rules may now be specified. Two popular stopping rules are provided: the Caliński and Harabasz pseudo-F index (Caliński and Harabasz [1974]) and the Duda and Hart Je(2)/Je(1) index with associated pseudo-T-squared (Duda and Hart [1973]). Additional stopping rules can be added.

  • Two new dissimilarity measures have been added: squared Euclidean distance and the Minkowski distance metric with argument a raised to the power a (that is, without taking the final a-th root).
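To make the two new dissimilarity measures concrete, here is a minimal Python sketch of the underlying formulas. It is not Stata code; the function names are hypothetical, and p plays the role of the Minkowski argument.

```python
def minkowski_power(x, y, p):
    """Sum of |x_i - y_i|**p: the Minkowski distance raised to the p-th power."""
    if len(x) != len(y):
        raise ValueError("observations must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y))


def minkowski(x, y, p):
    """Minkowski distance with argument p (p = 2 gives Euclidean distance)."""
    return minkowski_power(x, y, p) ** (1.0 / p)


def squared_euclidean(x, y):
    """Squared Euclidean distance, the p = 2 case of minkowski_power."""
    return minkowski_power(x, y, 2)
```

Skipping the final root preserves the ordering of pairwise dissimilarities, so squared Euclidean distance ranks pairs of observations the same way Euclidean distance does.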

Statistics useful across fields

The following new estimation procedures are available, in addition to the new estimators listed in previous sections:

  • MANOVA and MANCOVA, with balanced and unbalanced designs, including designs with missing cells, and with factorial, nested, or mixed designs.

  • The rank-ordered logit model, also known as the exploded logit model, generalizes McFadden's choice model as fitted by clogit. In the choice model, only the alternative that maximizes utility is observed. rologit fits the corresponding model in which the preference ranking of the alternatives is observed, not just the alternative that is ranked first. rologit supports incomplete rankings and ties ("indifference").

  • Stochastic frontier models with technical or cost-inefficiency effects.
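The relationship between the rank-ordered logit model and the choice model can be sketched in Python. Under the exploded logit, the probability of a complete ranking is a product of successive conditional-logit choices among the alternatives not yet ranked. This is an illustration only, not Stata code and not how rologit is implemented. The function names are hypothetical, the utilities are taken as given rather than estimated, and ties and incomplete rankings are not handled.

```python
import math
from itertools import permutations


def ranking_probability(utilities, ranking):
    """Probability of a complete ranking under the exploded logit model.

    utilities: dict mapping alternative -> systematic utility
    ranking:   alternatives ordered from most to least preferred
    """
    remaining = list(ranking)
    prob = 1.0
    while len(remaining) > 1:
        # Logit probability that the next-ranked alternative is chosen
        # from those not yet ranked.
        denom = sum(math.exp(utilities[a]) for a in remaining)
        prob *= math.exp(utilities[remaining[0]]) / denom
        remaining = remaining[1:]
    return prob


def first_choice_probability(utilities, alternative):
    """Sum ranking probabilities over all rankings that put `alternative` first."""
    others = [a for a in utilities if a != alternative]
    return sum(
        ranking_probability(utilities, (alternative,) + rest)
        for rest in permutations(others)
    )
```

Summing over all rankings that share the same first choice recovers the ordinary conditional-logit probability of that choice, which is the sense in which the ranking model extends the choice model.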

Also, Stata 8 includes the following new and enhanced commands:

  • New command mfp selects the fractional polynomial model that best predicts the dependent variable from the independent variables.

  • The new nlcom command computes point estimates, standard errors, t and Z statistics, p-values, and confidence intervals for nonlinear combinations of coefficients after any estimation command. Results are displayed in the table format commonly used for displaying estimation results. The standard errors are based on the "delta method".

  • The new predictnl command produces nonlinear predictions after any Stata estimation command and can optionally calculate the variance, standard errors, Wald-test statistics, significance levels, and pointwise confidence intervals for these predictions. Unlike with testnl and nlcom, the quantities generated by predictnl can vary over the observations in the data. The standard errors and other inference-related quantities are based on the "delta method".

  • The new bootstrap command replaces the old bstrap and bs commands. bootstrap has an improved syntax and allows for stratified sampling.

  • Existing command bsample now accepts the strata() option and has a new weight() option that allows you to save the sample frequency instead of changing the data in memory.

  • The existing bstat command can now construct bias-corrected and accelerated (BCa) confidence intervals. In addition, bstat is now an e-class command, meaning that all the post-estimation commands can be used on bootstrap results.

  • Existing command jknife now accepts the cluster() option.

  • New command permute estimates p-values for permutation tests based on Monte Carlo simulations. These estimates can be one sided or two sided.

  • Existing command sample has new option count that allows samples of the specified number of observations (rather than a percentage) to be drawn.

  • New command simulate replaces simul and provides improved syntax for specifying simulations.

  • Existing command statsby has a new syntax, new options, and now allows time-series operators.

  • The new estimates command provides a new, consistent way to store and refer to estimation results. Post-estimation commands that make comparisons across models, such as lrtest and hausman, previously had their own idiosyncratic ways to store and refer to estimation results. These commands now support a unified way of retrieving estimation results utilizing the new estimates suite.

  • New command suest is a post-estimation command that combines multiple estimation results (parameter vectors and their variance–covariance matrices) into simultaneous results with a single stacked parameter vector and a robust (sandwich) variance–covariance matrix. The estimation results to be combined may be based on different, overlapping, or even the same data. After creating the simultaneous estimation results, you can use test or testnl to obtain Hausman-type tests for cross-model hypotheses. suest supports survey data.

  • New command imtest performs the information matrix test for a regression model. In addition, it provides the Cameron–Trivedi decomposition of the IM-test into tests for heteroskedasticity, skewness, and kurtosis, as well as White's original heteroskedasticity test.

  • New command szroeter performs Szroeter's test for heteroskedasticity in a regression model.

  • Existing command hettest now provides option rhs to test for heteroskedasticity in the independent variables. It now also supports multiple comparison testing.

  • Existing command tabulate has output changes, new features, and expanded limits.

    Three new statistics are available for twoway tabulations: the expected number in each cell, the contribution to Pearson's chi2, and the contribution to the likelihood-ratio chi2.

    tabulate now respects set linesize, so you can produce wide tables.

    tabulate for oneway tabulations has new option sort, which puts the table in descending order of frequency.

  • Existing command tabstat can now produce tables containing the variance and/or the standard error of the mean.

  • Existing command roctab has new option specificity to graph sensitivity versus specificity, instead of the default sensitivity versus (1-specificity).

  • Existing command ologit now has option or to display results as odds ratios (that is, as exponentiated coefficients).

  • Existing command adjust can now display predicted probabilities when used after svylogit, svyprobit, xtlogit, and xtprobit.

  • rvpplot has been extended to work after anova. In addition, cprplot and acprplot have new options lowess and mspline that allow putting a lowess curve or median spline through the data.

* Existing command clogit has new options robust and cluster. In addition, clogit has been converted from a built-in command to one that now uses ml. As a result, clogit now supports options that are available to ml-programmed estimators, such as constraint() for linear constraints.
* new in Stata 8 as of July 2004
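The delta method on which nlcom and predictnl base their standard errors can be sketched in Python for one simple case, the ratio of two coefficients. This is an illustration of the general technique, not Stata code and not Stata's implementation. The function name and arguments are hypothetical, and the coefficient estimates and their variances are assumed to be supplied by the user.

```python
import math


def delta_method_ratio(b1, b2, v11, v22, v12):
    """Delta-method estimate and standard error for the ratio b1 / b2.

    b1, b2:   coefficient estimates
    v11, v22: their estimated variances
    v12:      their estimated covariance
    """
    estimate = b1 / b2
    # Gradient of g(b1, b2) = b1 / b2 evaluated at the estimates.
    g1 = 1.0 / b2
    g2 = -b1 / b2 ** 2
    # Var(g) is approximated by G * V * G', written out for the 2x2 case.
    variance = g1 * g1 * v11 + 2.0 * g1 * g2 * v12 + g2 * g2 * v22
    return estimate, math.sqrt(variance)
```

The same first-order approximation applies to any differentiable function of the coefficients: the gradient of the function is combined with the estimated variance–covariance matrix of the coefficients.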