
This page announced updates in Stata 10. See a complete overview of all of Stata's features.


General statistics

Here are the details:

  1. You can now save estimation results to disk. You type
            estimates save filename
    to save results and
            estimates use filename
    to reload them. In fact, the entire estimates command has been reworked. The new command estimates notes allows you to add notes to estimation results just as you add them to datasets. The new command estimates esample allows you to restore e(sample) after reloading estimates, should that be necessary (usually it is not). The maximum number of estimation results that can be held in memory (as opposed to saved on disk) is increased to 300 from 20. See [R] estimates.
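
    As a rough illustration of the save/reload cycle described above, the sketch below fits a model on the auto dataset shipped with Stata, attaches a note, saves the results, and reloads them. The filename mymodel and the note text are hypothetical, and the lines are meant to be run as a do-file.

    ```stata
    * Fit a model, attach a note, and save the results to disk
    sysuse auto, clear
    regress mpg weight foreign
    estimates notes: baseline fuel-economy model
    estimates save mymodel, replace

    * Later: reload the results and list the notes
    estimates use mymodel
    estimates notes

    * Restore e(sample) only if a later command needs it;
    * the estimation data must be in memory
    estimates esample: mpg weight foreign

    * Replay the reloaded results
    regress
    ```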
  2. Stata now has exact logistic and exact Poisson regression. Rather than having their inference based on asymptotic normality, exact estimators enumerate the conditional distribution of the sufficient statistics and then base inference upon that distribution. In small samples, exact methods have better coverage than asymptotic methods, and exact methods are the only way to obtain point estimates, tests, and confidence intervals for covariates that perfectly predict the observed outcome.

    Postestimation command estat se reports odds ratios and their asymptotic standard errors. estat predict, available only after exlogistic, computes predicted probabilities, asymptotic standard errors, and exact confidence intervals for single cases.

    See [R] exlogistic and [R] expoisson.
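
    A minimal sketch of exact logistic regression on a deliberately tiny sample follows. The ten observations are invented for illustration only, and the variable names y and x are hypothetical.

    ```stata
    * Invented toy data: binary outcome y, binary covariate x
    clear
    input y x
    0 0
    0 0
    0 0
    1 0
    0 0
    1 1
    1 1
    0 1
    1 1
    1 1
    end

    * Exact logistic regression; inference uses the conditional
    * distribution of the sufficient statistics
    exlogistic y x

    * Odds ratios with asymptotic standard errors, for comparison
    estat se
    ```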
  3. New estimation command asclogit performs alternative-specific conditional logistic regression, which includes McFadden’s choice model. Postestimation command estat alternatives reports alternative-specific summary statistics. estat mfx reports marginal effects of regressors on probabilities of each alternative. See [R] asclogit and [R] asclogit postestimation.
  4. New estimation command asroprobit performs alternative-specific rank-ordered probit regression. asroprobit is related to rank-ordered logistic regression (rologit) but allows modeling alternative-specific effects and modeling the covariance structure of the alternatives. Postestimation command estat alternatives provides summary statistics about the alternatives in the estimation sample. estat covariance displays the variance–covariance matrix of the alternatives. estat correlation displays the correlation matrix of the alternatives. estat mfx computes the marginal effects of regressors on the probability of the alternatives. See [R] asroprobit and [R] asroprobit postestimation.
  5. New estimation command ivregress performs single-equation instrumental-variables regression by two-stage least squares, limited-information maximum likelihood, or generalized method of moments. Robust and HAC covariance matrices may be requested. Postestimation command estat firststage provides various descriptive statistics and tests of instrument relevance. estat overid tests overidentifying restrictions. ivregress replaces the previous ivreg command. See [R] ivregress and [R] ivregress postestimation.
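
    The sketch below shows the general shape of an ivregress call and its postestimation commands on the auto dataset shipped with Stata. The choice of mpg as the endogenous regressor and displacement and gear_ratio as instruments is an assumption made purely to show the syntax, not a sensible econometric model.

    ```stata
    sysuse auto, clear

    * Two-stage least squares with a robust VCE;
    * endogenous regressor and instruments go in parentheses
    ivregress 2sls price weight (mpg = displacement gear_ratio), vce(robust)

    * Instrument relevance statistics and the overidentification test
    estat firststage
    estat overid

    * The same model by generalized method of moments
    ivregress gmm price weight (mpg = displacement gear_ratio), wmatrix(robust)
    ```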
  6. New estimation command nlsur fits a system of nonlinear equations by feasible generalized least squares, allowing for covariances among the equations; see [R] nlsur.
  7. Existing estimation command nlogit has been rewritten; it has a new, better syntax and runs faster when there are more than two levels. Old syntax is available under version control. nlogit now optionally fits the random utilities maximization (RUM) model as well as the nonnormalized model that was available previously. The new nlogit now allows unbalanced groups and allows groups to have different sets of alternatives. nlogit now excludes entire choice sets (cases) if any alternative (observation) has a missing value; use new option altwise to exclude just the alternatives (observations) with missing values. Finally, vce(robust) is allowed regardless of the number of nesting levels. See [R] nlogit.
  8. Existing estimation command asmprobit has the following enhancements:

    1. The new default parameterization estimates the covariance of the alternatives differenced from the base alternative, making the estimates invariant to the choice of base. New option structural specifies that the previous structural (nondifferenced) covariance parameterization be used.
    2. asmprobit now permits estimation of the constant-only model.
    3. asmprobit now excludes entire choice sets (cases) if any alternative (observation) has a missing value; use new option altwise to exclude just the alternatives (observations) with missing values.
    4. New postestimation command estat mfx computes marginal effects after asmprobit.
    See [R] asmprobit and [R] asmprobit postestimation.
  9. Existing estimation command clogit now accepts pweights and may be used with the svy: prefix.

    Also, clogit used to be willing to produce cluster-robust VCEs when the groups were not nested within the clusters. Sometimes, this VCE was consistent, and other times it was not. You must now specify the new nonest option to obtain a cluster-robust VCE when the groups are not nested within the clusters.

    predict after clogit now accepts options that calculate the Δβ influence statistic, the Δχ² lack-of-fit statistic, the Hosmer and Lemeshow leverage, the Pearson residuals, and the standardized Pearson residuals.

    See [R] clogit and [R] clogit postestimation.
  10. Existing estimation command cloglog now accepts pweights, may now be used with the svy: prefix, and has new option eform that requests that exponentiated coefficients be reported; see [R] cloglog.
  11. Existing estimation command cnreg now accepts pweights, may be used with the svy: prefix, and is now noticeably faster (up to five times faster) when used within loops, such as by statsby. See [R] cnreg.
  12. Existing estimation commands cnsreg and tobit now accept pweights, may be used with the svy: prefix, and are now noticeably faster (up to five times faster) when used within loops, such as by statsby. Also, cnsreg now has new advanced option mse1 that sets the mean squared error to 1. See [R] cnsreg and [R] tobit.
  13. Existing estimation command regress is now noticeably faster (up to five times faster) when used within loops, such as by statsby. Also,

    1. Postestimation command estat hettest has new option iid that specifies that an alternative version of the score test be performed that does not require the normality assumption. New option fstat specifies that an alternative F test be performed that also does not require the normality assumption.
    2. Existing postestimation command estat vif has new option uncentered that specifies that uncentered variance inflation factors be computed.
    See [R] regress postestimation.
  14. Existing estimation commands logit, mlogit, ologit, oprobit, and probit are now noticeably faster (up to five times faster) when used within loops, such as by statsby.
  15. For existing estimation command probit, predict now allows the deviance option; see [R] probit postestimation.
  16. Existing estimation command nl has the following enhancements:

    1. Option vce(vcetype) is now allowed, with supported vcetypes that include types derived from asymptotic theory, that are robust to some kinds of misspecification, that allow for intragroup correlation, and that use bootstrap or jackknife methods. Also, three heteroskedastic- and autocorrelation-consistent variance estimators are available.
    2. nl no longer reports an overall model F test because the test that all parameters other than the constant are jointly zero may not be appropriate in arbitrary nonlinear models.
    3. The coefficient table now reports each parameter as its own equation, analogous to how ml reports single-parameter equations.
    4. predict after nl has new options that allow you to obtain the probability that the dependent variable lies within a given interval, the expected value of the dependent variable conditional on its being censored, and the expected value of the dependent variable conditional on its being truncated. These predictions assume that the error term is normally distributed.
    5. mfx can be used after nl to obtain marginal effects.
    6. lrtest can be used after nl to perform likelihood-ratio tests.
    See [R] nl and [R] nl postestimation.
  17. Existing estimation command mprobit now allows pweights, may now be used with the svy: prefix, and has new option probitparam that specifies that the probit variance parameterization, which fixes the variance of the differenced latent errors between the scale and the base alternatives to one, be used. See [R] mprobit.
  18. Existing estimation command rologit now allows vce(bootstrap) and vce(jackknife). See [R] rologit.
  19. Existing estimation command truncreg now allows pweights and now works with the svy: prefix. See [SVY] svy estimation.
  20. After existing estimation command ivprobit, postestimation commands estat classification, lroc, and lsens are now available. Also, in ivprobit, the order of the ancillary parameters in the output has been changed to reflect the order in e(b). See [R] ivprobit and [R] ivprobit postestimation.
  21. All estimation commands that allowed options robust and cluster() now allow option vce(vcetype). vce() specifies how the variance–covariance matrix of the estimators (and hence the standard errors) is to be calculated. This syntax was introduced in Stata 9, with options such as vce(bootstrap), vce(jackknife), and vce(oim).

    In Stata 10, option vce() is extended to encompass the robust (and optionally clustered) variance calculation. Where you previously typed
            . estimation-command ..., robust
    you are now to type
            . estimation-command ..., vce(robust)
    and where you previously typed
            . estimation-command ..., robust cluster(clustervar) 
    with or without the robust, you are now to type
            . estimation-command ..., vce(cluster clustervar) 
    You can still type the old syntax, but it is undocumented. The new syntax emphasizes that the robust and cluster calculations affect standard errors, not coefficients. See [R] vce_option.

    In accordance with this change, estimation commands now have a term for their default variance calculation. Thus, you will see things like vce(ols) and vce(gnr). Here is what they all mean:

    1. vce(ols). The variance estimator for ordinary least squares; an s^2(X'X)^{-1}-type calculation.
    2. vce(oim). The observed information matrix based on the likelihood function; a (-H)^{-1}-type calculation, where H is the Hessian matrix.
    3. vce(conventional). A generic term to identify the conventional variance estimator associated with the model. For instance, in the Heckman two-step estimator, vce(conventional) means the Heckman-derived variance matrix from an augmented regression. In two different contexts, vce(conventional) does not necessarily mean the same calculation.
    4. vce(analytic). The variance estimator derived from first principles of statistics for means, proportions, and totals.
    5. vce(gnr). The variance matrix based on an auxiliary regression, which is analogous to s^2(X'X)^{-1} generalized to nonlinear regression. gnr stands for Gauss–Newton regression.
    6. vce(linearized). The variance matrix calculated by a first-order Taylor approximation of the statistic, otherwise known as the Taylor linearized variance estimator, the sandwich estimator, and the White estimator. This is identical to vce(robust) in other contexts.
    The above are used for defaults. vce() may also be

    1. vce(robust). The variance matrix calculated by the sandwich estimator of variance; a VDV-type calculation, where V is the conventional variance matrix and D is the outer product of the gradients, Σ_i g_i g_i'.
    2. vce(cluster varname). The cluster-based version of vce(robust) where sums are performed within the groups formed by varname, which is equivalent to assuming that the independence is between groups only, not between observations.
    3. vce(hc2) and vce(hc3). Calculated similarly to vce(robust) except that different scores are used in place of the gradient vectors g_i.
    4. vce(opg). The variance matrix calculated by the outer product of the gradients; a (Σ_i g_i g_i')^{-1} calculation.
    5. vce(jackknife). The variance matrix calculated by the jackknife, including delete one, delete n, and the cluster-based jackknife.
    6. vce(bootstrap). The variance matrix calculated by bootstrap resampling.
    You do not need to memorize the above; the documentation for the individual commands, and their corresponding dialog boxes, make clear what is the default and what is available.
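
    To make the vce() choices above concrete, here is a short sketch using regress on the auto dataset shipped with Stata. Clustering on rep78 and the 200 bootstrap replications are arbitrary choices made only to show the syntax.

    ```stata
    sysuse auto, clear

    * Default for regress is vce(ols); request alternatives explicitly
    regress mpg weight foreign, vce(robust)
    regress mpg weight foreign, vce(cluster rep78)
    regress mpg weight foreign, vce(hc3)

    * Resampling-based variance estimates
    regress mpg weight foreign, vce(bootstrap, reps(200))
    regress mpg weight foreign, vce(jackknife)
    ```

    The coefficients are identical across the first three calls; only the reported standard errors change. The clustered and resampled fits may use fewer observations because rep78 has missing values.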
  22. Estimation commands specified with option vce(bootstrap) or vce(jackknife) now report a note when a variable is dropped because of collinearity.
  23. The new option collinear, which has been added to many estimation commands, specifies that the estimation command not remove collinear variables. Typically, you do not want to specify this option. It is for use when you specify constraints on the coefficients such that, even though the variables are collinear, the model is fully identified. See [R] estimation options.
  24. Estimation commands having a model Wald test composed of more than just the first equation now save the number of equations in the model Wald test in e(k_eq_model).
  25. All estimation commands now save macro e(cmdline) containing the command line as originally typed.
  26. Concerning existing estimation command ml,

    1. ml now saves the number of equations used to compute the model Wald test in e(k_eq_model), even when option lf0() is specified.
    2. ml score has new option missing that specifies that observations containing variables with missing values not be eliminated from the estimation sample.
    3. ml display has new option showeqns that requests that equation names be displayed in the coefficient table.
    See [R] ml.
  27. New command lpoly performs a kernel-weighted local polynomial regression and displays a graph of the smoothed values with optional confidence bands; see [R] lpoly.
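
    A minimal lpoly call might look like the sketch below, which assumes the auto dataset shipped with Stata. The local-linear degree and the Gaussian kernel in the second call are illustrative choices rather than recommendations.

    ```stata
    sysuse auto, clear

    * Local-linear smooth of mpg on weight with confidence bands
    lpoly mpg weight, degree(1) ci

    * Same smooth with a Gaussian kernel instead of the default
    lpoly mpg weight, degree(1) kernel(gaussian) ci
    ```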
  28. New prefix command nestreg: reports comparison tests of nested models; see [R] nestreg.
  29. Existing commands fracpoly, fracgen, and mfp have new features:

    1. fracpoly and mfp now support cnreg, mlogit, nbreg, ologit, and oprobit.
    2. fracpoly and mfp have new option all that specifies that out-of-sample observations be included in the generated variables.
    3. fracpoly, compare now reports a closed-test comparison between fractional polynomial models by using deviance differences rather than reporting the gain; see [R] fracpoly.
    4. fracgen has new option restrict() that computes adjustments and scaling on a specified subsample.
    See [R] fracpoly and [R] mfp.
  30. For existing postestimation command hausman, options sigmaless and sigmamore may now be used after xtreg. These options improve results when comparing fixed- and random-effects regressions based on small to moderate samples because they ensure that the differenced covariance matrix will be positive definite. See [R] hausman.
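
    The comparison described above can be sketched as follows, assuming the nlswork example dataset is reachable through webuse (which needs an Internet connection). The stored-result names fixed and random, and the choice of regressors, are hypothetical.

    ```stata
    webuse nlswork, clear
    xtset idcode

    * Fixed-effects fit
    xtreg ln_wage age tenure, fe
    estimates store fixed

    * Random-effects fit
    xtreg ln_wage age tenure, re
    estimates store random

    * Hausman test with both covariance matrices based on the
    * disturbance variance estimate from the efficient estimator
    hausman fixed random, sigmamore
    ```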
  31. Existing postestimation command testnl now allows expressions that are bound in parentheses or brackets to have commas. For example, testnl _b[x] = M[1,3] is now allowed. See [R] testnl.
  32. Existing postestimation command nlcom has a new option noheader that suppresses the output header; see [R] nlcom.
  33. Existing command statsby now works with more commands, including postestimation commands. statsby also has new option forcedrop for use with commands that do not allow if or in. forcedrop specifies that observations outside the by() group be temporarily dropped before the command is called. See [D] statsby.
  34. Existing command mkspline will now create restricted cubic splines as well as linear splines. New option displayknots will display the location of the knots. See [R] mkspline.
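
    As a small sketch of the new restricted cubic spline support, the lines below build a four-knot spline basis for weight in the auto dataset shipped with Stata and use it in a regression. The stub name wsp and the number of knots are arbitrary.

    ```stata
    sysuse auto, clear

    * Restricted cubic spline basis with 4 knots; show knot locations
    mkspline wsp = weight, cubic nknots(4) displayknots

    * Use the generated basis variables as regressors
    regress mpg wsp*
    ```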
  35. In existing command kdensity, kernel(kernelname) is now the preferred way to specify the kernel, but the previous method of simply specifying kernelname still works. See [R] kdensity.
  36. Existing command ktau’s computations are now faster; see [R] spearman.
  37. In existing command ladder, the transformations in the output have been renamed to match those used by gladder and qladder. Also, the returned results r(raw) and r(P_raw) have been renamed to r(ident) and r(P_ident), respectively. See [R] ladder.
  38. Existing command ranksum now allows the groupvar in option by(groupvar) to be a string; see [R] ranksum.
  39. Existing command tabulate, exact now allows exact computations on larger tables. Also, new option nolog suppresses the enumeration log. See [R] tabulate twoway.
  40. Existing command tetrachoric’s default algorithm for computing tetrachoric correlations has been changed from the Edwards and Edwards estimator to a maximum likelihood estimator. Also, standard errors and two-sided significance tests are produced. The Edwards and Edwards estimator is still available by specifying the new edwards option. A new zeroadjust option requests that frequencies be adjusted when one cell has a zero count. See [R] tetrachoric.
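
    A brief sketch of the two tetrachoric estimators, assuming the auto dataset shipped with Stata. The binary variable heavy and its 3,000-pound cutoff are invented here only to obtain a second 0/1 variable.

    ```stata
    sysuse auto, clear

    * Hypothetical binary indicator for illustration
    generate byte heavy = weight > 3000

    * New default: maximum likelihood estimate with standard error
    tetrachoric foreign heavy

    * Previous default: the Edwards and Edwards estimator
    tetrachoric foreign heavy, edwards
    ```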
