Stata 15 help for whatsnew7to8

What's new in release 8.0 (compared with release 7)

This file lists the changes corresponding to the creation of Stata release 8.0:

+-------------------------------------------------------------+
| help file       contents                    years           |
|-------------------------------------------------------------|
| whatsnew        Stata 15.0 and 15.1         2017 to present |
| whatsnew14to15  Stata 15.0 new release      2017            |
| whatsnew14      Stata 14.0, 14.1, and 14.2  2015 to 2017    |
| whatsnew13to14  Stata 14.0 new release      2015            |
| whatsnew13      Stata 13.0 and 13.1         2013 to 2015    |
| whatsnew12to13  Stata 13.0 new release      2013            |
| whatsnew12      Stata 12.0 and 12.1         2011 to 2013    |
| whatsnew11to12  Stata 12.0 new release      2011            |
| whatsnew11      Stata 11.0, 11.1, and 11.2  2009 to 2011    |
| whatsnew10to11  Stata 11.0 new release      2009            |
| whatsnew10      Stata 10.0 and 10.1         2007 to 2009    |
| whatsnew9to10   Stata 10.0 new release      2007            |
| whatsnew9       Stata 9.0, 9.1, and 9.2     2005 to 2007    |
| whatsnew8to9    Stata 9.0 new release       2005            |
| whatsnew8       Stata 8.0, 8.1, and 8.2     2003 to 2005    |
| this file       Stata 8.0 new release       2003            |
| whatsnew7       Stata 7.0                   2001 to 2002    |
| whatsnew6to7    Stata 7.0 new release       2000            |
| whatsnew6       Stata 6.0                   1999 to 2000    |
+-------------------------------------------------------------+

Most recent changes are listed first.

--- more recent updates -------------------------------------------------------

See whatsnew8.

--- Stata 8.0 release 02jan2003 -----------------------------------------------

As always, Stata 8.0 is 100% compatible with the previous release of Stata, but we remind programmers that it is vitally important that you put version 7.0 at the top of your old do-files and ado-files if they are to work; see help version. You were supposed to do that when you wrote them but, if you did not, go back and do it now. We have made a lot of changes (improvements) to Stata.

In addition, Stata's dataset format has changed because of the new longer data storage types and the fact that Stata now has multiple representations for missing values. You will not care because Stata automatically reads old-format datasets, but if you need to send a dataset to someone still using Stata 7, remember to use the saveold command; see help saveold.

The features added to Stata 8.0 are listed under the following headings.

    What's big
        Graphics
        GUI
    What's useful
    What's convenient
    What was needed
    What's faster
    What's new in time-series analysis
    What's new in cross-sectional time-series analysis
    What's new in survival analysis
    What's new in survey analysis
    What's new in cluster analysis
    What's new in statistics useful in all fields
    What's new in data management
    What's new in expressions and functions
    What's new in display formats
    What's new in programming
    What's new in the user interface
    What's more

What's big

The big news is the new GUI and the new Graphics. There is no putting them in an order.

Graphics

You can create graphs like the example graphs in the online help (not reproduced in this text).

See "A quick tour" in help graph_intro. Everything you need to know is online.

So what's new in Stata graphics? Everything. There is not one little bit that is not new, even if it seems familiar.

Before you panic, let us tell you that all the old graphics are still in Stata. If you type

. graph7 ...

or

. gr7 ...

you will be back to using the old graph command; see help graph7. Moreover, the old graph command is still invoked under version control; see help version. If you set your version to 7.0 or earlier, graph does not mean what is defined in help graph; it means what it used to mean, which means that old do-files and ado-files continue to work.

One new feature requires some adjustment. What used to be called symbols are now called markers, and marker symbols are the shapes of the markers. Thus, you no longer specify the symbol() or s() option, you specify the msymbol() or ms() option. In addition, the old s(.) for specifying the dot symbol is now ms(p) (p stands for point). ms(.) means to use the default.

All existing statistical commands that produce graphs have been updated to take advantage of the new graphics.

GUI

GUI stands for Graphical User Interface, and to try it, you do not need to read a thing. Pull down Data, Graphics, or Statistics, find what you are looking for, and click.

Fill in the dialog box and click to submit. Do not ignore tabs at the top -- there are very useful things hidden under them.

If you know the command you want, you can skip the menus and type db followed by the command name. For instance, you can jump directly to the stcox dialog box by typing db stcox. See help db.

What's useful

Stata 8 has so many features that finding what you are looking for can be a challenge. We have addressed that:

1. Pull down Help and select Contents. You will be presented with the categories Basics, Data management, Statistics, Graphics, and Programming. Click on one of them -- say, Statistics -- and you will be presented with another set of categories: Summary statistics and tests, Tables, Estimation, Multivariate analysis, Resampling and simulation, Statistical hand calculations, and Special topics. Click on one of those and, well, you get the idea. With the new help contents, it never takes long to find what you need.

2. Help files now have hyperlinks in the header for launching the dialog associated with the command. So, there are three ways to launch a dialog box: (1) use the menus (pull down Data, Graphics, or Statistics); (2) use the new db command (see help db); or (3) pick the command from the online help.

3. When you do need to search, findit is the key. findit searches everywhere: Stata itself, the Stata website, the FAQs, the Stata Journal, and even user-written programs available on the web. An earlier version of findit was made available as an update to Stata 7, but the new version is better. You can also access findit by pulling down Help and selecting Search. If you do that, be sure to click Search all in the dialog box. See help search.

4. The new ssc command lists and installs user-written packages from the Statistical Software Components (SSC) archive, also known as the Boston College Archive, located at http://www.repec.org. See help ssc.

5. The new net sj command makes loading files from the new Stata Journal easier; see help net.

What's convenient

The existing set command has a new permanently option that allows you to make the setting permanent. This does away with the necessity of having a profile.do file for most users.

What was needed

Stata now has multiple missing values! In addition to the previously existing ., there are now .a, .b, ..., .z, and you can attach value labels to the new missing codes!

One thing to watch out for: Do not type

. stata_command ... if x != .

Instead, type

. stata_command ... if x < .

You need to remember this only if you use the new missing values, but it is better to have good habits. The way things now work,

all numbers < . < .a < .b < ... < .z

So, if you wanted to list all observations for which x is missing, you would type

. list if x >= .

See help missing.
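A short do-file sketch of the ordering rule above follows; the variable x, the label name xlab, and its label text are hypothetical, made up for the illustration. It shows why if x != . still selects observations holding .a or .b, whereas if x < . keeps only nonmissing values.

```stata
* Toy data: one nonmissing value and three kinds of missing value
clear
set obs 4
generate x = 1 in 1
replace x = .a in 3
replace x = .b in 4
* observation 2 keeps the system missing value .

* Value labels may be attached to the extended missing codes
label define xlab .a "refused" .b "no answer"
label values x xlab
list

count if x != .    // 3: observations 1, 3, and 4 (.a and .b are not equal to .)
count if x < .     // 1: only the nonmissing value
count if x >= .    // 3: every kind of missing value
```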

What's faster

Stata 8 executes programming commands in half the time of Stata 7, on average. As a result, commands implemented as ado-files run about 17 to 43% faster.

1. This speed-up comes from a new, faster memory manager that reduces the time needed to find, access, and store results. Thus the improvement does little to change the run time of built-in, computationally heavy commands; regress, for instance, runs only 1.43% faster. Nevertheless, the effect can be marked on other commands: poisson runs up to 31% faster, and heckman runs up to 43% faster. The larger the dataset, the smaller the improvement: heckman runs 17% faster on 4,000 observations.

2. That statistical commands run faster is a happy side effect. The big advantage of the speed-up is that it allows some problems that previously would have required internal code to be approached using ado-files; Stata's new graphics, for instance, are implemented as ado-files! Some programming commands run up to 400% faster. Implementing features as ado-files is part of the effort to keep Stata open and extensible by users.

What's new in time-series analysis

1. Stata now can fit vector autoregression (VAR) and structural vector autoregression (SVAR) models. New commands var, varbasic, and svar perform the estimation; see help varintro.

a. A suite of varirf commands estimate, tabulate, and graph impulse-response functions, cumulative impulse-response functions, orthogonalized impulse-response functions, structural impulse-response functions, and their confidence intervals, along with forecast-error variance decompositions and structural forecast-error variance decompositions; see help varirf. This suite allows graphical comparisons of IRFs and variance decompositions across models and orderings.

b. varfcast produces dynamic forecasts from a previously fitted var or svar model; see help varfcast.

c. There is also a full suite of diagnostic and testing tools including

i. vargranger, which performs Granger causality tests; see help vargranger.

ii. varlmar, which performs a Lagrange multiplier (LM) test for residual autocorrelation; see help varlmar.

iii. varnorm, which performs a series of tests for normality of the disturbances; see help varnorm.

iv. varsoc, which reports a series of lag-order selection statistics; see help varsoc.

v. varstable, which checks the eigenvalue stability condition; see help varstable.

vi. varwle, which performs a Wald test that all the endogenous variables of a given lag are zero, both for each equation separately and for all equations jointly; see help varwle.

2. The new tssmooth command smooths and predicts univariate time series using weighted or unweighted moving average, single exponential smoothing, double exponential smoothing, Holt-Winters nonseasonal smoothing, Holt-Winters seasonal smoothing, or nonlinear smoothing. See help tssmooth.

3. The new tsappend command appends observations to a time-series dataset, automatically filling in the time variable and the panel variable, if set, by using the information contained in tsset. See help tsappend.

4. The new archlm command computes a Lagrange multiplier test for autoregressive conditional heteroskedasticity (ARCH) effects in the residuals after regress; see help archlm.

5. The new bgodfrey command computes the Breusch-Godfrey Lagrange multiplier (LM) test for serial correlation in the disturbances after regress; see help bgodfrey.

6. The new durbina command computes the Durbin (1970) alternative statistic to test for serial correlation in the disturbances after regress when some of the regressors are not strictly exogenous; see help durbina.

7. The new dfgls command performs the modified Dickey-Fuller t test for a unit root (proposed by Elliott, Rothenberg, and Stock (1996)) using models with 1 to maxlags lags of the first differenced variable in an augmented Dickey-Fuller regression; see help dfgls.

8. The existing arima command may now be used with the by prefix command, and it now allows prediction in loops over panels; see help arima.

9. The existing newey command now allows (and requires) that you tsset your data; see help newey.
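The two data-handling commands in items 2 and 3 can be tried on a toy series. The sketch below assumes a made-up series y = t^2 on a time variable t (both hypothetical); window(1 1 1) requests an unweighted average of one lag, the current value, and one lead.

```stata
* Hypothetical toy series: y = t^2 for t = 1, ..., 10
clear
set obs 10
generate t = _n
tsset t
generate y = t^2

* Unweighted moving average of one lag, the current value, and one lead
tssmooth ma y_ma = y, window(1 1 1)
list t y y_ma in 4/6

* Append three periods; tsappend fills in t from the tsset information
tsappend, add(3)
list t y in 9/13
```

The moving average at t = 5 is (16 + 25 + 36)/3, about 25.67; the three appended observations get t = 11, 12, and 13 and a missing y.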

What's new in cross-sectional time-series analysis

1. The new xthtaylor command fits panel-data random-effects models using the Hausman-Taylor and the Amemiya-MaCurdy instrumental-variables estimators; see help xthtaylor.

2. The new xtfrontier command fits stochastic production or cost frontier models for panel data allowing two different parameterizations for the inefficiency term: a time-invariant model and the Battese-Coelli (1992) parameterization of time effects; see help xtfrontier.

3. The existing xtabond command now allows endogenous regressors; see help xtabond.

4. The existing xtivreg command will now optionally report first stage results of Baltagi's EC2SLS random-effects estimator; see help xtivreg.

5. The existing xttobit and xtintreg commands have new predict options:

a. pr0(#_a,#_b) produces the probability of the dependent variable being uncensored, P(#_a < y < #_b).

b. e0(#_a,#_b) produces the corresponding expected value E(y | #_a < y < #_b).

c. ystar(#_a,#_b) produces the expected value of the dependent variable truncated at the censoring point(s), E(y^*), where y^* = max(#_a, min(y,#_b)).

See help xttobit and xtintreg.

6. Existing commands xtgee and xtlogit have a new nodisplay option that suppresses the header and table of coefficients; xtregar, fe now allows aweights and fweights; and xtpcse now has no restrictions on how aweights are applied. See help xtgee, xtlogit, xtregar, and xtpcse.

7. Two commands have been renamed: xtpois is now called xtpoisson and xtclog is now xtcloglog. The old names continue to work. See help xtpoisson and xtcloglog.

What's new in survival analysis

1. Existing command stcox has an important new feature and some minor improvements:

a. stcox will now fit models with gamma-distributed frailty. In this model, frailty is assumed to be shared across groups of observations. Previously, if one wanted to analyze multivariate survival data using the Cox model, one would fit a standard model and account for the correlation within groups by adjusting the standard errors for clustering. Now, one may directly model the correlation by assuming a latent gamma-distributed random effect or frailty; observations within a group are correlated because they share the same frailty. Estimation is via penalized likelihood. An estimate of the frailty variance is available and group-level frailty estimates can be retrieved.

b. fracpoly, sw, and linktest now work after stcox.

See help stcox.

2. Existing command streg has an important new feature and some minor improvements:

a. streg has new option shared(varname) for fitting parametric shared frailty models, analogous to random effects models for panel data. streg could, and still can, fit frailty models where the frailties are assumed to be randomly distributed at the observation level.

b. fracpoly, sw, and linktest now work after streg.

c. streg has four other new options: noconstant, offset(), noheader, and nolrtest.

See help streg.

3. predict after streg, frailty() has two new options:

a. alpha1 generates predictions conditional on a frailty equal to 1.

b. unconditional generates predictions that are "averaged" over the frailty distribution.

These new options may also be used with stcurve. See help streg.

4. sts graph and stcurve (after stcox) can now plot estimated hazard functions, which are calculated as weighted kernel smooths of the estimated hazard contributions; see help sts.

5. streg, dist(gamma) is now faster and more accurate. In addition, you can now predict mean time after streg, dist(gamma); see help streg.

6. Old commands ereg, ereghet, llogistic, llogistichet, gamma, gammahet, weibull, weibullhet, lnormal, lnormalhet, gompertz, gompertzhet are deprecated (they continue to work) in favor of streg. Old command cox is now deprecated (it continues to work) in favor of stcox. See help streg and stcox.

What's new in survey analysis

1. Stata's ml user-programmable likelihood-estimation routine has new options that automatically handle the production of survey estimators, including stratification and estimation on a subpopulation; see help ml.

2. Four new survey estimation commands are available:

a. svynbreg for negative-binomial regression; see help svynbreg.

b. svygnbreg for generalized negative-binomial regression; see help svygnbreg.

c. svyheckman for the Heckman selection model; see help svyheckman.

d. svyheckprob for probit regression with selection; see help svyheckprob.

3. Use of the survey commands has been made more consistent.

a. svyset has new syntax. Before it was

svyset thing_to_set [, clear ]

and now it is

svyset [weight] [, strata(varname) psu(varname) fpc(varname) ]

See help svyset for details. In addition, you must now svyset your data prior to using the survey commands; no longer can you set the data via options to the other survey commands.

b. Two survey estimation commands have been renamed: svyreg to svyregress and svypois to svypoisson; see help svyregress and svypoisson.

c. svyintreg now applies constraints in the same manner as all other estimation commands; see help svyintreg.

d. lincom now works after all svy estimators; see help lincom. (svylc is now deprecated.)

e. testnl now works after all svy estimators; see help testnl.

f. testparm now works after all svy estimators; see help test.

g. The new nlcom and predictnl commands, which form nonlinear combinations of estimators and generalized predictions, work after all svy estimators; see help nlcom and predictnl.

4. Existing command svytab has three new options: cellwidth(), csepwidth(), and stubwidth(); they specify the widths of table elements in the output. See help svytab.

What's new in cluster analysis

1. The new cluster wardslinkage command provides Ward's linkage hierarchical clustering and can produce Ward's method, also known as minimum-variance clustering. See help clward.

2. The new cluster waveragelinkage command provides weighted-average linkage hierarchical clustering to accompany the previously available average linkage clustering. See help clwav.

3. The new cluster centroidlinkage command provides centroid linkage hierarchical clustering. This differs from the previously available cluster averagelinkage in that it combines groups based on the distance between the group centroids rather than on the average of the distances between observations of the two groups to be combined. See help clcent.

4. The new cluster medianlinkage command provides median linkage hierarchical clustering, also known as Gower's method. See help clmedian.

5. The new cluster stop command provides stopping rules. Two popular stopping rules are provided: the Calinski & Harabasz pseudo-F index (Calinski and Harabasz 1974) and the Duda & Hart Je(2)/Je(1) index with associated pseudo T-squared (Duda and Hart 1973). See help clstop.

Additional stopping rules can be added; see help clprog.

6. Two new dissimilarity measures have been added: L2squared and Lpower(#). L2squared provides squared Euclidean distance. Lpower(#) provides the Minkowski distance metric with argument # raised to the #th power, that is, the sum over the variables of the absolute differences each raised to the #th power. See help cldis.

7. A list of the variables used in the cluster analysis is now saved with the cluster analysis structure, which is useful for programmers; see help clprog.
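A minimal sketch combining Ward's linkage, the L2squared measure, and the new stopping rule, assuming the auto dataset shipped with Stata. The variables, the cluster name ward, and the range of 2 to 5 groups are arbitrary choices; the variables are left unstandardized for brevity, although in practice you may wish to standardize them first.

```stata
* Illustrative only: variables and the 2-5 group range are arbitrary choices
sysuse auto, clear

* Ward's linkage on squared Euclidean distance (L2squared)
cluster wardslinkage price mpg weight, measure(L2squared) name(ward)

* Calinski-Harabasz pseudo-F for the 2- through 5-group solutions
cluster stop ward, rule(calinski) groups(2/5)

* Cut the hierarchy into three groups and look at group sizes
cluster generate g3 = groups(3), name(ward)
tabulate g3
```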

What's new in statistics useful in all fields

1. The following new estimators are available:

a. manova fits multivariate analysis-of-variance (MANOVA) and multivariate analysis-of-covariance (MANCOVA) models for balanced and unbalanced designs, including designs with missing cells; and for factorial, nested, or mixed designs. See help manova. (manovatest provides multivariate tests involving terms from the most recently fitted manova; see help manovatest.)

b. rologit fits the rank-order logit model, also known as the exploded logit model. This model generalizes McFadden's choice model as fitted by clogit. In the choice model, only the alternative that maximizes utility is observed. rologit fits the corresponding model in which the preference ranking of the alternatives is observed, not just the alternative that is ranked first. rologit supports incomplete rankings and ties ("indifference"). See help rologit.

c. frontier fits stochastic frontier models with technical or cost inefficiency effects. frontier can fit models in which the inefficiency error component is assumed to be from one of three distributions: half-normal, exponential, or truncated-normal. In addition, when the inefficiency term is assumed to be either half-normal or exponential, frontier can fit models in which the error components are heteroskedastic, conditional on a set of covariates. frontier can also fit models in which the mean of the inefficiency term is modeled as a linear function of a set of covariates. See help frontier.

These new estimators are in addition to the new estimators listed in previous sections.

2. New command mfp selects the fractional polynomial model that best predicts the dependent variable from the independent variables; see help mfp.

3. The new nlcom command computes point estimates, standard errors, t and Z statistics, p-values, and confidence intervals for nonlinear combinations of coefficients after any estimation command. Results are displayed in the table format that is commonly used for displaying estimation results. The standard errors are based on the delta method, an approximation appropriate in large samples. See help nlcom.

4. The new predictnl command produces nonlinear predictions after any Stata estimation command and, optionally, can calculate the variance, standard errors, Wald test statistics, significance levels, and pointwise confidence intervals for these predictions. Unlike testnl and nlcom, the quantities generated by predictnl are allowed to vary over the observations in the data. The standard errors and other inference-related quantities are based on the "delta method", an approximation appropriate in large samples. See help predictnl.

5. The new bootstrap command replaces the old bstrap and bs commands. bootstrap has an improved syntax and allows for stratified sampling. See help bootstrap.

Existing command bsample also now accepts the strata() option, and it has a new weight() option that allows the user to save the sample frequency instead of changing the data in memory. See help bootstrap.

6. The existing bstat command can now construct bias-corrected and accelerated (BCa) confidence intervals. In addition, bstat is now an e-class command, meaning all the post-estimation commands can be used on bootstrap results. See help bootstrap.

7. Existing command jknife now accepts the cluster() option; see help jknife.

8. New command permute estimates p-values for permutation tests based on Monte Carlo simulations. These estimates can be one sided or two sided. See help permute.

9. Existing command sample has new option count that allows samples of the specified number of observations (rather than a percentage) to be drawn. In addition, sample now allows the by varlist: prefix as an alternative to the already existing by(varlist) option; both do the same thing. See help sample.

10. New command simulate replaces simul and provides improved syntax for specifying simulations; see help simulate.

11. Existing command statsby has a new syntax, new options, and now allows time-series operators; see help statsby.

12. The new estimates command provides a new, consistent way to store and refer to estimation results. Post-estimation commands that make comparisons across models, such as lrtest and hausman, previously had their own idiosyncratic ways to store and refer to estimation results. These commands now support a unified way of retrieving estimation results utilizing the new estimates suite.

Under the new scheme, after fitting a model, you can type

. estimates store name

to save the results. At some point later in the session, you can type

. estimates restore name

to get back the estimates. You can redisplay estimates (without restoring them) by typing

. estimates replay name

Other estimation manipulation commands are provided; see help estimates.

a. Existing command lrtest has been modified to have syntax

lrtest name name

b. Existing command hausman has been modified to have syntax

hausman name name

c. The new estimates for command can be used in front of any post-estimation command, such as test or predict, to perform the action on the specified set of estimation results, without disturbing the current estimation results. With estimates for, you can type such things as

. estimates for earlierresults: predict expected

See help estimates.

d. The new estimates stats command displays the Akaike Information Criterion (AIC) and Schwarz Information Criterion (BIC) model selection indexes. See help estimates.

13. Existing command lrtest now supports composite models specified by a parenthesized list of model names. In a composite model, it is assumed that the log likelihood and dimension of the full model are obtained as the sum of the log likelihoods and the sum of the dimensions of the constituent models.

lrtest has a new stats option to display statistical information about the unrestricted and restricted models, including the AIC and BIC model selection statistics. See help lrtest.

14. test has improved syntax:

a. You may now type

. test a = b

for expressions a and b, or you may type

. test a == b

The use of == is more consistent with Stata's syntax that treats == as indicating comparison and = as meaning assignment.

b. You may now specify multiple tests on one line:

. test (a == b == c)

. test (a == b) (c == d)

c. test has new option coef, which specifies that the constrained coefficients are to be displayed.

d. test has two new options for use with the test [eq1==eq2] syntax: constant and common. constant specifies that _cons should be included in the list of coefficients to be tested. common specifies that test restrict itself to the coefficients common to eq1 and eq2.

e. test may now be used after survey estimation.

f. test has a new programmer's option matvlc(matname), which saves the variance-covariance matrix of the linear combination(s).

See help test.

15. testnl now allows typing testnl exp == exp == ... == exp to test whether two or more expressions are equal. Single equal signs may also be used: testnl exp = exp = ... = exp.

In addition, testnl has new option iterate(#) for specifying the maximum number of iterations used to find the optimal step size in the calculation of the numerical derivatives of the expressions to be tested. See help testnl.

16. testparm has new option equation() for use after fitting multiple-equation models such as mvreg, mlogit, heckman, etc. It specifies the equation for which the all-zero or all-equal hypothesis is to be tested. See help test.

17. lincom now works after anova and after all survey estimators; see help lincom.

18. bitest, prtest, ttest, and sdtest now allow == to be used wherever = is allowed in their syntax; see help bitest, prtest, ttest, and sdtest.

19. New command suest is a post-estimation command that combines multiple estimation results (parameter vectors and their variance-covariance matrices) into simultaneous results with a single stacked parameter vector and a robust (sandwich) variance-covariance matrix. The estimation results to be combined may be based on different, overlapping, or even the same data. After creating the simultaneous estimation results, one can use test or testnl to obtain Hausman-type tests for cross-model hypotheses. suest supports survey data. See help suest.

20. New command imtest performs the information matrix test for a regression model. In addition, it provides the Cameron-Trivedi decomposition of the IM-test into tests for heteroskedasticity, skewness, and kurtosis, and White's original heteroskedasticity test. See help imtest.

21. New command szroeter performs Szroeter's test for heteroskedasticity in a regression model; see help szroeter.

22. Existing command hettest now provides option rhs, which tests for heteroskedasticity as a function of the independent (right-hand-side) variables. It now also supports multiple comparison testing. See help hettest.

23. Existing command tabulate has output changes, new features, and expanded limits.

a. Three new statistics are available for twoway tabulations: expected, cchi2, and clrchi2. expected reports the expected number in each cell. cchi2 reports the contribution to Pearson's chi-squared. clrchi2 reports the contribution to the likelihood-ratio chi-squared.

b. New options key and nokey force or suppress a key explaining the entries in the table.

c. Twoway tabulations now respect set linesize, meaning you can produce wide tables.

d. Both oneway and twoway tabulations now put commas in the reported frequency counts.

e. tabulate for oneway tabulations has new option sort, which puts the table in descending order of frequency.

f. tabulate has expanded limits:

+------------------------------------------+
| Flavor            |  1-way |       2-way |
|-------------------+--------+-------------|
| Stata/SE          | 12,000 | 12,000 x 80 |
| Intercooled Stata |  3,000 |    300 x 20 |
| Small Stata       |    500 |    160 x 20 |
+------------------------------------------+

See help tabulate.

24. Existing command tabstat has new options statistics(variance) and statistics(semean) which display the variance and the standard error of the mean. (Also provided is new option varwidth(#), specifying the number of characters used to display variable names.) See help tabstat.

25. Existing command roctab has new option specificity to graph sensitivity versus specificity, instead of the default sensitivity versus (1-specificity); see help roctab.

26. Existing command ologit now has option or to display results as odds ratios (display exponentiated coefficients); see help ologit.

27. New command lowess replaces old command ksm. lowess allows graph twoway's by() option and is much faster than ksm; see help lowess.

28. Existing command kdensity has been rewritten so that it executes faster; see help kdensity.

29. Existing command intreg now applies constraints in the same manner as all other estimation commands, and existing command mlogit now allows constraints with constants; see help intreg and mlogit.

30. New command pca performs principal components analysis, replacing factor, pc; see help pca.

31. Existing command ml maximize and all estimators using ml have a new tolerance option nrtolerance(#) for determining convergence. Convergence is declared when g*inv(H)*g' < nrtolerance(#), where g represents the gradient vector and H the Hessian matrix; see help maximize.

32. Existing command mfx will now use pweights or iweights when calculating the means or medians for the atlist following an estimation command that used pweights or iweights. Previously, only fweights and aweights were supported. See help mfx.

33. Existing command adjust now allows the pr option to display predicted probabilities when used after svylogit, svyprobit, xtlogit, and xtprobit. See help adjust.

34. The existing regression diagnostics commands acprplot, cprplot, hettest, lvr2plot, ovtest, rvfplot, and rvpplot have been extended to work after anova. In addition, cprplot and acprplot have new options lowess and mspline that allow putting a lowess curve or median spline through the data. See help regdiag.

35. Existing command ranksum has new option porder that estimates P(x_1>x_2); see help signrank.

36. Existing command poisgof has new option pearson to request the Pearson chi-squared goodness-of-fit statistic; see help poisson.

37. Existing command binreg now respects the init() option; see help binreg.

38. Existing command boxcox now accepts iweights; see help boxcox.

39. Existing commands zip and zinb now accept the maximize_option from() to provide starting values; see help zip.

40. Existing command cnsreg now accepts the noconstant option; see help cnsreg.

41. Existing command hotel has been renamed hotelling; hotel is now an abbreviation for hotelling; see help hotelling.

42. The score() option is now unified across all estimation commands. You must specify the correct number of score variables, and, in multiple-equation estimators, you may specify stub* to create new variables named stub1, stub2, ...

Estimation commands now save in e(scorevars) the names of the score variables if score() was specified.

43. Existing command summarize without the detail option now allows iweights; see help summarize.

44. Existing commands ci and summarize have new option separator(#) that specifies how frequently separation lines should be inserted into the output; see help ci and summarize.

45. Existing command impute has three new options, regsample, all, and copyrest, that control the sample used for forming the imputation and how out-of-sample values are treated; see help impute.

46. Existing command collapse now takes time-series operators; see help collapse.

What's new in data management

1. New command odbc allows Stata for Windows to act as an ODBC client, meaning you can fetch data directly from ODBC sources; see help odbc.

2. Existing command generate has new, more convenient syntax. Now you can type

. generate a = 2 + 3

or

. generate b = "this" + "that"

without specifying whether the new variable is numeric or a string of a particular length. If you wish, you can also type

. generate str b = "this" + "that"

which asserts that b is a string but leaves it to generate to determine the length of the string. This is useful in programming situations because it helps to prevent bugs. Of course, you can continue to type

. generate double a = _pi/2

and

. generate str8 b = "this" + "that"

See help generate.

3. Existing command list has been completely redone. Not only is output far more readable -- and even pretty -- but programmers will want to use list to format tables. See help list.

4. Existing command merge has been improved:

a. New options unique, uniqmaster, and uniqusing ensure that the merge goes as you intend. These options amount to assertions that, if false, cause merge to stop. unique specifies that there should not be repeated observations within match variables, and that if you say "merge id using myfile", there should be one observation per id value in the master data (the data in memory) and one observation per id in the using data. If observations are not unique, merge will complain.

Options uniqmaster and uniqusing make the same claim for one or the other half of the merge; unique is equivalent to specifying both uniqmaster and uniqusing.

b. merge no longer has a limit on the number of match (key) variables.

c. merge has new option keep(varlist) that specifies the variables to be kept from the using data.

See help merge.
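The sketch below illustrates the assertion options on two tiny, made-up datasets. The variables id, age, and income are hypothetical. Both datasets are built in place so the example is self-contained.

The sketch uses the Stata 8 form, merge varlist using filename. It sorts both datasets on the key first, because that form of merge expects sorted data. Treat it as an illustration rather than a recommended workflow.

```stata
* Build a small hypothetical using dataset keyed on id and save it to a temp file
clear
input id income
1 100
2 200
3 300
end
sort id
tempfile incomes
save "`incomes'"

* Hypothetical master data, also keyed on id
clear
input id age
1 30
2 40
4 50
end
isid id                                  // stops with an error if id repeats
sort id
merge id using "`incomes'", unique       // stops if either side repeats an id
tabulate _merge                          // 1 = master only, 2 = using only, 3 = both
```

If either dataset repeated an id value, merge would stop with an error rather than quietly producing a one-to-many match.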

5. Existing command append has new option keep(varlist) that specifies the variables to be kept from the using data; see help append.

6. New command tsappend appends observations in a time-series context. tsappend uses the information set by tsset, automatically fills in the time variable, and fills in the panel variable if the panel variable was set. See help tsappend.
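Here is a short sketch of tsappend on a hypothetical yearly series. The variables year and y are invented for the illustration.

```stata
* Hypothetical yearly series covering 1996-2000
clear
set obs 5
generate year = 1995 + _n
generate y = _n
tsset year
tsappend, add(3)          // appends the years 2001, 2002, and 2003
list year y               // y is missing in the appended observations
```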

7. Existing command describe using will now allow you to specify a varlist, so you can check whether a variable exists in a dataset before merging or appending. Programmers will be interested in the new varlist option, which will leave in r() the names of the variables in the dataset. See help describe.

8. New command isid verifies that a variable or set of variables uniquely identifies the observations and so is suitable for use with merge; see help isid.

9. Existing command codebook has new option problems to report potential problems in the data; see help codebook.

10. New command labelbook is like codebook, but for value labels. In addition to providing documentation, the output includes a list of potential problems.

New command numlabel adds numeric values as prefixes to value labels, or removes them. For example, the mapping 2 --> "Catholic" becomes "2. Catholic", and vice versa.

See help labelbook and numlabel.
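The sketch below runs numlabel and labelbook on a hypothetical value label named religion. The label name and its categories are assumptions made for the example. numlabel is assumed to use its default mask, which writes the value followed by a period and a space.

```stata
* Hypothetical value label for a religion variable
clear
label define religion 1 "Protestant" 2 "Catholic" 3 "Jewish"
numlabel religion, add         // 2 "Catholic" becomes 2 "2. Catholic"
label list religion
numlabel religion, remove      // back to 2 "Catholic"
labelbook religion             // documents the label and lists potential problems
```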

11. New command duplicates reports on, gives examples of, lists, browses, tags, and/or drops duplicate observations; see help duplicates.
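Here is a small sketch of duplicates on made-up data. Without a varlist, duplicates report and duplicates drop judge duplicates on all variables. Here, tag is given id only.

```stata
* Hypothetical data: the pair (1, 10) appears twice, and id 3 appears
* twice with different scores
clear
input id score
1 10
1 10
2 20
3 30
3 31
end
duplicates report                  // surplus copies, judged on all variables
duplicates tag id, generate(dup)   // dup = number of other observations sharing id
list, sepby(id)
duplicates drop                    // drops one copy of the complete duplicate (1, 10)
```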

12. Existing command recode has three new features:

a. recode now allows a varlist rather than a varname, so several variables can be recoded at once.

b. recode has new option generate() to specify that the transformed variables be stored under different names than the originals.

c. recode has new option prefix(), an alternative to generate(), to specify that the transformed variables are to be given their original names, but with a prefix.

See help recode.
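The three new features can be sketched with the auto dataset shipped with Stata. The new variable names rep3, hi_mpg, and hi_turn are arbitrary choices for the example.

```stata
sysuse auto, clear
* Collapse the five repair-record categories into three, leaving rep78 intact
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)
* Recode two variables at once; results are stored in hi_mpg and hi_turn
recode mpg turn (min/19 = 0) (20/max = 1), prefix(hi_)
tabulate rep78 rep3, missing
```

Values not covered by any rule, such as the missing values of rep78, are copied unchanged into the new variable.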

13. Existing command sort has new option stable that says, within equal values of the sort keys, the observations are to appear in the same order as they did originally. See help sort.
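This sketch shows what stable guarantees, using a hypothetical four-observation dataset.

```stata
clear
input group order
2 1
1 2
2 3
1 4
end
sort group, stable   // within each group, the original order is kept
list                 // order now reads 2, 4, 1, 3
```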

14. New command webuse loads the specified dataset, obtaining it over the web. By default, datasets are obtained from http://www.stata-press.com/data/r8/, but you can reset that. See help webuse.

New command sysuse loads the specified dataset that was shipped with Stata, plus any other datasets stored along the ado-path; see help sysuse.

15. Existing command insheet has a new delimiter(char) option that allows you to specify an arbitrary character as the value separator; see help insheet.

16. Existing commands infile and infix no longer treat ^Z as the end of a file; see help infile1, infile2 and infix.

17. Existing command save has new features:

a. New option orphans specifies that all value labels, including those not attached to any variables, are to be saved in the file.

b. New option emptyok specifies that the dataset is to be saved even if it contains no variables and no observations.

c. Existing option old is removed. To save datasets in Stata 7 format, use the new saveold command; see help saveold.

See help save. By the way, Stata 8 now has a single .dta dataset format used by both Stata/SE and Intercooled Stata, meaning that sharing data with colleagues is easy.

18. Existing command outfile has new features:

a. New options rjs and fjs specify how strings are to be aligned in the output file. The default is left alignment. Option rjs specifies right alignment. Option fjs specifies alignment as specified by the variables' formats.

b. New option runtogether is for use by programmers; it specifies that all string variables be run together without extra spaces in between or quotes.

See help outfile.

19. You may attach value labels to the new extended missing values (.a, .b, ..., .z); see help label.

20. As a consequence of the 26 new missing-value codes, the maximum values that can be stored in a byte, int, and long are reduced to 100, 32,740, and 2,147,483,620, respectively; see help datatypes.

21. New command split splits the contents of a string variable into one or more parts and is useful for separating words into multiple variables; see help split.
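Here is a minimal sketch of split on two hypothetical names. By default, split parses on spaces and creates as many new variables as the longest string requires.

```stata
clear
input str30 name
"Ada Lovelace"
"Alan Mathison Turing"
end
split name        // parses on spaces; creates name1, name2, and name3
list
```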

22. Minor improvements include the following:

a. Existing command egen now allows longer numlists in the values() option for the eqany() and neqany() functions; see help egen.

b. Existing command destring now allows an abbreviated newvarlist in the generate() option; see help destring.

c. Existing commands icd9 and icd9p have been updated to use the V18 and V19 codes; V16, V18, and V19 codes have been merged so that icd9 and icd9p work equally well with old and new datasets; see help icd9.

d. Existing command egen mtr() has been updated to include the marginal tax rates for the years 2000 and 2001; see help egen.

e. Existing command mvdecode's mv() option now allows a numlist; see help mvencode.

f. Existing command mvencode has a new, more versatile syntax to accommodate extended missing values; see help mvencode.

g. Existing command xpose has three new options: format, format(%fmt), and promote. The format option finds the largest numeric display format in the pretransposed data and applies it to the transposed data. The format(%fmt) option sets the transposed data to the specified format. The promote option causes the transposed data to have the most compact numeric data type that preserves the original data accuracy. See help xpose.

h. Existing command notes now allows the individual notes to include SMCL directives; see help notes.

i. Existing command mkmat has new nomissing option that causes observations with missing values to be excluded (because matrices can now contain missing values). mkmat has also been made faster. See help mkmat.

j. Existing command ds has three new options: alpha, varwidth(#), and skip(#). alpha sorts the variables in alphabetic order. varwidth(#) specifies the display width of the variable names. skip(#) specifies the number of spaces between variables. See help describe.

k. Existing command label dir now returns the names of the defined value labels in r(names), and label list now returns the minimum and maximum of the mapped values in r(min) and r(max); see help label.

What's new in expressions and functions

1. First, a warning: Do not type

. generate newvar = ... if oldvar != .

. replace oldvar = ... if oldvar != .

. list ... if var != .

Type

. generate newvar = ... if oldvar < .

. replace oldvar = ... if oldvar < .

. list ... if var < .

or type

. generate newvar = ... if !mi(oldvar)

. replace oldvar = ... if !mi(oldvar)

. list ... if !mi(var)

Stata has new missing values, and the ordering is all numbers < . < .a < .b < ... < .z. If you do not use the new missing values, your old habits will work, but it is better to be safe.

It is a hot topic of debate at StataCorp whether varname<. or !mi(varname) is the preferred way of excluding missing values, and therefore both constructs are deemed to be equally stylish; use whichever appeals to you.

New function mi() is a synonym for existing function missing(); it returns 1 (true) if its argument is missing and 0 (false) otherwise. See help progfun.

2. By the same token, do not type

. list ... if var == .

To list observations with missing values of var, type

. list ... if var >= .

or type

. list ... if mi(var)
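The difference between the old and new idioms is easiest to see on a small made-up variable. It holds one number and three kinds of missing value. The counts noted in the comments follow from the ordering all numbers < . < .a < ... < .z described above.

```stata
clear
set obs 4
generate x = .
replace x = 1  in 1
replace x = .  in 2
replace x = .a in 3
replace x = .z in 4
count if x != .     // 3: the value 1, plus .a and .z slip through
count if x < .      // 1: only the nonmissing value
count if !mi(x)     // 1: same as x < .
count if x >= .     // 3: every kind of missing value
count if mi(x)      // 3: same as x >= .
```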

3. Matrices can now contain missing values, both the standard one (.) and the extended ones (.a, .b, ..., .z).

4. The following new density functions are provided:

a. tden(n,t), the density of Student's t distribution.

b. Fden(n_1,n_2,F), the density of the F distribution.

c. nFden(n_1,n_2,lambda,F), the noncentral F density.

d. betaden(a,b,x), the 2-parameter Beta density.

e. nbetaden(a,b,g,x), the noncentral Beta density.

f. gammaden(a,b,g,x), the 3-parameter Gamma density.

See help probfun.

5. The following new cumulative distribution functions are provided:

a. nFtail(n_1,n_2,lambda,f), the upper tail of the noncentral F.

b. nibeta(a,b,lambda,x), the cumulative noncentral ibeta probability.

See help probfun.

6. The following new inverse cumulative distribution functions are provided:

a. invnFtail(n_1,n_2,lambda,p), the noncentral F corresponding to upper-tail p.

b. invibeta(a,b,p), the incomplete beta value corresponding to p.

c. invnibeta(a,b,lambda,p), the noncentral beta value corresponding to p.

In addition, existing function invbinomial(n,k,p) has improved accuracy. See help probfun.

7. A suite of new functions provides partial derivatives of the cumulative gamma distribution. The following new functions are provided:

a. dgammapda(a,x), partial derivative of gammap(a,x) with respect to a.

b. dgammapdx(a,x), partial derivative of gammap(a,x) with respect to x.

c. dgammapdada(a,x), 2nd partial derivative of gammap(a,x) with respect to a.

d. dgammapdxdx(a,x), 2nd partial derivative of gammap(a,x) with respect to x.

e. dgammapdadx(a,x), 2nd partial derivative of gammap(a,x) with respect to a and x.

See help probfun.

8. All density and distribution functions have been extended to return nonmissing values over the entire real line; see help probfun.

9. The following new string functions are provided:

a. word(s,n) returns the nth word in s.

b. wordcount(s) returns the number of words in s.

c. char(n) returns the character corresponding to ASCII code n.

d. plural(n,s_1) returns the plural of s_1 if n does not equal 1 or -1, and otherwise returns s_1.

e. plural(n,s_1,s_2) returns the plural of s_1 if n does not equal 1 or -1, forming the plural by adding or removing suffix s_2.

f. proper(s) capitalizes the first letter of a string and any other letters immediately following characters that are not letters; remaining letters are converted to lowercase.

See help strfun.
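The new string functions can be tried directly with display. The strings used here are arbitrary examples.

```stata
display word("the quick brown fox", 2)     // quick
display wordcount("the quick brown fox")   // 4
display char(65)                           // A
display plural(1, "horse")                 // horse
display plural(2, "horse")                 // horses
display plural(2, "glass", "+es")          // glasses
display plural(2, "abcdefg", "-efg")       // abcd
display proper("mary-ann o'neil")          // Mary-Ann O'Neil
```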

10. The following new mathematical functions are provided:

a. logit(x), the log of the odds, ln{x/(1-x)}.

b. invlogit(x), the inverse logit.

c. cloglog(x), the complementary log-log.

d. invcloglog(x), the inverse of the complementary log-log.

e. tanh(x), the hyperbolic tangent.

f. atanh(x), the inverse-hyperbolic tangent of x.

g. floor(x), the integer n such that n <= x < n+1.

h. ceil(x), the integer n such that n-1 < x <= n.

In addition, the following existing mathematical functions have been modified:

i. round(x,y) now allows the second argument to be optional and defaults it to 1, so round(x) returns x rounded to the nearest integer.

j. lngamma(x) and gammap(a,x) now have improved accuracy.

See help mathfun.
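Here are a few of the new and modified mathematical functions, evaluated with display. The arguments are arbitrary. The floor() and ceil() lines show the two definitions above applied to a negative number.

```stata
display floor(-1.5)              // -2
display ceil(-1.5)               // -1
display round(2.5)               // 3
display round(1234, 100)         // 1200
display logit(0.5)               // 0
display invlogit(0)              // .5
```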

11. Existing function uniform() will now allow you to capture and reset its seed. The seed value, in encrypted form, is now shown by query. You can store its value by typing

. local seed = c(seed)

Later, you can reset it by typing

. set seed `seed'

See help seed and help random.

12. The following new matrix functions are provided:

a. issym(M) returns 1 if matrix M is symmetric and returns 0 otherwise; issym() may be used in any context.

b. matmissing(M) returns 1 if any elements of M are missing and returns 0 otherwise; matmissing() may be used in any context.

c. vec(M) returns the column vector formed by listing the elements of M, starting with the first column and proceeding column by column.

d. hadamard(M,N) returns a matrix whose i, j element is M[i,j] * N[i,j].

e. matuniform(r,c) returns the r by c matrix containing uniformly distributed pseudo-random numbers on the interval [0,1).

See help matfcns.

In addition, the new command matrix eigenvalues returns the complex eigenvalues of an n by n nonsymmetric matrix; see help mateig.
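This sketch applies the new matrix functions to small hand-entered matrices. The names A, B, v, and H are arbitrary.

```stata
matrix A = (1, 2 \ 2, 3)
display issym(A)              // 1
matrix B = (1, . \ 3, 4)      // matrices may now hold missing values
display matmissing(B)         // 1
matrix v = vec(A)             // 4 x 1 column vector: 1, 2, 2, 3
matrix H = hadamard(A, A)     // element-by-element product: (1, 4 \ 4, 9)
matrix list v
matrix list H
```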

13. The following new programming functions have been added:

a. clip(x,a,b) returns x if a <= x <= b, a if x < a, b if x > b, and missing if x is missing.

b. chop(x,epsilon) returns round(x) if |x - round(x)| < epsilon, otherwise returns x.

c. irecode(z,x_1,x_2, ... ,x_n) returns the index of the range in which z falls: 0 if z <= x_1, 1 if x_1 < z <= x_2, and so on, up to n if z > x_n.

d. maxbyte(), maxint(), maxlong(), maxfloat(), and maxdouble() return the maximum value allowed by the storage type.

e. minbyte(), minint(), minlong(), minfloat(), and mindouble() return the minimum value allowed by the storage type.

f. epsfloat() and epsdouble() return the precision associated with the storage type.

g. byteorder() returns 1 if the computer stores numbers in most-significant-byte-first format and 0 if in least-significant-byte-first format.

The following programming functions have been modified or extended:

h. missing(x) now optionally allows multiple arguments so that it becomes missing(x_1,x_2, ... ,x_n). The extended function returns 1 (true) if any of the x_i are missing and returns 0 (false) otherwise.

i. cond(x,a,b) now optionally allows a fourth argument so that it becomes cond(x,a,b,c). c is returned if x evaluates to missing.

See help progfun.
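The new programming functions can be checked with display. The arguments below are arbitrary, chosen so that each result is easy to verify by hand.

```stata
display clip(5, 0, 3)                       // 3
display clip(-1, 0, 3)                      // 0
display chop(2.9999999, 1e-6)               // 3
display irecode(7, 0, 5, 10)                // 2, because 5 < 7 <= 10
display cond(., "yes", "no", "missing")     // missing
display missing(1, ., 3)                    // 1
display maxbyte()                           // 100
display %12.0f maxlong()                    // 2147483620
```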

What's new in display formats

1. The %g format has been modified: %#.0g still means the same as previously, but %#.#g has a new meaning. For instance, %9.5g means to show approximately 5 significant digits. We say approximately because, given the number 123,456, %9.5g will show 123456 rather than 1.2346e+05, as would strictly be required if only five digits are to be shown. Other than that, it does what you would expect, and we think, in all cases, does what you want.

2. %[-]0#.#f formats (note the leading 0) now specify that leading zeros are to be included in the result. 1.2 in %09.2f format is 000001.20.

3. Stata has a new %21x hexadecimal format that will mainly be of interest to numerical analysts. In %21x, 123,456 looks like +1.e240000000000X+010. You read this as the hexadecimal number 1.e24 multiplied by 2^10, where the exponent 10 is itself written in hexadecimal (16 in decimal), so that 123,456 = 1.e24 (hex) x 2^16. The period in 1.e24 is the base-16 point. The beauty of this format is that it reveals numbers exactly as the binary computer stores them. For instance, the new format shows how difficult numbers like 0.1 are for binary computers: +1.999999999999aX-004.

You can use this hexadecimal way of writing numbers in expressions; Stata will understand, for instance,

. generate xover4 = x / 1.0x+2

but it is unlikely you would want to do that. The notation will even be understood by input, infix, and infile. There is no %21x input format, but wherever a number appears, Stata will understand #.##...#x[+|-]###.

See help format.
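The three format changes can be seen by applying the formats with display. The outputs noted in the comments are the ones given in the text above.

```stata
display %9.5g 123456           // 123456 rather than 1.2346e+05
display %09.2f 1.2             // 000001.20
display %21x 123456            // +1.e240000000000X+010
display %21x 0.1               // +1.999999999999aX-004
display 1.0x+2                 // 4, that is, 1.0 (hex) times 2^2
```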

What's new in programming

Lots of programming improvements have been made; see What's new in [P] intro. Here we will just touch on a few highlights.

1. The two big features are the ability to program dialog boxes and the addition of class programming; see help dialogs and class. Stata's new GUI and new graphics have been programmed using these new features.

2. The new c-class collects all system settings in one place. Type creturn list and all will become clear. Recorded in c(settingname) are all the system settings, so no longer do you have to wonder whether the setting is in $S_something, obtained as a result of an extended macro function, or found somewhere else. See help creturn.
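This sketch shows the c-class in use with the auto dataset shipped with Stata. c(N) and c(k) hold the number of observations and variables in memory.

```stata
creturn list                    // every setting and system value, grouped
sysuse auto, clear
display c(N)                    // 74 observations
display c(k)                    // 12 variables
display c(current_date)         // today's date, as a string
```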

3. Program debugging is now easier thanks to the new trace facilities.

a. Trace output now shows the line with macros expanded as well as unexpanded. This makes spotting errors easier.

b. Separators are drawn and output indented when one program calls another, making it easier to see where you are.

c. set trace is now pushed-and-popped, so the original value will be restored when a program ends.

d. The new command set tracedepth allows you to specify how deeply calls to subroutines should be traced, so you can eliminate unwanted output.

See help trace.

4. One change will bite you: With if exp, while exp, forvalues, and all the other commands that take a brace, the open brace and close brace can no longer be on the same line as the command. You may not code

if (exp) { ... }

You must instead code

if (exp) {
        ...
}

In the case of if, you may omit the braces altogether:

if (exp) ...

Under version control, Stata continues to tolerate the old all-on-one-line syntax, but the new syntax makes Stata considerably faster. See help ifcmd.

5. Existing commands postfile, post, and postclose will now save string variables; see help postfile.

6. Do-files and ado-files now allow // comments and /// continuation lines. // on a line says that from here to the end of the line is a comment. /// does the same, but also says that the next line is to be joined with the current line (and not treated as a comment). See help comments.
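Here is a minimal do-file fragment showing both comment styles. The regression is there only to give /// something to continue. The auto dataset ships with Stata.

```stata
// This entire line is a comment.
sysuse auto, clear
regress price mpg weight   /// the command continues on the next line
        foreign
summarize price            // a comment may also follow a command
```

Stata reads the regress command as regress price mpg weight foreign.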

7. Existing command which will now not only locate .ado files, but other system files as well. You can type, for instance, which anova.hlp to discover the location of the help file for anova. See help which.

New command findfile will look for any file along the adopath; see help findfile.

8. The sysdir directory STBPLUS is now called PLUS; see help sysdir.

9. net .pkg files have new features:

a. F filename is a variation on f filename that specifies the file is to be installed into the system directories, even if it ordinarily would not. This is useful for installing .dta datasets that accompany ado-files.

b. g platformname filename is another variation on f filename. It specifies that the file is to be installed only if the user's computer is of type platformname.

c. G platformname filename is a variation on F filename. The file is installed only if the user's computer is of type platformname, and, if it is installed, it is installed in the system directories.

d. h filename asserts that filename must be loaded or else this package cannot be installed.

e. The maximum number of description lines in a .pkg file has been increased from 20 to 100.

See help net and usersite.

There are lots of new programming features, and the ones we have chosen to mention may not be of the most interest to you. Do see What's new in [P] intro.

What's new in the user interface

1. The GUI, of course, but we have already mentioned that; see Stata's interface in Chapter 3 of the Getting Started with Stata manual.

2. Stata now has tab-name completion. When typing a command, type the first few letters of a variable name and press tab.

3. Existing commands set and query have been redone. set now has a permanently option that makes the setting permanent across sessions, alleviating the need for creating profile.do files. query has a new output format. See help set and query.

4. There are lots of new set parameters. Do not even try to dig them out of the manual. Instead, type query. The new query output shows you where you can find out about each and what values you can set.

5. Almost all windows now have contextual menus; right-click when you are in the window to try them.

6. Under Windows and Mac, the following improvements have been made:

a. If an http proxy is needed, Stata will attempt to get the proper settings from the operating system; see help netio.

b. You are no longer limited to a maximum of 10 nested do-files. The limit is now 64, the same as Stata for Unix.

7. Under Windows, the following improvements have been made:

a. Shortcuts for .smcl files have been added. By default, double-clicking on the shortcut will open the file in the Viewer, and right-clicking on the shortcut and choosing Edit will open the file in the Do-file Editor.

b. Multiple instances of Stata for Windows running at the same time are now clearly marked in their title bar with an instance number.

c. You can now set the maximum number of lines recorded in the Review window using set reventries; see help reventries.

8. Under Mac, the following improvements have been made:

a. Stata is now a native Mach-O application. It may be launched from a terminal with command line options in addition to the usual double-clicking on Stata from the Finder.

b. Stata can now change the amount of memory allocated on the fly just as Stata can on other operating systems; see help memory.

c. Stata can now pass commands to the operating system for execution; see help shell.

d. The filename separator is now forward slash (/) rather than colon (:) in keeping with changes made by Apple. For backward compatibility, Stata still recognizes a colon (:) as a filename separator.

e. You can now open more than one file simultaneously in the Do-file Editor.

f. Stata honors and sets file permissions when creating files.

g. Stata now uses /tmp for its temporary files.

h. You can now select all the contents of the Results or Viewer windows by selecting Select All from the Edit menu.

i. There is a new menu item, Bring All to Front, in the Window menu that brings all Stata windows to the front.

9. Stata for Unix now looks for the environment variable STATATMP in addition to the environment variable TMPDIR for the location of the directory where temporary files are stored. STATATMP takes precedence over TMPDIR.

What's more

We have not listed all the changes, but we have listed the important ones. The remaining changes -- a list about as long as the one above -- are all implications of what has been listed.

What is important to know is that Stata is continually being updated and those updates are available for free over the Internet. All you have to do is type

. update query

and follow the instructions.

We hope you enjoy Stata 8.

--- previous updates ----------------------------------------------------------

See whatsnew7.

-------------------------------------------------------------------------------


© Copyright 1996–2018 StataCorp LLC