## Stata 15 help for whatsnew7to8


What's new in release 8.0 (compared with release 7)

This file lists the changes corresponding to the creation of Stata
release 8.0:

+---------------------------------------------------------------+
| help file        contents                     years           |
|---------------------------------------------------------------|
| whatsnew         Stata 15.0 and 15.1          2017 to present |
| whatsnew14to15   Stata 15.0 new release       2017            |
| whatsnew14       Stata 14.0, 14.1, and 14.2   2015 to 2017    |
| whatsnew13to14   Stata 14.0 new release       2015            |
| whatsnew13       Stata 13.0 and 13.1          2013 to 2015    |
| whatsnew12to13   Stata 13.0 new release       2013            |
| whatsnew12       Stata 12.0 and 12.1          2011 to 2013    |
| whatsnew11to12   Stata 12.0 new release       2011            |
| whatsnew11       Stata 11.0, 11.1, and 11.2   2009 to 2011    |
| whatsnew10to11   Stata 11.0 new release       2009            |
| whatsnew10       Stata 10.0 and 10.1          2007 to 2009    |
| whatsnew9to10    Stata 10.0 new release       2007            |
| whatsnew9        Stata  9.0, 9.1, and 9.2     2005 to 2007    |
| whatsnew8to9     Stata  9.0 new release       2005            |
| whatsnew8        Stata  8.0, 8.1, and 8.2     2003 to 2005    |
| this file        Stata  8.0 new release       2003            |
| whatsnew7        Stata  7.0                   2001 to 2002    |
| whatsnew6to7     Stata  7.0 new release       2000            |
| whatsnew6        Stata  6.0                   1999 to 2000    |
+---------------------------------------------------------------+

Most recent changes are listed first.

--- more recent updates -------------------------------------------------------

See whatsnew8.

--- Stata 8.0 release 02jan2003 -----------------------------------------------

As always, Stata 8.0 is 100% compatible with the previous release of
Stata, but we remind programmers that it is vitally important that you
put version 7.0 at the top of your old do-files and ado-files if they
are to work; see help version.  You were supposed to do that when you
wrote them but, if you did not, go back and do it now.  We have made a
lot of changes (improvements) to Stata.

In addition, Stata's dataset format has changed because of the new longer
data storage types and the fact that Stata now has multiple
representations for missing values. You will not care because Stata
automatically reads old-format datasets, but if you need to send a
dataset to someone still using Stata 7, remember to use the saveold
command; see help saveold.

The features added to Stata 8.0 are listed under the following headings.

What's big
Graphics
GUI
What's useful
What's convenient
What was needed
What's faster
What's new in time-series analysis
What's new in cross-sectional time-series analysis
What's new in survival analysis
What's new in survey analysis
What's new in cluster analysis
What's new in statistics useful in all fields
What's new in data management
What's new in expressions and functions
What's new in display formats
What's new in programming
What's new in the user interface
What's more

What's big

The big news is the new GUI and the new Graphics.  There is no putting
them in an order.

Graphics

For examples of the graphs you can now create, see A quick tour in help
graph_intro.  Everything you need to know is online.

So what's new in Stata graphics?  Everything.  There is not one little
bit that is not new, even if it seems familiar.

Before you panic, let us tell you that all the old graphics are still in
Stata.  If you type

. graph7 ...

or

. gr7 ...

you will be back to using the old graph command; see help graph7.
Moreover, the old graph command is still invoked under version control;
see help version.  If you set your version to 7.0 or earlier, graph does
not mean what is defined in help graph; it means what it used to mean,
which means that old do-files and ado-files continue to work.

One new feature requires some adjustment.  What used to be called symbols
are now called markers, and marker symbols are the shapes of the markers.
Thus, you no longer specify the symbol() or s() option; you specify the
msymbol() or ms() option.  In addition, the old s(.) for specifying the
dot symbol is now ms(p) (p stands for point).  ms(.) means to use the
default.

All existing statistical commands that produce graphs have been updated
to take advantage of the new graphics.

GUI

GUI stands for Graphical User Interface, and to try it, you do not need
to read a thing.  Pull down Data, Graphics, or Statistics, find what you
are looking for, and click.

Fill in the dialog box and click to submit.  Do not ignore tabs at the
top -- there are very useful things hidden under them.

If you know the command you want, you can skip the menus and type db
followed by the command name.  For instance, you can jump directly to the
stcox dialog box by typing db stcox.  See help db.

What's useful

Stata 8 has so many features that finding what you are looking for can be
a challenge.  We have addressed that:

1.  Pull down Help and select Contents.  You will be presented with the
categories Basics, Data management, Statistics, Graphics, and
Programming.  Click on one of them -- say, Statistics -- and you
will be presented with another set of categories:  Summary
statistics and tests, Tables, Estimation, Multivariate analysis,
Resampling and simulation, Statistical hand calculations, and
Special topics.  Click on one of those and, well, you get the idea.
With the new help contents, it never takes long to find what you
need.

2.  Help files now have hyperlinks in the header for launching the
dialog associated with the command.  So, there are three ways to
launch a dialog box:  (1) use the menus (pull down Data, Graphics,
or Statistics); (2) use the new db command (see help db); or (3)
pick the command from the online help.

3.  When you do need to search, findit is the key.  findit searches
everywhere:  Stata itself, the Stata website, the FAQs, the Stata
Journal, and even user-written programs available on the web.  An
earlier version of findit was made available as an update to Stata
7, but the new version is better.  You can also access findit by
pulling down Help and selecting Search.  If you do that, be sure to
click Search all in the dialog box.  See help search.

4.  The new ssc command lists and installs user-written packages from
the Statistical Software Components (SSC) archive, also known as
the Boston College Archive, located at http://www.repec.org.  See
help ssc.

5.  The new net sj command makes loading files from the new Stata
Journal easier; see help net.

What's convenient

The existing set command has a new permanently option that allows you to
make the setting permanent.  This does away with the necessity of having
a profile.do file for most users.

What was needed

Stata now has multiple missing values!  In addition to the previously
existing ., there are now .a, .b, ..., .z, and you can attach value labels
to the new missing codes!

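One thing to watch out for:  Do not type

. stata_command ... if x != .

Instead, type

. stata_command ... if x < .

The test x != . is true for the new missing values .a, .b, ..., .z,
whereas x < . excludes all of them.  You need to remember this only if
you use the new missing values, but it is better to have good habits.
The way things now work,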

all numbers < . < .a < .b < ... < .z

So, if you wanted to list all observations for which x is missing, you
would type

. list if x >= .

See help missing.
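This ordering is why x != . is the wrong test and x >= . is the right test for "is missing."  The following illustration is in Python rather than Stata.  It models missing codes as strings purely as an assumption of this sketch and mimics the ordering with a sort key:

```python
import string

# Stata's missing codes, in increasing order: ., .a, .b, ..., .z
MISSING = ["."] + ["." + c for c in string.ascii_lowercase]

def sort_key(v):
    """Order all numbers first, then . < .a < ... < .z."""
    if isinstance(v, str):            # a missing code such as ".a"
        return (1, MISSING.index(v))
    return (0, v)                     # an ordinary number

def is_missing(v):
    """Python analogue of the Stata condition  x >= ."""
    return sort_key(v) >= sort_key(".")

values = [3.5, ".b", -1e300, ".", ".a", 42]
print(sorted(values, key=sort_key))            # [-1e+300, 3.5, 42, '.', '.a', '.b']
print([v for v in values if is_missing(v)])    # ['.b', '.', '.a']
print([v for v in values if v != "."])         # still contains '.b' and '.a'
```

The last line shows the trap:  filtering on != . keeps .a and .b.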

What's faster

Stata 8 executes programming commands in half the time of Stata 7, on
average.  This results in commands implemented as ado-files running about
17 to 43% faster.

1.  This speed-up is due to a new, faster memory manager that reduces
the time needed to find, access, and store results.  Thus, the
improvement does little to change the run time of built-in, heavily
computational commands.  regress, for instance, runs only 1.43%
faster.  Nevertheless, the effect can be marked on other commands.
poisson runs up to 31% faster, and heckman runs up to 43% faster.
The larger the dataset, the smaller the improvement:  heckman runs
17% faster on 4,000 observations.

2.  That statistical commands run faster is a happy side effect.  The
big advantage of the speed-up is that it allows some problems to be
approached using ado-files that previously would have required
internal code, such as Stata's new graphics, which is an ado-file
implementation!  Some programming commands run up to 400% faster.
Implementing features as ado-files is part of the effort to keep
Stata open and extendable by users.

What's new in time-series analysis

1.  Stata now can fit vector autoregression (VAR) and structural vector
autoregression (SVAR) models.  New commands var, varbasic, and svar
perform the estimation; see help varintro.

a.  A suite of varirf commands estimate, tabulate, and graph
impulse-response functions, cumulative impulse-response
functions, orthogonalized impulse-response functions,
structural impulse-response functions, and their confidence
intervals, along with forecast-error variance decompositions
and structural forecast-error variance decompositions; see help
varirf.  This suite allows graphical comparisons of IRFs and
variance decompositions across models and orderings.

b.  varfcast produces dynamic forecasts from a previously fitted
var or svar model; see help varfcast.

c.  There is also a full suite of diagnostic and testing tools
including

i.  vargranger, which performs Granger causality tests; see
help vargranger.

ii.  varlmar, which performs a Lagrange multiplier (LM) test
for residual autocorrelation; see help varlmar.

iii.  varnorm, which performs a series of tests for normality of
the disturbances; see help varnorm.

iv.  varsoc, which reports a series of lag order selection
statistics; see help varsoc.

v.  varstable, which checks the eigenvalue stability
condition; see help varstable.

vi.  varwle, which performs a Wald test that all the endogenous
variables of a given lag are zero, both for each equation
separately and for all equations jointly; see help
varwle.

2.  The new tssmooth command smooths and predicts univariate time
series using weighted or unweighted moving average, single
exponential smoothing, double exponential smoothing, Holt-Winters
nonseasonal smoothing, Holt-Winters seasonal smoothing, or
nonlinear smoothing.  See help tssmooth.

3.  The new tsappend command appends observations to a time-series
dataset, automatically filling in the time variable and the panel
variable, if set, by using the information contained in tsset.  See
help tsappend.

4.  The new archlm command computes a Lagrange multiplier test for
autoregressive conditional heteroskedasticity (ARCH) effects in the
residuals after regress; see help archlm.

5.  The new bgodfrey command computes the Breusch-Godfrey Lagrange
multiplier (LM) test for serial correlation in the disturbances
after regress; see help bgodfrey.

6.  The new durbina command computes the Durbin (1970) alternative
statistic to test for serial correlation in the disturbances after
regress when some of the regressors are not strictly exogenous; see
help durbina.

7.  The new dfgls command performs the modified Dickey-Fuller t test
for a unit root (proposed by Elliott, Rothenberg, and Stock (1996))
using models with 1 to maxlags lags of the first differenced
variable in an augmented Dickey-Fuller regression; see help dfgls.

8.  The existing arima command may now be used with the by prefix
command, and it now allows prediction in loops over panels; see
help arima.

9.  The existing newey command now allows (and requires) that you tsset
your data; see help newey.
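Several of the tssmooth methods listed in item 2 are simple recursions.  Below is a minimal Python sketch of single exponential smoothing.  It is not tssmooth's implementation.  The starting value S_0 = x_1 and the fixed smoothing parameter alpha are assumptions of this illustration; tssmooth has its own rules for both.

```python
def single_exp_smooth(x, alpha, s0=None):
    """Single exponential smoothing: S_t = alpha*x_t + (1 - alpha)*S_(t-1).

    Assumption for this sketch: the starting value S_0 defaults to x[0].
    S_(t-1) serves as the one-step-ahead forecast of x_t.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    s = x[0] if s0 is None else s0
    smoothed = []
    for xt in x:
        s = alpha * xt + (1 - alpha) * s   # weighted blend of new value and old level
        smoothed.append(s)
    return smoothed

print(single_exp_smooth([10, 12, 11, 15], alpha=0.5))  # [10.0, 11.0, 11.0, 13.0]
```

Smaller values of alpha give smoother series that react more slowly to new observations.  With alpha = 1 the series is returned unchanged.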

What's new in cross-sectional time-series analysis

1.  The new xthtaylor command fits panel-data random-effects models
using the Hausman-Taylor and the Amemiya-MaCurdy
instrumental-variables estimators; see help xthtaylor.

2.  The new xtfrontier command fits stochastic production or cost
frontier models for panel data allowing two different
parameterizations for the inefficiency term: a time-invariant model
and the Battese-Coelli (1992) parameterization of time effects; see
help xtfrontier.

3.  The existing xtabond command now allows endogenous regressors; see
help xtabond.

4.  The existing xtivreg command will now optionally report first stage
results of Baltagi's EC2SLS random-effects estimator; see help
xtivreg.

5.  The existing xttobit and xtintreg commands have new predict
options:

a.  pr0(#_a,#_b) produces the probability of the dependent variable
being uncensored P(#_a< y < #_b).

b.  e0(#_a,#_b) produces the corresponding expected value E(y | #_a
< y < #_b).

c.  ystar(#_a,#_b) produces the expected value of the dependent
variable truncated at the censoring point(s), E(y^*), where y^*
= max(#_a, min(y,#_b)).

See help xttobit and xtintreg.

6.  Existing commands xtgee and xtlogit have a new nodisplay option
that suppresses the header and table of coefficients; xtregar, fe
now allows aweights and fweights; and xtpcse now has no
restrictions on how aweights are applied.  See help xtgee, xtlogit,
and xtpcse.

7.  Two commands have been renamed:  xtpois is now called xtpoisson and
xtclog is now xtcloglog.  The old names continue to work.  See help
xtpoisson and xtcloglog.
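The three predict options in item 5 correspond to standard censored-normal quantities.  The hedged Python sketch below assumes the prediction is normal with mean mu and standard deviation sigma, and it computes the three quantities with the textbook formulas.  The function censored_predictions and its arguments are hypothetical and are not Stata's code.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution

def _cdf(z):
    return _STD.cdf(z)

def _pdf(z):
    return 0.0 if math.isinf(z) else _STD.pdf(z)

def censored_predictions(mu, sigma, a, b):
    """Return (pr0, e0, ystar) for y ~ N(mu, sigma^2) and limits a < b.

    pr0   = P(a < y < b)
    e0    = E(y | a < y < b)
    ystar = E(y*), where y* = max(a, min(y, b))
    Pass a = -math.inf or b = math.inf for a one-sided or no limit.
    """
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    pr0 = _cdf(beta) - _cdf(alpha)
    # Partial expectation E(y; a < y < b).
    partial = mu * pr0 + sigma * (_pdf(alpha) - _pdf(beta))
    e0 = partial / pr0
    ystar = partial
    if not math.isinf(a):
        ystar += a * _cdf(alpha)        # probability mass piled up at a
    if not math.isinf(b):
        ystar += b * (1 - _cdf(beta))   # probability mass piled up at b
    return pr0, e0, ystar

print(censored_predictions(0.0, 1.0, 0.0, math.inf))  # (0.5, ~0.798, ~0.399)
```

With a = 0 and no upper limit, ystar is E(max(0, y)), about 0.399 for a standard normal.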

What's new in survival analysis

1.  Existing command stcox has an important new feature and some minor
improvements:

a.  stcox will now fit models with gamma-distributed frailty.  In
this model, frailty is assumed to be shared across groups of
observations.  Previously, if one wanted to analyze
multivariate survival data using the Cox model, one would fit a
standard model and account for the correlation within groups by
adjusting the standard errors for clustering.  Now, one may
directly model the correlation by assuming a latent
gamma-distributed random effect or frailty; observations within a
group are correlated because they share the same frailty.
Estimation is via penalized likelihood.  An estimate of the
frailty variance is available and group-level frailty estimates
can be retrieved.

b.  fracpoly, sw, and linktest now work after stcox.

See help stcox.

2.  Existing command streg has an important new feature and some minor
improvements:

a.  streg has new option shared(varname) for fitting parametric
shared frailty models, analogous to random effects models for
panel data.  streg could, and still can, fit frailty models
where the frailties are assumed to be randomly distributed at
the observation level.

b.  fracpoly, sw, and linktest now work after streg.

c.  streg has four other new options: noconstant, offset(),

See help streg.

3.  predict after streg, frailty() has two new options:

a.  alpha1 generates predictions conditional on a frailty equal to
1.

b.  unconditional generates predictions that are "averaged" over
the frailty distribution.

These new options may also be used with stcurve.  See help streg.

4.  sts graph and stcurve (after stcox) can now plot estimated hazard
functions, which are calculated as weighted kernel smooths of the
estimated hazard contributions; see help sts.

5.  streg, dist(gamma) is now faster and more accurate.  In addition,
you can now predict mean time after gamma; see help streg.

6.  Old commands ereg, ereghet, llogistic, llogistichet, gamma,
gammahet, weibull, weibullhet, lnormal, lnormalhet, gompertz,
gompertzhet are deprecated (they continue to work) in favor of
streg.  Old command cox is now deprecated (it continues to work) in
favor of stcox.  See help streg and stcox.

What's new in survey analysis

1.  Stata's ml user-programmable likelihood-estimation routine has new
options that automatically handle the production of survey
estimators, including stratification and estimation on a
subpopulation; see help ml.

2.  Four new survey estimation commands are available:

a.  svynbreg for negative-binomial regression; see help svynbreg.

b.  svygnbreg for generalized negative-binomial regression; see
help svygnbreg.

c.  svyheckman for the Heckman selection model; see help
svyheckman.

d.  svyheckprob for probit regression with selection; see help
svyheckprob.

3.  Use of the survey commands has been made more consistent.

a.  svyset has new syntax.  Before it was

svyset thing_to_set [, clear ]

and now it is

svyset [weight] [, strata(varname) psu(varname)
fpc(varname) ]

See help svyset for details.  In addition, you must now svyset
your data prior to using the survey commands; no longer can you
set the data via options to the other survey commands.

b.  Two survey estimation commands have been renamed:  svyreg to
svyregress and svypois to svypoisson; see help svyregress and
svypoisson.

c.  svyintreg now applies constraints in the same manner as all
other estimation commands; see help svyintreg.

d.  lincom now works after all svy estimators; see help lincom.
(svylc is now deprecated.)

e.  testnl now works after all svy estimators; see help testnl.

f.  testparm now works after all svy estimators; see help test.

g.  The new nlcom and predictnl commands, which form nonlinear
combinations of estimators and generalized predictions, work
after all svy estimators; see help nlcom and predictnl.

4.  Existing command svytab has three new options: cellwidth(),
csepwidth(), and stubwidth(); they specify the widths of table
elements in the output.  See help svytab.

What's new in cluster analysis

1.  The new cluster wardslinkage command provides Ward's linkage
hierarchical clustering and can produce Ward's method, also known
as minimum-variance clustering.  See help clward.

2.  The new cluster waveragelinkage command provides weighted-average
linkage hierarchical clustering to accompany the previously
available average linkage clustering.  See help clwav.

3.  The new cluster centroidlinkage command provides centroid linkage
hierarchical clustering.  This differs from the previously
available cluster averagelinkage, which combines groups based on
the average of the distances between observations of the two groups
to be combined; centroid linkage instead combines groups based on
the distance between the group centroids.  See help clcent.

4.  The new cluster medianlinkage command provides median linkage
hierarchical clustering, also known as Gower's method.  See help
clmedian.

5.  The new cluster stop command provides stopping rules.  Two popular
stopping rules are provided, the Calinski & Harabasz pseudo-F index
(Calinski and Harabasz (1974)) and the Duda & Hart Je(2)/Je(1)
index with associated pseudo T-squared (Duda and Hart (1973)).  See
help clstop.

Additional stopping rules can be added; see help clprog.

6.  Two new dissimilarity measures have been added:  L2squared and
Lpower(#).  L2squared provides squared Euclidean distance.
Lpower(#) provides the Minkowski distance metric with argument #
raised to the # power.  See help cldis.

7.  A list of the variables used in the cluster analysis is now saved
with the cluster analysis structure, which is useful for
programmers; see help clprog.
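The dissimilarity measures, linkage rules, and stopping rule described above reduce to short formulas.  The following Python sketch illustrates them under stated assumptions and is not Stata's code:

- Both linkage functions use plain Euclidean distance, which may differ from the dissimilarity Stata uses by default.
- calinski_harabasz computes the pseudo-F as [B/(k-1)]/[W/(n-k)] from the between-group and within-group sums of squares.

Note that lpower with p = 2 reproduces l2squared.

```python
import math
from itertools import product

def l2squared(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def lpower(x, y, p):
    """Minkowski distance raised to the p-th power: sum |x_i - y_i|**p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))

def centroid(group):
    """Coordinate-wise mean of the observations in a group."""
    return [sum(coord) / len(group) for coord in zip(*group)]

def average_linkage(g1, g2):
    """Mean of the Euclidean distances between observations of g1 and g2."""
    pairs = list(product(g1, g2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def centroid_linkage(g1, g2):
    """Euclidean distance between the centroids of g1 and g2."""
    return math.dist(centroid(g1), centroid(g2))

def calinski_harabasz(groups):
    """Pseudo-F = [B/(k-1)] / [W/(n-k)] for a list of groups of points."""
    points = [p for g in groups for p in g]
    n, k = len(points), len(groups)
    if not 1 < k < n:
        raise ValueError("need 1 < k < n")
    grand = centroid(points)
    between = sum(len(g) * l2squared(centroid(g), grand) for g in groups)
    within = sum(l2squared(p, centroid(g)) for g in groups for p in g)
    return (between / (k - 1)) / (within / (n - k))

g1 = [(0.0, 0.0), (2.0, 0.0)]
g2 = [(1.0, 3.0)]
print(lpower((0, 0), (3, 4), 3))       # 27 + 64 = 91
print(average_linkage(g1, g2))         # sqrt(10), about 3.162
print(centroid_linkage(g1, g2))        # 3.0
print(calinski_harabasz([[(0,), (1,)], [(10,), (11,)]]))  # 200.0
```

The two linkage values differ for the same pair of groups, which is the distinction drawn in item 3.  Larger pseudo-F values indicate more distinct groups.  cluster stop reports such indices for a range of numbers of groups so that they can be compared.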

What's new in statistics useful in all fields

1.  The following new estimators are available:

a.  manova fits multivariate analysis-of-variance (MANOVA) and
multivariate analysis-of-covariance (MANCOVA) models for
balanced and unbalanced designs, including designs with missing
cells; and for factorial, nested, or mixed designs.  See help
manova.  (manovatest provides multivariate tests involving
terms from the most recently fitted manova; see help
manovatest.)

b.  rologit fits the rank-order logit model, also known as the
exploded logit model.  This model is a generalization of McFadden's
choice model as fitted by clogit.  In the choice model, only
the alternative that maximizes utility is observed.  rologit
fits the corresponding model in which the preference ranking of
the alternatives is observed, not just the alternative that is
ranked first.  rologit supports incomplete rankings and ties
("indifference").  See help rologit.

c.  frontier fits stochastic frontier models with technical or cost
inefficiency effects.  frontier can fit models in which the
inefficiency error component is assumed to be from one of the
three distributions: half-normal, exponential, or
truncated-normal.  In addition, when the inefficiency term is
assumed to be either half-normal or exponential, frontier can
fit models in which the error components are heteroskedastic,
conditional on a set of covariates.  frontier can also fit
models in which the mean of the inefficiency term is modeled as
a linear function of a set of covariates.  See help frontier.

These new estimators are in addition to the new estimators listed
in previous sections.

2.  New command mfp selects the fractional polynomial model that best
predicts the dependent variable from the independent variables; see
help mfp.

3.  The new nlcom command computes point estimates, standard errors, t
and Z statistics, p-values, and confidence intervals for nonlinear
combinations of coefficients after any estimation command.  Results
are displayed in the table format that is commonly used for
displaying estimation results.  The standard errors are based on
the delta method, an approximation appropriate in large samples.
See help nlcom.

4.  The new predictnl command produces nonlinear predictions after any
Stata estimation command, and optionally, can calculate the
variance, standard errors, Wald test-statistics, significance
levels, and point-wise confidence intervals for these predictions.
Unlike testnl and nlcom, the quantities generated by predictnl are
allowed to vary over the observations in the data.  The standard
errors and other inference-related quantities are based on the
"delta method", an approximation appropriate in large samples.  See
help predictnl.

5.  The new bootstrap command replaces the old bstrap and bs commands.
bootstrap has an improved syntax and allows for stratified
sampling.  See help bootstrap.

Existing command bsample also now accepts the strata() option, and
it has a new weight() option that allows the user to save the
sample frequency instead of changing the data in memory.  See help
bootstrap.

6.  The existing bstat command can now construct bias-corrected and
accelerated (BCa) confidence intervals.  In addition, bstat is now
an e-class command, meaning all the post-estimation commands can be
used on bootstrap results.  See help bootstrap.

7.  Existing command jknife now accepts the cluster() option; see help
jknife.

8.  New command permute estimates p-values for permutation tests based
on Monte Carlo simulations.  These estimates can be one sided or
two sided.  See help permute.

9.  Existing command sample has new option count that allows samples of
the specified number of observations (rather than a percentage) to
be drawn.  In addition, sample now allows the by varlist: prefix as
an alternative to the already existing by(varlist) option; both do
the same thing.  See help sample.

10.  New command simulate replaces simul and provides improved syntax
for specifying simulations; see help simulate.

11.  Existing command statsby has a new syntax, new options, and now
allows time-series operators; see help statsby.

12.  The new estimates command provides a new, consistent way to store
and refer to estimation results.  Post-estimation commands that
make comparisons across models, such as lrtest and hausman,
previously had their own idiosyncratic ways to store and refer to
estimation results.  These commands now support a unified way of
retrieving estimation results utilizing the new estimates suite.

Under the new scheme, after fitting a model, you can type

. estimates store name

to save the results.  At some point later in the session, you can
type

. estimates restore name

to get back the estimates.  You can redisplay estimates (without
restoring them) by typing

. estimates replay name

Other estimation manipulation commands are provided; see help
estimates.

a.  Existing command lrtest has been modified to have syntax

lrtest name name

b.  Existing command hausman has been modified to have syntax

hausman name name

c.  The new estimates for command can be used in front of any
post-estimation command, such as test or predict, to perform
the action on the specified set of estimation results, without
disturbing the current estimation results.  With estimates for,
you can type such things as

. estimates for earlierresults: predict expected

See help estimates.

d.  The new estimates stats command displays the Akaike Information
Criterion (AIC) and Schwarz Information Criterion (BIC) model
selection indexes.  See help estimates.

13.  Existing command lrtest now supports composite models specified by
a parenthesized list of model names.  In a composite model, it is
assumed that the log likelihood and dimension of the full model are
obtained as the sum of the log likelihoods and the sum of the
dimensions of the constituent models.

lrtest has a new stats option to display statistical information
about the unrestricted and restricted models, including the AIC and
BIC model selection statistics.  See help lrtest.

14.  test has improved syntax:

a.  You may now type

. test a = b

for expressions a and b, or you may type

. test a == b

The use of == is more consistent with Stata's syntax that
treats == as indicating comparison and = as meaning assignment.

b.  You may now specify multiple tests on one line:

. test (a == b == c)

. test (a == b) (c == d)

c.  test has new option coef, which specifies that the constrained
coefficients are to be displayed.

d.  test has two new options for use with the test [eq1==eq2]
syntax:  constant and common.  constant specifies that _cons
should be included in the list of coefficients to be tested.
common specifies that test restrict itself to the coefficients
in common between eq1 and eq2.

e.  test may now be used after survey estimation.

f.  test has a new programmer's option matvlc(matname), which saves
the variance-covariance matrix of the linear combination(s).

See help test.

15.  testnl now allows typing testnl exp == exp == ... == exp to test
whether two or more expressions are equal.  Single equal signs may
be used:  testnl exp = exp = ... = exp.

In addition, testnl has new option iterate(#) for specifying the
maximum number of iterations used to find the optimal step size in
the calculation of the numerical derivatives of the expressions to
be tested.  See help testnl.

16.  testparm has new option equation() for use after fitting
multiple-equation models such as mvreg, mlogit, heckman, etc.  It
specifies the equation for which the all-zero or all-equal
hypothesis is to be tested.  See help test.

17.  lincom now works after anova and after all survey estimators; see
help lincom.

18.  bitest, prtest, ttest, and sdtest now allow == to be used wherever
= is allowed in their syntax; see help bitest, prtest, ttest, and
sdtest.

19.  New command suest is a post-estimation command that combines
multiple estimation results (parameter vectors and their
variance-covariance matrices) into simultaneous results with a
single stacked parameter vector and a robust (sandwich)
variance-covariance matrix. The estimation results to be combined
may be based on different, overlapping, or even the same data.
After creating the simultaneous estimation results, one can use
test or testnl to obtain Hausman-type tests for cross-model
hypotheses.  suest supports survey data.  See help suest.

20.  New command imtest performs the information matrix test for a
regression model.  In addition, it provides the Cameron-Trivedi
decomposition of the IM-test into tests for heteroskedasticity,
skewness, and kurtosis, and White's original heteroskedasticity
test.  See help imtest.

21.  New command szroeter performs Szroeter's test for
heteroskedasticity in a regression model; see help szroeter.

22.  Existing command hettest now provides option rhs to test for
heteroskedasticity in the independent variables.  It now also
supports multiple comparison testing.  See help hettest.

23.  Existing command tabulate has output changes, new features, and
expanded limits.

a.  Three new statistics are available for twoway tabulations:
expected, cchi2, and clrchi2.  expected reports the expected
number in each cell.  cchi2 reports the contribution to
Pearson's chi-squared.  clrchi2 reports the contribution to the
likelihood-ratio chi-squared.

b.  New options key and nokey force or suppress a key explaining
the entries in the table.

c.  Twoway tabulations now respect set linesize, meaning you can
produce wide tables.

d.  Both oneway and twoway tabulations now put commas in the
reported frequency counts.

e.  tabulate for oneway tabulations has new option sort, which puts
the table in descending order of frequency.

f.  tabulate has expanded limits:

+------------------------------------------+
| Flavor            |  1-way |    2-way    |
|-------------------+--------+-------------|
| Stata/SE          | 12,000 | 12,000 x 80 |
| Intercooled Stata |  3,000 |    300 x 20 |
| Small Stata       |    500 |    160 x 20 |
+------------------------------------------+

See help tabulate.

24.  Existing command tabstat has new options statistics(variance) and
statistics(semean) which display the variance and the standard
error of the mean.  (Also provided is new option varwidth(#),
specifying the number of characters used to display variable
names.) See help tabstat.

25.  Existing command roctab has new option specificity to graph
sensitivity versus specificity, instead of the default sensitivity
versus (1-specificity); see help roctab.

26.  Existing command ologit now has option or to display results as
odds ratios (display exponentiated coefficients); see help ologit.

27.  New command lowess replaces old command ksm.  lowess allows graph
twoway's by() option and is much faster than ksm; see help lowess.

28.  Existing command kdensity has been rewritten so that it executes
faster; see help kdensity.

29.  Existing command intreg now applies constraints in the same manner
as all other estimation commands, and existing command mlogit now
allows constraints with constants; see help intreg and mlogit.

30.  New command pca performs principal components analysis, replacing
factor, pc; see help pca.

31.  Existing command ml maximize and all estimators using ml have a new
tolerance option nrtolerance(#) for determining convergence.
Convergence is declared when g*inv(H)*g' < nrtolerance(#), where g
represents the gradient vector and H the Hessian matrix; see help
maximize.

32.  Existing command mfx will now use pweights or iweights when
calculating the means or medians for the atlist following an
estimation command that used pweights or iweights.  Previously,
only fweights and aweights were supported.  See help mfx.

33.  Existing command adjust now allows the pr option to display
predicted probabilities when used after svylogit, svyprobit,
xtlogit, and xtprobit.  See help adjust.

34.  The existing regression diagnostics commands acprplot, cprplot,
hettest, lvr2plot, ovtest, rvfplot, and rvpplot have been extended
to work after anova.  In addition, cprplot and acprplot have new
options lowess and mspline that allow putting a lowess curve or
median spline through the data.  See help regdiag.

35.  Existing command ranksum has new option porder that estimates
P(x_1>x_2); see help signrank.

36.  Existing command poisgof has new option pearson to request the
Pearson chi-squared goodness-of-fit statistic; see help poisson.

37.  Existing command binreg now respects the init() option; see help
binreg.

38.  Existing command boxcox now accepts iweights; see help boxcox.

39.  Existing commands zip and zinb now accept the maximize_option
from() to provide starting values; see help zip.

40.  Existing command cnsreg now accepts the noconstant option; see help
cnsreg.

41.  Existing command hotel has been renamed hotelling; hotel is now an
abbreviation for hotelling; see help hotelling.

42.  The score() option is now unified across all estimation commands.
You must specify the correct number of score variables, and, in
multiple-equation estimators, you may specify stub* to create new
variables named stub1, stub2, ...

Estimation commands now save in e(scorevars) the names of the score
variables if score() was specified.

43.  Existing command summarize without the detail option now allows
iweights; see help summarize.

44.  Existing commands ci and summarize have new option separator(#)
that specifies how frequently separation lines should be inserted
into the output; see help ci and summarize.

45.  Existing command impute has three new options, regsample, all, and
copyrest, that control the sample used for forming the imputation
and how out-of-sample values are treated; see help impute.

46.  Existing command collapse now takes time-series operators; see help
collapse.

What's new in data management

1.  New command odbc allows Stata for Windows to act as an ODBC client,
meaning you can fetch data directly from ODBC sources; see help
odbc.

2.  Existing command generate has new, more convenient syntax.  Now you
can type

. generate a = 2 + 3

or

. generate b = "this" + "that"

without specifying that new variable a is numeric or that b is a
string of a particular length.  If you wish, you can also type

. generate str b = "this" + "that"

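which asserts that b is a string but leaves it to generate to
determine the length of the string.  This is useful in programming
situations because it helps to prevent bugs: if the expression
unexpectedly evaluates to a number, generate reports a type mismatch
rather than silently creating a numeric variable.  Of course, you can
continue to type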

. generate double a = _pi/2

and

. generate str8 b = "this" + "that"

See help generate.

3.  Existing command list has been completely redone.  Not only is
output far more readable -- and even pretty -- but programmers will
want to use list to format tables.  See help list.

4.  Existing command merge has been improved:

a.  New options unique, uniqmaster, and uniqusing ensure that the
merge goes as you intend.  These options amount to assertions
that, if false, cause merge to stop.  unique specifies that
there should not be repeated observations within match
variables, and that if you say "merge id using myfile", there
should be one observation per id value in the master data (the
data in memory) and one observation per id in the using data.
If observations are not unique, merge will complain.

Options uniqmaster and uniqusing make the same claim for one or
the other half of the merge; unique is equivalent to specifying
uniqmaster and uniqusing.

b.  merge no longer has a limit on the number of match (key)
variables.

c.  merge has new option keep(varlist) that specifies the variables
to be kept from the using data.

See help merge.

5.  Existing command append has new option keep(varlist) that specifies
the variables to be kept from the using data; see help append.

6.  New command tsappend appends observations in a time-series context.
tsappend uses the information set by tsset, automatically fills in
the time variable, and fills in the panel variable if the panel
variable was set.  See help tsappend.

7.  Existing command describe using will now allow you to specify a
varlist, so you can check whether a variable exists in a dataset
before merging or appending.  Programmers will be interested in the
new varlist option, which will leave in r() the names of the
variables in the dataset.  See help describe.

8.  New command isid verifies that a variable or set of variables
uniquely identifies the observations and so is suitable for use with
merge; see help isid.
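The uniqueness check behind isid, and behind merge's unique options, amounts to asking whether any combination of key values repeats.  Here is a minimal Python sketch, with observations represented as dictionaries.  The helper name isid is borrowed only for illustration, and Stata's isid reports an error rather than returning False.

```python
def isid(rows, keys):
    """Return True if the values of keys uniquely identify the rows."""
    seen = set()
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key in seen:
            return False  # this combination of key values repeats
        seen.add(key)
    return True


master = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 2, "x": 21}]
print(isid(master, ["id"]))       # False: id 2 appears twice
print(isid(master, ["id", "x"]))  # True
```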

9.  Existing command codebook has new option problems to report
potential problems in the data; see help codebook.

10.  New command labelbook is like codebook, but for value labels.  In
addition to providing documentation, the output includes a list of
potential problems.

New command numlabel adds numeric values as prefixes to value
labels or removes such prefixes.  For example, the mapping
2 --> "Catholic" becomes 2 --> "2. Catholic", and vice versa.

See help labelbook and numlabel.

11.  New command duplicates reports on, gives examples of, lists,
browses, tags, and/or drops duplicate observations; see help
duplicates.

12.  Existing command recode has three new features:

a.  recode now allows a varlist rather than a varname, so several
variables can be recoded at once.

b.  recode has new option generate() to specify that the
transformed variables be stored under different names than the
originals.

c.  recode has new option prefix(), an alternative to generate(), to
specify that the transformed variables are to be given their
original names, but with a prefix.

See help recode.

13.  Existing command sort has new option stable that says, within equal
values of the sort keys, the observations are to appear in the same
order as they did originally.  See help sort.
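Python's sorted() is guaranteed to be stable, so it gives a quick picture of what sort's stable option promises: observations that tie on the sort key keep their original relative order.  The (key, original position) pairs below are made-up data.

```python
# Each pair is (sort key, original position); the data are hypothetical
obs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
print(sorted(obs, key=lambda o: o[0]))
# [('a', 2), ('a', 4), ('b', 1), ('b', 3)]
```

Without stable, Stata does not promise any particular order among ties.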

14.  New command webuse loads the specified dataset, obtaining it over
the web.  By default, datasets are obtained from
http://www.stata-press.com/data/r8/, but you can reset that.  See
help webuse.

New command sysuse loads the specified dataset that was shipped
with Stata, plus any other datasets stored along the ado-path; see
help sysuse.

15.  Existing command insheet has a new delimiter(char) option that
allows you to specify an arbitrary character as the value
separator; see help insheet.

16.  Existing commands infile and infix no longer treat ^Z as the end of
a file; see help infile1, infile2 and infix.

17.  Existing command save has new features:

a.  New option orphans specifies that all value labels, including
those not attached to any variables, are to be saved in the
file.

b.  New option emptyok specifies that the dataset is to be saved
even if it contains no variables and no observations.

c.  Existing option old has been removed.  To save datasets in Stata 7
format, use the new saveold command; see help saveold.

See help save.  By the way, Stata 8 now has a single .dta dataset
format used by both Stata/SE and Intercooled Stata, meaning that
sharing data with colleagues is easy.

18.  Existing command outfile has new features:

a.  New options rjs and fjs specify how strings are to be aligned
in the output file.  The default is left alignment.  Option rjs
specifies right alignment.  Option fjs specifies alignment as
specified by the variables' formats.

b.  New option runtogether is for use by programmers; it specifies
that all string variables be run together without extra spaces
in between or quotes.

See help outfile.

19.  You may attach value labels to the new extended missing values (.a,
.b, ..., .z); see help label.

20.  As a consequence of the 26 new missing value codes, the maximum
values that can be stored in a byte, int, and long are reduced to
100, 32,740, and 2,147,483,620, respectively; see help datatypes.
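The limits in item 20 are consistent with taking the 27 missing-value codes (. plus .a through .z) from the top of each integer type's range.  The Python sketch below only checks that arithmetic.  It is not a description of how Stata encodes missing values, and it ignores the minimum values.

```python
# Largest values of the signed 1-, 2-, and 4-byte integer types
raw_max = {"byte": 2**7 - 1, "int": 2**15 - 1, "long": 2**31 - 1}

# One system missing value (.) plus 26 extended ones (.a through .z)
reserved = 1 + 26

stata_max = {name: top - reserved for name, top in raw_max.items()}
print(stata_max)  # {'byte': 100, 'int': 32740, 'long': 2147483620}
```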

21.  New command split splits the contents of a string variable into one
or more parts and is useful for separating words into multiple
variables; see help split.

22.  Minor improvements include the following:

a.  Existing command egen now allows longer numlists in the
values() option for the eqany() and neqany() functions; see
help egen.

b.  Existing command destring now allows an abbreviated newvarlist
in the generate() option; see help destring.

c.  Existing commands icd9 and icd9p have been updated to use the
V18 and V19 codes; V16, V18, and V19 codes have been merged so
that icd9 and icd9p work equally well with old and new
datasets; see help icd9.

d.  Existing command egen mtr() has been updated to include the
marginal tax rates for the years 2000 and 2001; see help egen.

e.  Existing command mvdecode's mv() option now allows a numlist;
see help mvencode.

f.  Existing command mvencode has a new, more versatile syntax to
accommodate extended missing values; see help mvencode.

g.  Existing command xpose has three new options: format,
format(%fmt), and promote.  The format option finds the largest
numeric display format in the pretransposed data and applies it
to the transposed data.  The format(%fmt) option sets the
transposed data to the specified format.  The promote option
causes the transposed data to have the most compact numeric
data type that preserves the original data accuracy.  See help
xpose.

h.  Existing command notes now allows the individual notes to
include SMCL directives; see help notes.

i.  Existing command mkmat has a new nomissing option that causes
observations with missing values to be excluded (because
matrices can now contain missing values).  mkmat has also been
made faster.  See help mkmat.

j.  Existing command ds has three new options: alpha, varwidth(#),
and skip(#).  alpha sorts the variables in alphabetic order.
varwidth(#) specifies the display width of the variable names.
skip(#) specifies the number of spaces between variables.  See
help describe.

k.  Existing command label dir now returns the names of the defined
value labels in r(names), and existing command label list now
returns the minimum and maximum of the mapped values in r(min) and
r(max); see help label.

What's new in expressions and functions

1.  First, a warning:  Do not type

. generate newvar = ... if oldvar != .

. replace oldvar = ... if oldvar != .

. list ... if var != .

Type

. generate newvar = ... if oldvar < .

. replace oldvar = ... if oldvar < .

. list ... if var < .

or type

. generate newvar = ... if !mi(oldvar)

. replace oldvar = ... if !mi(oldvar)

. list ... if !mi(var)

Stata has new missing values and the ordering is all numbers < . <
.a < .b < ... < .z.  If you do not use the new missing values, then
your old habits will continue to work, but it is better to be safe.

It is a hot topic of debate at StataCorp whether varname<. or
!mi(varname) is the preferred way of excluding missing values, and
therefore both constructs are deemed to be equally stylish; use
whichever appeals to you.

New function mi() is a synonym for existing function missing(); it
returns 1 (true) if its argument is missing and 0 (false) otherwise.
See help progfun.

2.  By the same token, do not type

. list ... if var == .

To list observations with missing values of var, type

. list ... if var >= .

or type

. list ... if mi(var)
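The ordering rule in items 1 and 2 can be modeled in Python to see why != . and == . mislead once extended missing values are present.  This sketch is a model built on assumptions, not how Stata stores data.  Missing values are written as the strings ".", ".a", ..., ".z", and the helper names lt() and mi() are hypothetical stand-ins for < and mi().

```python
import string

# Hypothetical model: missing values written as ".", ".a", ..., ".z"
MISSING_CODES = ["."] + ["." + c for c in string.ascii_lowercase]


def sort_key(v):
    """Order: all numbers < . < .a < .b < ... < .z"""
    if isinstance(v, str):
        return (1, MISSING_CODES.index(v))
    return (0, v)


def lt(a, b):
    """Model of Stata's a < b under the ordering above."""
    return sort_key(a) < sort_key(b)


def mi(v):
    """Model of mi(): true for any of the 27 missing codes."""
    return isinstance(v, str) and v in MISSING_CODES


values = [3.5, ".", -1e300, ".b", 42, ".a"]
print(sorted(values, key=sort_key))      # [-1e+300, 3.5, 42, '.', '.a', '.b']

# Old habit "if x != ." keeps .b and .a
print([v for v in values if v != "."])   # [3.5, -1e+300, '.b', 42, '.a']
# "if x < ." and "if !mi(x)" keep only the numbers
print([v for v in values if lt(v, ".")])  # [3.5, -1e+300, 42]
print([v for v in values if not mi(v)])   # [3.5, -1e+300, 42]
# "if x >= ." and "if mi(x)" find every missing value, not just "."
print([v for v in values if not lt(v, ".")])  # ['.', '.b', '.a']
```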

3.  Matrices can now contain missing values, both the standard one (.)
and the extended ones (.a, .b, ..., .z).

4.  The following new density functions are provided:

a.  tden(n,t), the density of Student's t distribution.

b.  Fden(n_1,n_2,F), the density of the F distribution.

c.  nFden(n_1,n_2,lambda,F), the noncentral F density.

d.  betaden(a,b,x), the 2-parameter Beta density.

e.  nbetaden(a,b,g,x), the noncentral Beta density.

f.  gammaden(a,b,g,x), the 3-parameter Gamma density.

See help probfun.
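For reference, tden(n,t) is the density of Student's t distribution with n degrees of freedom.  Its textbook formula is sketched below in Python.  The function name t_density is hypothetical, and the code is meant for checking values, not as a copy of Stata's implementation.

```python
import math


def t_density(n, t):
    """Student's t density with n degrees of freedom (textbook formula)."""
    # log of Gamma((n+1)/2) / (sqrt(n*pi) * Gamma(n/2)), computed with lgamma
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(t * t / n))


print(t_density(1, 0.0))  # 1/pi: with 1 df, t is the Cauchy distribution
```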

5.  The following new cumulative distribution functions are provided:

a.  nFtail(n_1,n_2,lambda,f), the upper tail of the noncentral F.

b.  nibeta(a,b,lambda,x), the cumulative noncentral ibeta
probability.

See help probfun.

6.  The following new inverse cumulative distribution functions are
provided:

a.  invnFtail(n_1,n_2,lambda,p), the noncentral F corresponding to
upper-tail p.

b.  invibeta(a,b,p), the incomplete beta value corresponding to p.

c.  invnibeta(a,b,lambda,p), the noncentral beta value
corresponding to p.

In addition, existing function invbinomial(n,k,p) has improved
accuracy.  See help probfun.

7.  A suite of new functions provides partial derivatives of the
cumulative gamma distribution.  The following new functions are
provided:

a.  dgammapda(a,x), partial derivative of gammap(a,x) with respect
to a.

b.  dgammapdx(a,x), partial derivative of gammap(a,x) with respect
to x.

c.  dgammapdada(a,x), 2nd partial derivative of gammap(a,x) with
respect to a.

d.  dgammapdxdx(a,x), 2nd partial derivative of gammap(a,x) with
respect to x.

e.  dgammapdadx(a,x), 2nd partial derivative of gammap(a,x) with
respect to a and x.

See help probfun.
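Item 7's dgammapdx(a,x) has a simple closed form.  The derivative of gammap(a,x) with respect to x is the gamma density x^(a-1)e^(-x)/Gamma(a).  The Python sketch below checks that closed form against a finite difference.  The power series used for gammap, which is adequate only for moderate x, and the function names are assumptions for illustration, not Stata's algorithm.

```python
import math


def gammap(a, x, terms=200):
    """Regularized lower incomplete gamma P(a,x), by its power series (x >= 0)."""
    if x == 0:
        return 0.0
    # Sum of x^k / (a*(a+1)*...*(a+k)) for k = 0, 1, 2, ...
    term = 1.0 / a
    total = term
    for k in range(1, terms):
        term *= x / (a + k)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def dgammapdx(a, x):
    """Closed form of d/dx gammap(a,x): x^(a-1) e^(-x) / Gamma(a)."""
    return math.exp((a - 1) * math.log(x) - x - math.lgamma(a))


a, x, h = 2.5, 1.7, 1e-5
finite_diff = (gammap(a, x + h) - gammap(a, x - h)) / (2 * h)
print(finite_diff, dgammapdx(a, x))  # the two values agree closely
```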

8.  All density and distribution functions have been extended to return
nonmissing values over the entire real line; see help probfun.

9.  The following new string functions are provided:

a.  word(s,n) returns the nth word in s.

b.  wordcount(s) returns the number of words in s.

c.  char(n) returns the character corresponding to ASCII code n.

d.  plural(n,s_1) returns the plural of s_1 if n does not equal 1
or -1, and otherwise returns s_1.

e.  plural(n,s_1,s_2) returns the plural of s_1 if n does not equal
1 or -1, forming the plural by adding or removing suffix s_2.

f.  proper(s) capitalizes the first letter of a string and any
other letters immediately following characters that are not
letters; remaining letters are converted to lowercase.

See help strfun.
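A rough Python model of word(), wordcount(), and proper() follows.  It makes three assumptions: words are separated by blanks, only ASCII letters count as letters, and word() returns an empty string when there is no nth word.  Stata's own handling of quotes and out-of-range n may differ, so treat this as a sketch.

```python
def wordcount(s):
    """Number of blank-separated words in s (quoting is ignored here)."""
    return len(s.split())


def word(s, n):
    """The nth word of s, counting from 1; "" if there is no such word."""
    words = s.split()
    return words[n - 1] if 1 <= n <= len(words) else ""


def proper(s):
    """Capitalize letters that follow a non-letter; lowercase the other letters."""
    out = []
    prev_is_letter = False
    for ch in s:
        is_letter = ch.isascii() and ch.isalpha()
        if is_letter:
            out.append(ch.lower() if prev_is_letter else ch.upper())
        else:
            out.append(ch)
        prev_is_letter = is_letter
    return "".join(out)


print(word("the quick brown fox", 2))       # quick
print(wordcount("  the quick  brown fox"))  # 4
print(proper("mR. joHN a. smith"))          # Mr. John A. Smith
```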

10.  The following new mathematical functions are provided:

a.  logit(x), the log of the odds, ln{x/(1-x)}.

b.  invlogit(x), the inverse logit.

c.  cloglog(x), the complementary log-log.

d.  invcloglog(x), the inverse of the complementary log-log.

e.  tanh(x), the hyperbolic tangent.

f.  atanh(x), the inverse hyperbolic tangent of x.

g.  floor(x), the integer n such that n <= x < n+1.

h.  ceil(x), the integer n such that n-1 < x <= n.

In addition, the following existing mathematical functions have
been modified:

i.  round(x,y) now allows the second argument to be omitted, in
which case it defaults to 1, so round(x) returns x rounded to the
closest integer.

j.  lngamma(x) and gammap(a,x) now have improved accuracy.

See help mathfun.
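The formulas behind the new link functions, and the inequalities that define floor() and ceil(), can be checked with a short Python sketch that uses only the standard math module.  The function names mirror Stata's for readability.  The code is not Stata's implementation and is intended for moderate arguments only.

```python
import math


def logit(x):
    """Log of the odds: ln(x / (1 - x)), for 0 < x < 1."""
    return math.log(x / (1 - x))


def invlogit(x):
    """Inverse of logit(): exp(x) / (1 + exp(x)), written to suit moderate x."""
    return 1 / (1 + math.exp(-x))


def cloglog(x):
    """Complementary log-log: ln(-ln(1 - x)), for 0 < x < 1."""
    return math.log(-math.log(1 - x))


def invcloglog(x):
    """Inverse of cloglog(): 1 - exp(-exp(x))."""
    return 1 - math.exp(-math.exp(x))


def floor_ceil_ok(x):
    """Check the defining inequalities of floor() and ceil() for one x."""
    n, m = math.floor(x), math.ceil(x)
    return n <= x < n + 1 and m - 1 < x <= m


print(invlogit(logit(0.3)), invcloglog(cloglog(0.3)))  # both about 0.3
print(math.floor(-1.5), math.ceil(-1.5))                # -2 -1
print(all(floor_ceil_ok(x) for x in (-1.5, 0.0, 2.0, 2.3)))  # True
```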

11.  Existing function uniform() will now allow you to capture and reset
its seed.  The seed value, in encrypted form, is now shown by
query.  You can store its value by typing

. local seed = c(seed)

Later, you can reset it by typing

. set seed `seed'

See help seed and help random.

12.  The following new matrix functions are provided:

a.  issym(M) returns 1 if matrix M is symmetric and returns 0
otherwise; issym() may be used in any context.

b.  matmissing(M) returns 1 if any elements of M are missing and
returns 0 otherwise; matmissing() may be used in any context.

c.  vec(M) returns the column vector formed by listing the elements
of M, starting with the first column and proceeding column by
column.

d.  hadamard(M,N) returns a matrix whose i, j element is M[i,j] *
N[i,j].

e.  matuniform(r,c) returns the r by c matrix containing uniformly
distributed pseudo-random numbers on the interval [0,1).

See help matfcns.

In addition, the new command matrix eigenvalues returns the complex
eigenvalues of an n by n nonsymmetric matrix; see help mateig.
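To make the definitions of vec(), hadamard(), and issym() concrete, here they are in Python, with matrices represented as lists of rows.  This sketch covers the arithmetic only.  The function names mirror Stata's, and the error raised for nonconformable matrices is an assumption of the sketch.

```python
def vec(M):
    """Stack the columns of M (a list of rows) into one column vector."""
    return [[M[i][j]] for j in range(len(M[0])) for i in range(len(M))]


def hadamard(M, N):
    """Element-by-element product: result[i][j] = M[i][j] * N[i][j]."""
    if len(M) != len(N) or any(len(r) != len(s) for r, s in zip(M, N)):
        raise ValueError("conformability error")
    return [[m * n for m, n in zip(r, s)] for r, s in zip(M, N)]


def issym(M):
    """1 if M is square and equal to its transpose, 0 otherwise."""
    n = len(M)
    square = all(len(row) == n for row in M)
    return int(square and all(M[i][j] == M[j][i] for i in range(n) for j in range(i)))


A = [[1, 2], [3, 4]]
print(vec(A))                             # [[1], [3], [2], [4]]
print(hadamard(A, [[5, 6], [7, 8]]))      # [[5, 12], [21, 32]]
print(issym(A), issym([[1, 2], [2, 1]]))  # 0 1
```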

13.  The following new programming functions have been added:

a.  clip(x,a,b) returns x if a <= x <= b, a if x <= a, b if x >= b,
and missing if x is missing.

b.  chop(x,epsilon) returns round(x) if |x - round(x)| < epsilon,
otherwise returns x.

c.  irecode(z,x_1,x_2, ... ,x_n) returns the index of the range in
which z falls.

d.  maxbyte(), maxint(), maxlong(), maxfloat(), and maxdouble()
return the maximum value allowed by the storage type.

e.  minbyte(), minint(), minlong(), minfloat(), and mindouble()
return the minimum value allowed by the storage type.

f.  epsfloat() and epsdouble() return the precision associated with
the storage type.

g.  byteorder() returns 1 if the computer stores numbers in
most-significant-byte-first format and 0 if in
least-significant-byte-first format.

The following programming functions have been modified or extended:

h.  missing(x) now optionally allows multiple arguments so that it
becomes missing(x_1,x_2, ... ,x_n).  The extended function
returns 1 (true) if any of the x_i are missing and returns 0
(false) otherwise.

i.  cond(x,a,b) now optionally allows a fourth argument so that it
becomes cond(x,a,b,c).  c is returned if x evaluates to
missing.

See help progfun.
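The rules for clip(), chop(), irecode(), and the four-argument cond() are easiest to see when written out.  The Python sketch below mirrors the definitions given above, with None standing in for Stata's missing value.  Three points are assumptions of this sketch:

- the None representation;
- the three-argument cond() returns a when x is missing, because missing is nonzero;
- irecode() returns 0 if z <= x_1, 1 if x_1 < z <= x_2, ..., n if z > x_n.

```python
from bisect import bisect_left

MISSING = None    # hypothetical stand-in for Stata's missing value
_NO_C = object()  # marks "fourth argument not given"


def clip(x, a, b):
    """x if a <= x <= b, a if x <= a, b if x >= b, missing if x is missing."""
    if x is MISSING:
        return MISSING
    if x <= a:
        return a
    if x >= b:
        return b
    return x


def chop(x, epsilon):
    """round(x) if |x - round(x)| < epsilon, otherwise x."""
    r = round(x)  # Python rounds exact halves to even; irrelevant when epsilon <= 0.5
    return r if abs(x - r) < epsilon else x


def irecode(z, *cuts):
    """Index of the range containing z, for ascending cut points x_1, ..., x_n."""
    # bisect_left counts the cut points strictly less than z
    return bisect_left(cuts, z)


def cond(x, a, b, c=_NO_C):
    """a if x is true, b if x is false; c if x is missing and c was given."""
    if x is MISSING:
        # Missing counts as true (nonzero) unless a fourth argument is supplied
        return a if c is _NO_C else c
    return a if x != 0 else b


def missing(*xs):
    """1 if any argument is missing, 0 otherwise."""
    return int(any(x is MISSING for x in xs))


print(clip(5, 0, 3), irecode(1.5, 1, 2, 3), cond(MISSING, "a", "b", "c"))  # 3 1 c
```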

What's new in display formats

1.  The %g format has been modified:  %#.0g still means the same as
previously, but %#.#g has a new meaning.  For instance, %9.5g means
to show approximately 5 significant digits.  We say approximately
because, given the number 123,456, %9.5g will show 123456 rather
than 1.2346e+05, as would strictly be required if only five digits
are to be shown.  Other than that, it does what you would expect,
and we think, in all cases, does what you want.

2.  %[-]0#.#f formats (note the leading 0) now specify that leading
zeros are to be included in the result.  For example, 1.2 in %09.2f
format is 000001.20.
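Python's printf-style % operator follows C's strict rules, so it is a handy way to see the two behaviors described in items 1 and 2.  Strict %9.5g switches 123,456 to exponential notation, which Stata 8 avoids.  %09.2f pads with leading zeros exactly as Stata's %09.2f does.  This compares outputs only and says nothing about how Stata implements its formats.

```python
# Python's % operator applies C's printf rules strictly
strict_g = "%9.5g" % 123456
zero_padded = "%09.2f" % 1.2
print(strict_g)     # 1.2346e+05  (what Stata 8's %9.5g avoids)
print(zero_padded)  # 000001.20   (same as Stata's %09.2f)
```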

3.  Stata has a new %21x hexadecimal format that will mainly be of
interest to numerical analysts.  In %21x, 123,456 looks like
+1.e240000000000X+010, which you read as the hexadecimal number
1.e24 multiplied by 2^10, where the exponent 10 is also hexadecimal
(16 in decimal).  The period in 1.e24 is the base-16 point.  The
beauty of this format is that it reveals numbers exactly as the
binary computer thinks of them.  For instance, the new
format shows how difficult numbers like 0.1 are for binary
computers:  +1.999999999999aX-004.

You can use this hexadecimal way of writing numbers in expressions;
Stata will understand, for instance,

. generate xover4 = x / 1.0x+2

but it is unlikely you would want to do that.  The notation will
even be understood by input, infix, and infile.  There is no %21x
input format, but wherever a number appears, Stata will understand
#.##...#x[+|-]###.

See help format.
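Python's float.hex() gives the same information as %21x, in C99 notation with a decimal exponent, so a few lines are enough to convert between the two.  The sketch below assumes the conventions visible in the examples above: a sign, 13 hexadecimal digits after the base-16 point, X, and a three-digit hexadecimal exponent.  For parsing, it assumes that the exponent in the #.##...#x[+|-]### notation is likewise hexadecimal.  Zero, subnormals, infinities, and missing values are not handled.

```python
def stata_21x(v):
    """Format a nonzero finite float the way the %21x examples above look."""
    h = v.hex()  # for example '0x1.e240000000000p+16'
    sign = "-" if h.startswith("-") else "+"
    mantissa, exp = h.lstrip("-")[2:].split("p")
    e = int(exp)
    # The exponent is written as three hexadecimal digits with an explicit sign
    return f"{sign}{mantissa}X{'-' if e < 0 else '+'}{abs(e):03x}"


def from_stata_x(text):
    """Parse #.##...#x[+|-]### notation, treating the exponent as hexadecimal."""
    body = text.strip().lower()
    sign = -1.0 if body.startswith("-") else 1.0
    mantissa, exp = body.lstrip("+-").split("x")
    return sign * float.fromhex(f"0x{mantissa}p{int(exp, 16)}")


print(stata_21x(123456.0))     # +1.e240000000000X+010
print(stata_21x(0.1))          # +1.999999999999aX-004
print(from_stata_x("1.0x+2"))  # 4.0
```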

What's new in programming

Lots of programming improvements have been made; see What's new in [P]
intro.  Here we will just touch on a few highlights.

1.  The two big features are the ability to program dialog boxes and
the addition of class programming; see help dialogs and class.
Stata's new GUI and new graphics have been programmed using these
new features.

2.  The new c-class puts Stata's settings in one place.  Type creturn
list and all will become clear.  Recorded in c(settingname) are all
the system settings, so no longer do you have to wonder whether the
setting is in $S_something, obtained as a result of an extended
macro function, or found somewhere else.  See help creturn.

3.  Program debugging is now easier thanks to the new trace facilities.

a.  Trace output now shows the line with macros expanded as well as
unexpanded.  This makes spotting errors easier.

b.  Separators are drawn and output indented when one program calls
another, making it easier to see where you are.

c.  set trace is now pushed-and-popped, so the original value will
be restored when a program ends.

d.  The new command set tracedepth allows you to specify how deeply
calls to subroutines should be traced, so you can eliminate
unwanted output.

See help trace.

4.  One change will bite you:  With if exp, while exp, forvalues, and
all the other commands that take a brace, no longer can the open
brace and close brace be on the same line as the command.  You may
not code

if (exp) { ... }

You must instead code

if (exp) {
...
}

In the case of if, you may omit the braces altogether:

if (exp) ...

Under version control, Stata continues to tolerate the old
all-on-one-line syntax, but the new syntax makes Stata considerably
faster.  See help ifcmd.

5.  Existing commands postfile, post, and postclose will now save
string variables; see help postfile.

6.  Do-files and ado-files now allow // comments and /// continuation
lines.  // on a line says that from here to the end of the line is
a comment.  /// does the same, but also says that the next line is
to be joined with the current line (and not treated as a comment).

7.  Existing command which will now not only locate .ado files, but
other system files as well.  You can type, for instance, which
anova.hlp to discover the location of the help file for anova.  See
help which.

New command findfile will look for any file along the adopath; see
help findfile.

8.  The sysdir directory STBPLUS is now called PLUS; see help sysdir.

9.  net .pkg files have new features:

a.  F filename is a variation on f filename that specifies the file
is to be installed into the system directories, even if it
ordinarily would not.  This is useful for installing .dta
datasets that accompany ado-files.

b.  g platformname filename is another variation on f filename.  It
specifies that the file is to be installed only if the user's
computer is of type platformname.

c.  G platformname filename is a variation on F filename.  The file
is installed only if the user's computer is of type
platformname, and, if it is installed, it is installed in the
system directories.

d.  h filename asserts that filename must be loaded or else this
package cannot be installed.

e.  The maximum number of description lines in a .pkg file has been
increased from 20 to 100.

See help net and usersite.

There are lots of new programming features, and the ones we have chosen
to mention may not be of the most interest to you.  Do see What's new in
[P] intro.

What's new in the user interface

1.  The GUI, of course, but we have already mentioned that; see Stata's
interface in Chapter 3 of the Getting Started with Stata manual.

2.  Stata now has tab-name completion.  When typing a command, type the
first few letters of a variable name and press Tab to have Stata
complete the name.

3.  Existing commands set and query have been redone.  set now has a
permanently option that makes the setting permanent across
sessions, alleviating the need for creating profile.do files.
query has a new output format.  See help set and query.

4.  There are lots of new set parameters.  Do not even try to dig them
out of the manual.  Instead, type query.  The new query output
shows you where you can find out about each and what values you can
set.

5.  Almost all windows now have contextual menus; right-click when you
are in the window to try them.

6.  Under Windows and Mac, the following improvements have been made:

a.  If an http proxy is needed, Stata will attempt to get the
proper settings from the operating system; see help netio.

b.  You are no longer limited to a maximum of 10 nested do-files.
The limit is now 64, the same as Stata for Unix.

7.  Under Windows, the following improvements have been made:

a.  Shortcuts for .smcl files have been added.  By default,
double-clicking on the shortcut will open the file in the
Viewer, and right-clicking on the shortcut and choosing Edit
will open the file in the Do-file Editor.

b.  Multiple instances of Stata for Windows running at the same
time are now clearly marked in their title bar with an instance
number.

c.  You can now set the maximum number of lines recorded in the
Review window using set reventries; see help reventries.

8.  Under Mac, the following improvements have been made:

a.  Stata is now a native Mach-O application.  It may be launched
from a terminal with command line options in addition to the
usual double-clicking on Stata from the Finder.

b.  Stata can now change the amount of memory allocated on the fly
just as Stata can on other operating systems; see help memory.

c.  Stata can now pass commands to the operating system for
execution; see help shell.

d.  The filename separator is now forward slash (/) rather than
colon (:) in keeping with changes made by Apple.  For backward
compatibility, Stata still recognizes a colon (:) as a filename
separator.

e.  You can now open more than one file simultaneously in the
Do-file Editor.

f.  Stata honors and sets file permissions when creating files.

g.  Stata now uses /tmp for its temporary files.

h.  You can now select all the contents of the Results or Viewer
windows by selecting Select All from the Edit menu.

i.  There is a new menu item, Bring All to Front, in the Window
menu that brings all Stata windows to the front.

9.  Stata for Unix now looks for the environment variable STATATMP in
addition to the environment variable TMPDIR for the location of the
directory where temporary files are stored.  STATATMP takes
precedence over TMPDIR.

What's more

We have not listed all the changes, but we have listed the important
ones.  The remaining changes -- a list about as long as the one
above -- are all implications of what has been listed.

What is important to know is that Stata is continually being updated and
those updates are available for free over the Internet.  All you have to
do is type

. update query

and follow the instructions.

We hope you enjoy Stata 8.

--- previous updates ----------------------------------------------------------

See whatsnew7.

-------------------------------------------------------------------------------
