__What's new in release 14.0 (compared with release 13)__

This file lists the changes corresponding to the creation of Stata
release 14.0:

+---------------------------------------------------------------+
| help file       contents                     years            |
|---------------------------------------------------------------|
| whatsnew        Stata 15.0 and 15.1          2017 to present  |
| whatsnew14to15  Stata 15.0 new release       2017             |
| whatsnew14      Stata 14.0, 14.1, and 14.2   2015 to 2017     |
| **this file**   Stata 14.0 new release       2015             |
| whatsnew13      Stata 13.0 and 13.1          2013 to 2015     |
| whatsnew12to13  Stata 13.0 new release       2013             |
| whatsnew12      Stata 12.0 and 12.1          2011 to 2013     |
| whatsnew11to12  Stata 12.0 new release       2011             |
| whatsnew11      Stata 11.0, 11.1, and 11.2   2009 to 2011     |
| whatsnew10to11  Stata 11.0 new release       2009             |
| whatsnew10      Stata 10.0 and 10.1          2007 to 2009     |
| whatsnew9to10   Stata 10.0 new release       2007             |
| whatsnew9       Stata 9.0, 9.1, and 9.2      2005 to 2007     |
| whatsnew8to9    Stata 9.0 new release        2005             |
| whatsnew8       Stata 8.0, 8.1, and 8.2      2003 to 2005     |
| whatsnew7to8    Stata 8.0 new release        2003             |
| whatsnew7       Stata 7.0                    2001 to 2002     |
| whatsnew6to7    Stata 7.0 new release        2000             |
| whatsnew6       Stata 6.0                    1999 to 2000     |
+---------------------------------------------------------------+

Most recent changes are listed first.

--- **more recent updates** -------------------------------------------------------

See whatsnew14.

--- **Stata 14.0 release 02apr2015** ----------------------------------------------

**Contents**
1.3 What's new
1.3.1 Highlights
1.3.2 What's new in statistics (general)
1.3.3 What's new in statistics (SEM)
1.3.4 What's new in statistics (multilevel modeling)
1.3.5 What's new in statistics (treatment effects)
1.3.6 What's new in statistics (longitudinal/panel data)
1.3.7 What's new in statistics (time series)
1.3.8 What's new in statistics (survival analysis)
1.3.9 What's new in statistics (survey data)
1.3.10 What's new in statistics (power and sample size)
1.3.11 What's new in statistics (multiple imputation)
1.3.12 What's new in statistics (multivariate)
1.3.13 What's new in data management
1.3.14 What's new in functions
1.3.15 What's new in graphics
1.3.16 What's new in Mata
1.3.17 What's new in programming
1.3.18 What's new in the Stata interface
1.3.19 What's more

This section is intended for users of the previous version of Stata. If
you are new to Stata, you may as well skip to *What's more*, below.

As always, we remind programmers it is important that you put **version 14**,
**version** **13.1**, or **version 12**, etc., at the top of your old do- and
ado-files so that they continue to work as you expect. You were supposed
to do that when you wrote them, but if you did not, go back and do it
now.

We will list all the changes, item by item, but first, here are the
highlights.

__1.3.1 Highlights__

1. **Unicode support**

Здравствуйте. こんにちは. Hello.

Stata 14 supports Unicode (UTF-8). All of Stata is Unicode aware.
You may use Unicode for variable names, labels, data, and whatever
else you wish. Not only do you have more characters from which to
choose, but when you share data, others will see what you see.

**Warning ... Files may need translating**

If you previously used Extended ASCII to overcome the
limitations of plain ASCII in your .dta files, do-files, and
ado-files, they need to be translated from Extended ASCII to
Unicode. We have made that easy: see **[D] unicode translate**.
If you do not translate your files, they will not display
properly. If you used Extended ASCII for variable names, you
may not even be able to type the mangled names!
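As a rough sketch of that workflow, the commands below translate every Stata file in the current directory. The encoding name is an assumption made for illustration: we pretend the old files were written in Latin-1 (ISO-8859-1), and you should substitute whatever encoding your files actually used. Work on a copy of the directory until you are satisfied with the result.

```stata
* Sketch only: translate old Extended ASCII files to Unicode.
* Assumption: the files in the current directory were written in
* ISO-8859-1 (Latin-1); replace this with your actual encoding.
clear
unicode analyze *
unicode encoding set ISO-8859-1
unicode translate *
```

**unicode analyze** reports which files need translating, so it is worth reading its output before running **unicode translate**.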

**New string functions**

Stata 14 has three times the number of string functions as
Stata 13. To understand why, let's start with what you may
find a surprising side effect of using Unicode. Say you had
an **str3** variable and you replaced a value in one observation
with "für". The variable would be **str4** after the change! It
would be **str4** because it takes four memory positions (bytes)
to store "für": one each for "f" and "r" and two for "ü".

The standard ASCII characters consume one memory position just
as they did previously, but the other Unicode characters need
two, three, or even four memory positions. **strlen()** works in
terms of memory positions, so **strlen("für")** reports 4
positions, not 3 characters. New function **ustrlen("für")**
works in terms of character positions, so it reports 3
characters. **strlen("こんにちは")** is 15 if you can believe it,
but **ustrlen("こんにちは")** is a reassuring 5. By the way,
こんにちは is pronounced "Kon'nichiwa" and means "hello".

Anyway, for each string function, *fcn***()**, that needs it, there
is a new, corresponding function, **u***fcn***()**. **u***fcn***()** uses the
character-position metric instead of the memory-position
metric. As another example, **usubstr(***s***,** **2**, **3)** returns up to
three characters starting at the second. **substr(***s***,** **2**, **3)**
returns up to three memory positions (bytes) starting at the
second. Memory positions are the same as characters only if *s*
is ASCII.

If you are writing for an international audience, you need to
distinguish between the two flavors of each string function.

The third new group of string functions is the **ud***fcn***()**s.
They work in the display-position metric. **udstrlen("für")** is
3, meaning it takes 3 columns to display "für".
**udstrlen("こんにちは")** is 10, meaning it takes 10 columns to
display "こんにちは". You use the new **ud***fcn***()**s when you are
aligning output in a table. **udsubstr(***s***,** **2**, **3)** returns however
many characters it takes to fill up to three columns, starting
at the second character.

**Graphs, SEM Builder, Unicode, and Extended ASCII**

Export graphs or output containing Unicode using PDF instead
of PostScript (PS) or Encapsulated PostScript (EPS). PS and
EPS do not support Unicode. In some cases, you can use PS and
EPS because Stata converts accented Latin characters to the
Extended ASCII characters that PS and EPS expect.

If you have Stata 13 or earlier .gph graph files or .stsem SEM
Builder files, and those files contain Extended ASCII, Stata
14 will not display the Extended ASCII characters correctly,
and they cannot be translated. You can, however, edit the files
to correct the affected text.

See **[U] 12.4.2 Handling Unicode strings** and see **[D] unicode** for more
information on Stata 14's new Unicode capabilities.

2. **More than 2 billion observations now allowed**

Stata/MP, the multiprocessor version of Stata 14, now allows more
than 2 billion observations, or more correctly, more than
2,147,483,620 observations. The maximum now depends solely on the
amount of memory on your computer. Stata will not limit you; it can
now count up to 281 trillion observations.

How many observations you can process depends on the size of your
computer and the width of your data. Here are some sample
calculations and a formula:

                      Billions of observations
Computer's  Memory         Memory scenario
memory      used           (1)     (2)     (3)
----------------------------------------------
128GB       112GB          1.8     1.4     1.0
256GB       240GB          3.8     2.9     2.1
512GB       496GB          7.9     6.1     4.4
1024GB      1008GB        16.2    12.3     9.0
1536GB      1520GB        24.4    18.5    13.6
----------------------------------------------

Notes:
*Memory used* is the total used for storing data. We
left 16GB free for Stata and other processes, meaning
that we assumed that Stata consumes nearly all the
computer's resources (single user).

*Observations* leaves extra room for adding three doubles, because
Stata commands often add working variables. The width used by
the three scenarios is for your data exclusive of working
variables.

Scenario 1: width = 43 bytes (same as **auto.dta**)
Scenario 2: width = 64 bytes
Scenario 3: width = 96 bytes

Calculation:

*obs* = [*memory_used* / (*width* + 24)] × (1024³ / 1000³)

where *memory_used* is in gigabytes and *obs* is in billions.
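As a worked instance of the calculation, the sketch below plugs in 240GB of memory used and scenario 2 (width = 64 bytes). The scalar names are ours, chosen for illustration.

```stata
* Worked instance of the formula for 240GB used and 64-byte observations
scalar memory_used = 240     // gigabytes available for storing data
scalar width       = 64      // bytes per observation, excluding working variables
scalar obs = memory_used/(width + 24) * 1024^3/1000^3
display %4.1f obs            // 2.9 (billions of observations)
```

The 24 in the denominator is the room reserved for three doubles at 8 bytes each.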

There is nothing more to know except that we have advice on how to
improve Stata's performance when processing datasets with more than
2 billion observations; see **help obs_advice**.

3. **Bayesian statistical analysis**

Stata 14 provides Bayesian statistical analyses with the new **bayesmh**
command and corresponding suite of features. The *mh* on the end of
**bayesmh** stands for Metropolis-Hastings. You can fit models by using
an adaptive Metropolis-Hastings algorithm, or a full Gibbs sampling
for some models, or a combination of the two algorithms. After
estimation, you can diagnose convergence and analyze results.

Fitting a model can be as easy as typing

. **bayesmh y x, likelihood(logit) prior({y:}, normal(0,100))**

You can use our suite of preprogrammed likelihood models, or you can
write your own. Postestimation features are the same either way.
And even if you write your own models, you can still use the
built-in priors.

We provide 12 built-in likelihood models and 22 built-in prior
distributions. Built in are continuous, binary, ordinal, and count
likelihood models. Built in are continuous univariate, continuous
multivariate, discrete, and more prior distributions. Supported are
univariate, multivariate, and multiple-equation models, including
linear and nonlinear models and including generalized nonlinear
models.

Results are reported with credible intervals (CrIs). You can check
convergence visually by typing **bayesgraph** **diagnostics** **_all**. You can
check Markov chain Monte Carlo (MCMC) efficiency by using the new
**bayesstats** **ess** command.

You can obtain estimates of the posterior means and their MCMC
standard errors not only for model parameters but also for functions
of model parameters.

You can compare models using Bayesian information criteria such as
the deviance information criterion or Bayes factors.

You can perform interval hypothesis testing by computing
probabilities that a parameter or set of parameters or even
functions of parameters belong to a specified range.

You can perform model hypothesis testing by computing probabilities
of models given the observed data, which is to say, using model
posterior probabilities.

And you can store your MCMC and estimation results for later
analysis.

We provide an entire, new manual on all of this; see the *Stata*
*Bayesian Analysis Reference Manual*.
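To give a feel for the workflow, here is a minimal sketch that fits the logit model shown earlier to **auto.dta** and then runs a few of the postestimation features just described. The choice of variables, the seed, and the label **or_mpg** are ours, made only for illustration.

```stata
* Sketch: Bayesian logit of foreign on mpg, then diagnostics and summaries
sysuse auto, clear
set seed 14
bayesmh foreign mpg, likelihood(logit) prior({foreign:}, normal(0,100))

bayesgraph diagnostics _all                       // visual convergence checks
bayesstats ess                                    // MCMC efficiency
bayesstats summary (or_mpg: exp({foreign:mpg}))   // function of a parameter
bayestest interval {foreign:mpg}, lower(0)        // P(coefficient on mpg > 0)
```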

4. **IRT models**

IRT stands for item response theory. IRT models explore the
relationship between a latent (unobserved) trait and items that
measure aspects of the trait. This often arises in standardized
testing. A set of items (questions) is designed, the responses to
which measure, say, the unobservable trait mathematical ability. Or
questions are designed to measure unobservable cognitive abilities,
personality traits, attitudes, health outcomes, quality of life,
morale, and so on. The observable items do not have to be responses
to questions, but they usually are. The items can be any observable
variables that we believe measure the trait.

Stata can fit models for binary items, ordinal items, or categorical
items. These include, for binary items, one-parameter logistic
(1PL), two-parameter logistic (2PL), and three-parameter logistic
(3PL); for ordinal items, graded response models, rating scale
models, and partial credit models; and for categorical items,
nominal response models. Stata can also fit hybrid models where
different items use different models.

Once a model is fit, Stata can graph item characteristic curves
(ICCs), test characteristic curves (TCCs), item information
functions (IIFs), and test information functions (TIFs).

Stata includes a control panel to guide you through the fitting and
analysis of models.

There is a lot more to say; see the all-new *Stata Item Response*
*Theory Reference Manual*.
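A minimal sketch of fitting and graphing an IRT model follows. It assumes you can download the example dataset **masc1** with **webuse**, which contains nine binary items named **q1** through **q9**.

```stata
* Sketch: two-parameter logistic model for nine binary items
webuse masc1, clear
irt 2pl q1-q9

irtgraph icc q1 q5    // item characteristic curves for two items
irtgraph tcc          // test characteristic curve
irtgraph iif          // item information functions
irtgraph tif          // test information function
```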

5. **Panel-data survival models**

New estimation command **xtstreg** fits parametric panel-data survival
models with random effects. Five distributions are provided:
exponential, loglogistic, Weibull, lognormal, and gamma. Estimation
is in the accelerated failure-time metric, but exponential and
Weibull also allow the proportional-hazards metric.

**xtstreg** is both an xt and an st command; you **xtset** the panel
characteristics of the data and **stset** the survival characteristics.
Hence, single- and multiple-record st data as well as all the other
survival-data features are supported, and survivor, hazard, and
cumulative hazard functions can be graphed using **stcurve**.

Frequency, importance, and probability sampling weights are allowed.

See **[XT] xtstreg**.
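Here is a short sketch of the xt-plus-st setup. It assumes the example dataset **catheter** is available through **webuse** and that it contains the variables **patient**, **time**, **infect**, **age**, and **female**.

```stata
* Sketch: random-effects Weibull survival model on panel data
webuse catheter, clear
xtset patient                                    // panel characteristics
stset time, failure(infect)                      // survival characteristics
xtstreg age female, distribution(weibull)        // proportional-hazards metric
stcurve, survival                                // graph the survivor function
xtstreg age female, distribution(weibull) time   // accelerated failure-time metric
```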

New estimation command **mestreg** fits panel-data models with random
coefficients and random intercepts. See **[ME] mestreg**.

6. **New in treatment effects**

Stata 14 provides many new features for fitting and evaluating
treatment effects. Treatment-effects estimators seek to extract
experimental-style causal effects from observational data.

**Treatment effects for survival models**

New estimator **stteffects ra** estimates average treatment effects
(ATEs), average treatment effects among the treated (ATETs), and
potential-outcome means (POMs) via regression adjustment. See
**[TE] stteffects ra**.

New estimator **stteffects ipw** estimates ATEs, ATETs, and POMs via
inverse-probability weighting. See **[TE] stteffects ipw**.

New estimator **stteffects ipwra** estimates ATEs, ATETs, and POMs
via inverse-probability-weighted regression adjustment. See **[TE]**
**stteffects ipwra**.

New estimator **stteffects wra** estimates ATEs, ATETs, and POMs via
weighted regression adjustment. See **[TE] stteffects wra**.

All the new estimators allow probability sampling weights.
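A sketch of two of these estimators follows. It assumes the example dataset **sheart** is available through **webuse**, with analysis time **atime**, failure indicator **fail**, treatment **smoke**, and the covariates shown; treat those names as assumptions if your copy differs.

```stata
* Sketch: treatment effects of smoking on time to a first heart attack
webuse sheart, clear
stset atime, failure(fail)

* Regression adjustment: outcome model first, then the treatment variable
stteffects ra (age exercise diet education) (smoke)
stteffects ra (age exercise diet education) (smoke), atet

* Inverse-probability weighting: treatment model first, then censoring model
stteffects ipw (smoke age exercise education) (age exercise diet education)
```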

**Endogenous treatments**

Stata 14 has a new estimator for endogenous treatments.
Endogenous treatments arise when both the treatment model and the
outcome model share unobserved covariates.

**eteffects** estimates ATEs, ATETs, and POMs for continuous,
binary, count, fractional, and nonnegative outcomes when
treatment assignment is correlated with outcome. See **[TE]**
**eteffects**.

**Probability weights**

All the new estimators listed above support probability sampling
weights. Support is also provided for **[TE] teffects ipwra**, **[TE]**
**teffects ipw**, and **[TE] teffects ra**.

**Balance analysis**

Stata 14 performs balance analysis for treatment effects. A key
requirement is that our treatment-effects model explicitly or
implicitly reweights the data such that the treated and the
untreated groups have comparable covariate values. Four new
commands are provided to assess and to test balance.

**tebalance summarize** reports model-adjusted means and variances of
covariates for the treated and untreated. See **[TE] tebalance**
**summarize**.

**tebalance density** graphs kernel density plots for the
model-adjusted data for the treated and untreated. See **[TE]**
**tebalance density**.

**tebalance box** graphs box plots for the model-adjusted data for
the treated and untreated. See **[TE] tebalance box**.

**tebalance overid** tests for covariate balance. See **[TE] tebalance**
**overid**.
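The sketch below checks balance after an inverse-probability-weighted estimator. It assumes the example dataset **cattaneo2** is available through **webuse**; the covariates in the treatment model are chosen only for illustration.

```stata
* Sketch: assess and test covariate balance after teffects ipw
webuse cattaneo2, clear
teffects ipw (bweight) (mbsmoke mmarried mage fbaby medu)

tebalance summarize       // model-adjusted means and variances
tebalance overid          // test for covariate balance
tebalance density mage    // kernel density plots for one covariate
```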

7. **Multilevel mixed-effects parametric survival models**

New command **mestreg** fits multilevel mixed-effects parametric
survival models.

Five distributions are supported: exponential, loglogistic,
Weibull, lognormal, and gamma.

Both proportional-hazards and accelerated failure-time
parameterizations are supported.

Random effects, including random intercepts and random
coefficients, at different levels of hierarchy are supported.

Single- or multiple-record st data as well as other survival-data
features are supported via the **stset** command. See **[ST] stset**.

Survey data are supported via the **svy** prefix. See **[SVY] svy**.

Relationships between multiple random effects can be independent
or freely correlated, or you can specify the covariance
structure.

Postestimation statistics include predictions of mean and median
survival times, and hazard and survivor functions. Predictions
can be obtained conditionally or unconditionally on random
effects.

Survivor, hazard, and cumulative hazard functions can be graphed
using the **stcurve** command. See **[ST] stcurve**.

See **[ME] mestreg**.

8. **Small-sample inference for fixed effects in linear multilevel mixed**
**models**

Existing command **mixed** fits linear multilevel mixed models. **mixed**
reports asymptotic test statistics for fixed effects by default.
Those statistics have large-sample normal and chi-squared
distributions. See **[ME] mixed**.

When groups are balanced and sample size is small, these test
statistics have exact t and F distributions for certain classes of
models. In other situations, such as when groups are unbalanced,
the sampling distributions of the statistics may be approximated by
t and F distributions. Approximations differ in how (denominator)
degrees of freedom are computed. The small-sample tests may yield
better coverages.

New option **dfmethod(***method***)** provides various degree-of-freedom
adjustments. Five methods are provided, including Kenward-Roger and
Satterthwaite methods.

New postestimation command **estat df** reports the degrees of freedom
for each coefficient.

**test**, **testparm**, and **lincom** have new option **small** to perform
small-sample inference for fixed effects.

See *Small-sample inference for fixed effects* in **[ME] mixed**.
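A minimal sketch of the new small-sample features, assuming the example dataset **pig** is available through **webuse**:

```stata
* Sketch: Kenward-Roger degrees of freedom for a random-intercept model
webuse pig, clear
mixed weight week || id:, reml dfmethod(kroger)

estat df            // degrees of freedom for each coefficient
test week, small    // small-sample F test for the fixed effect of week
```

We specify **reml** here because the Kenward-Roger and Satterthwaite methods are defined for restricted maximum likelihood estimation.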

9. **New SEM (structural equation modeling) features**

Existing commands **sem** and **gsem** provide the following new features:

**Survival models**

**gsem** now fits parametric survival models. With the new
multilevel survival models previously mentioned, you might wonder
why you would care. You care because SEM can fit multivariate
models including survival models with unobserved components
(latent variables), and combine survival models with other models
involving continuous, binary, count, and other kinds of outcomes.

Five families are added to **gsem**: exponential, loglogistic,
lognormal, Weibull, and gamma. Options are added for specifying
right-censoring and left-truncation, as is common (and necessary)
for analyzing survival times. See **[SEM] gsem family-and-link options**.

**predict** after **gsem** has new option **survival** for computing
survival-time predictions. The predicted survival function is
computed using the current outcome values and the estimated
parameters.

All the new families support the accelerated failure-time metric.
Exponential and Weibull also support the proportional-hazards
metric.

If you use the SEM Builder, simply select one of the survival
families from the contextual toolbar.

**Satorra-Bentler scaled chi-squared test**

**sem** now provides the Satorra-Bentler scaled chi-squared
model-versus-saturated test; specify new option **vce(sbentler)**.
In addition, corresponding robust standard errors (SEs) are
produced and reported. This test and the SEs are robust to
nonnormal distributions. These are an alternative to the
previously provided robust SEs. Goodness-of-fit statistics based
on the model chi-squared are also adjusted.
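As an illustration, the sketch below fits a one-factor measurement model with the new option. It assumes the example dataset **sem_1fmm**, with measurements **x1** through **x4**, is available through **webuse**.

```stata
* Sketch: one-factor model with Satorra-Bentler adjustments
webuse sem_1fmm, clear
sem (x1 x2 x3 x4 <- X), vce(sbentler)
estat gof, stats(all)    // goodness-of-fit statistics after the fit
```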

**Support for survey data**

**gsem** now supports the **svy** prefix for analyzing survey data, which
includes multilevel weights. See **[SVY] svy**.

All the postestimation features available after survey estimation
are also available. See **[SVY] svy postestimation** and **[SVY]**
**estat**.

**Support for observational and multilevel weights**

**gsem** now supports observation-level weights and multilevel
weights, even outside of a survey-data context.

**Beta distribution**

**gsem** adds the beta distribution to the choice of families; it may
be used with logit, probit, and cloglog links. The beta
distribution is particularly appropriate for fractional or
proportion data.

See **[SEM] sem and gsem**.

**gsem** provides the following new postestimation features:

**predict** has new options **density** and **distribution** for computing
the density and distribution function for each outcome using
their current values and the estimated parameters.

**predict** has new option **marginal** for computing observed endogenous
predictions that are marginal with respect to the latent
variables, meaning that prediction is produced by integrating
over the distribution of the latent variable(s).

**predict** has new option **expression()** that calculates linear and
nonlinear functions of the mean and linear predictions.

See **[SEM] predict after gsem**.

10. **Power analysis for survival and epidemiological methods**

Existing command **power** now provides all-new power analysis for
epidemiological methods.

In addition, existing command **stpower** for performing power analysis
on survival models is now undocumented, and its capabilities are now
folded into existing command **power**. The advantage is that extensive
graphing and tabulation of results are now available.

Here are the details. We will start with the survival models:

**New methods for analysis of survival models**

**power cox** estimates required sample size, power, and effect size
using Cox proportional hazards models allowing for multiple
covariates. It allows for correlation between the covariate of
interest and other covariates, and it allows for withdrawal of
subjects from the study. See **[PSS] power cox**.

**power exponential** estimates required sample size and power
comparing two exponential survivor functions. It accommodates
unequal allocation between the two groups, flexible accrual of
subjects into the study (uniform and truncated exponential), and
group-specific losses to follow-up. See **[PSS] power exponential**.

**power logrank** estimates required sample size, power, and effect
size for comparing survivor functions in two groups using the
log-rank test. It provides options to account for unequal
allocation of subjects between the groups, possible withdrawal of
subjects from the study (loss to follow-up), and uniform accrual
of subjects into the study. See **[PSS] power logrank**.

As with all other **power** methods, **cox**, **exponential**, and **logrank**
allow you to specify multiple values of parameters and
automatically produce tabular and graphical results.

**New methods for analysis of contingency tables**

This will be of special interest to epidemiological researchers.

**power cmh** performs power and sample-size analysis for a
Cochran-Mantel-Haenszel test of association in stratified 2 x 2
tables. It computes sample size, power, or effect size (common
odds ratio) given other study parameters. It provides
computations for designs with unbalanced stratum sizes as well as
unbalanced group sizes within each stratum. See **[PSS] power cmh**.

**power mcc** performs power and sample-size analysis for a test of
association between a risk factor and a disease in 1:M matched
case-control studies. It computes sample size, power, or effect
size (odds ratio) given other study parameters. See **[PSS] power**
**mcc**.

**power trend** performs power and sample-size analysis for a
Cochran-Armitage test of a linear trend in a probability of
response in J x 2 tables. The rows of the table correspond to
ordinal exposure levels. The command computes sample size or
power given other study parameters. It provides computations for
unbalanced designs and for unequally spaced exposure levels
(doses). With equally spaced exposure levels, a continuity
correction is available. See **[PSS] power trend**.

As with all other **power** methods, **cmh**, **mcc**, and **trend** allow you to
specify multiple values of parameters and automatically produce
tabular and graphical results.

See **[PSS] power**.
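To illustrate specifying multiple values, here is a sketch using **power logrank**. The hazard ratios and power levels are arbitrary values chosen for the example.

```stata
* Sketch: sample-size analysis for a log-rank test
power logrank, hratio(0.5) power(0.8 0.9)

* Several hazard ratios and power levels at once, as a table and as a graph
power logrank, hratio(0.5 0.6 0.7) power(0.8 0.9) table
power logrank, hratio(0.5 0.6 0.7) power(0.8 0.9) graph
```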

11. **Markov-switching regression models**

New estimation command **mswitch** fits Markov-switching models. These
are time-series models in which some of or all the parameters of a
regression probabilistically transition among a finite set of
unobserved states with unknown transition points.

Let's consider some examples. In economics, switching regression
has been used to model the growth rate of GDP, capturing the
asymmetric behavior observed over expansions and recessions.

In finance, switching has been used to model monthly stock returns.

In political science, switching has been used to model transitions
between Democratic and Republican partisanship.

In psychology, switching has been used to model transitions between
manic and depressive states.

In epidemiology, switching has been used to model the incidence rate
of infectious disease in epidemic and nonepidemic states.

**mswitch** provides two ways of modeling the switching process:
autoregressive (AR) and dynamic regression (DR). AR is typically
used for processes that change slowly, and DR is typically used for
processes that transition rapidly. See **[TS] mswitch**.

New postestimation commands **estat transition** and **estat duration**
report the transition probabilities and expected state durations.
See **[TS] mswitch postestimation**.

After estimation, you can **predict** the dependent variable, and you
can also predict the probability of being in each of the unobserved
states. These predictions are available as one-step ahead (static)
or multi-step ahead (dynamic).
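A minimal sketch follows, assuming the example dataset **usmacro** (quarterly data that include **fedfunds**) is available through **webuse**. The stub name **pr_state** is ours.

```stata
* Sketch: Markov-switching dynamic regression for the federal funds rate
webuse usmacro, clear
mswitch dr fedfunds

estat transition         // transition probabilities
estat duration           // expected state durations
predict pr_state*, pr    // probability of being in each state
```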

12. **Tests for structural breaks in time-series data**

Two new postestimation commands test for structural breaks after
estimation by **regress** or **ivregress**.

**estat sbknown** tests for structural breaks at known dates.

**estat sbsingle** tests for a structural break at an unknown date.

See **[TS] estat sbknown** and **[TS] estat sbsingle**.
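Here is a short sketch of both tests, again assuming the example dataset **usmacro** is available through **webuse**. The break date is an arbitrary choice for illustration.

```stata
* Sketch: structural-break tests after a time-series regression
webuse usmacro, clear
regress fedfunds L.fedfunds inflation

estat sbsingle                         // break at an unknown date
estat sbknown, break(tq(1981q1))       // break at a known date
```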

13. **Regression models for fractional data**

Two new estimation commands are provided for fitting models when the
dependent variable is a fraction, a proportion, or a rate.

New estimation command **fracreg** allows the dependent variable to be
in the range 0 to 1 inclusive. It fits probit, logit, and
heteroskedastic probit models. See **[R] fracreg**.

New estimation command **betareg** requires the dependent variable to be
in the range 0 to 1 exclusive. It fits beta regression. See **[R]**
**betareg**.

Both new estimators support the **svy** prefix for analyzing survey
data.
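The sketch below fits one model with each command. It assumes the example datasets **401k** and **sprogram** are available through **webuse** and contain the variables shown; treat the variable names as assumptions if your copies differ.

```stata
* Sketch: fractional logit for a participation rate in [0, 1]
webuse 401k, clear
fracreg logit prate mrate ltotemp age i.sole

* Sketch: beta regression for a rate strictly inside (0, 1)
webuse sprogram, clear
betareg prate i.summer freemeals pdonations
```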

14. **Survey support and multilevel weights for multilevel models**

We use the term "survey support" to include both Stata's **svy** prefix
and multilevel probability weights outside of the survey context.

The following estimation commands now support the **svy** prefix using
the linearized estimator of the variance-covariance matrix:
**mecloglog**, **meglm**, **melogit**, **menbreg**, **meologit**, **meoprobit**, **mepoisson**,
and **meprobit**.

The same estimation commands now support multilevel sampling and
frequency weights, too.

15. **New random-number generators (RNGs)**

Existing function **runiform()** now uses the 64-bit Mersenne Twister.
**runiform()** produces uniformly distributed random numbers, and the
functions providing random numbers for other distributions use
**runiform()** in producing their results. Thus, all of Stata's RNGs
are now based on the Mersenne Twister, too. Stata previously used
KISS32 and still does under version control.

KISS32 is an excellent RNG, but the Mersenne Twister has better
properties and a longer period, namely 2^19937-1. The Mersenne
Twister is 623-dimensionally equidistributed and has 53-bit
resolution.

**Uniformly distributed RNGs in specified intervals**

Existing function **runiform()** now allows you to specify the range
over which random variates will be supplied. **runiform(***a***,** *b***)**
returns values in the open interval (*a*, *b*).

New function **runiformint(***a***,** *b***)** returns integer values in the
closed interval [*a*, *b*].

It is a minor technical detail, but existing function **runiform()**
without arguments now produces random numbers in the open
interval (0,1) instead of [0,1) as previously. It produces [0,1)
values under version control when KISS32 is used.
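A quick sketch of the three forms. The seed and the interval endpoints are arbitrary.

```stata
* Sketch: uniform random numbers over different intervals
set seed 2015
display runiform()           // strictly between 0 and 1
display runiform(5, 10)      // strictly between 5 and 10
display runiformint(1, 6)    // an integer from 1 through 6
```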

**New RNGs for distributions**

Newly provided are

**rexponential(***b***)**
exponential random variates with scale *b*

**rlogistic()**
logistic variates with mean 0 and standard deviation
π/sqrt(3)

**rlogistic(***s***)**
logistic variates with mean 0, scale *s*, and standard
deviation *s*π/sqrt(3)

**rlogistic(***m***,***s***)**
logistic variates with mean *m*, scale *s*, and standard
deviation *s*π/sqrt(3)

**rweibull(***a***,***b***)**
Weibull variates with shape *a* and scale *b*

**rweibull(***a***,***b***,***g***)**
Weibull variates with shape *a*, scale *b*, and location *g*

**rweibullph(***a***,***b***)**
Weibull (proportional hazards) variates with shape *a* and
scale *b*

**rweibullph(***a***,***b***,***g***)**
Weibull (proportional hazards) variates with shape *a*, scale
*b*, and location *g*

See **[FN] Random-number functions**.

**Choosing which RNG to use**

You are running Stata with version 14 set. You want values from
**rlogistic()** but based on KISS32 rather than the version-14
default of Mersenne Twister. You could type

. **version 13:** ... **rlogistic()** ...

or you can just use new function **rlogistic_kiss32()** without
resetting the version:

. ... **rlogistic_kiss32()** ...

That is, every RNG *fcn***()** comes in three flavors: *fcn***()**,
*fcn***_mt64()**, and *fcn***_kiss32()**.

Functions *fcn***_mt64()** and *fcn***_kiss32()** are now considered the true
names of the RNGs, but still, you will usually type *fcn***()**.

That is because of another new feature:

. **set rng kiss32**

**set rng kiss32** says that when you type *fcn***()**, you mean
*fcn***_kiss32()**. You can **set rng** to **kiss32**, **mt64**, or **default**. That
is how the meaning of *fcn***()** is set. **default** means the default
for the version. In version 14, the default is **mt64**. In version
13 and before, it is **kiss32**.

Programmers: Ado-file code written under previous versions of
Stata now uses the modern RNGs! You do not have to modify your
ado-files. That is because the way **version** sets the RNGs'
version has been modified. Users typing **version** at the command
line or in a do-file set the RNGs' version, too. Ado-files
setting **version**, however, do not change the RNGs' version! In
ado-files, the RNGs' version can be set by setting the user
version, should you want to set it, but you do not. See
**[P] version**.

**Setting seeds and states**

You previously could set the seed or reset the state of the RNGs
by using **set seed**. Now, **set seed** is used solely for setting the
seed. New command **set rngstate** is used for resetting the state.
See **[R] set seed**.

You previously obtained the state of the RNGs by using **c(seed)**.
Stata continues to understand that, but officially, you are
supposed to use **c(rngstate)**.

See **[FN] Random-number functions** and **[R] set seed**.
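The sketch below shows the split between setting the seed and resetting the state: it saves the state, draws a number, restores the state, and confirms that the same number is drawn again. The macro and scalar names are ours.

```stata
* Sketch: save and restore the RNG state
set rng mt64
set seed 12345
local state `c(rngstate)'     // remember where the generator is
scalar first = runiform()
set rngstate `state'          // go back to the remembered state
scalar second = runiform()
assert first == second        // same state, same draw
set rng default               // return to the default for the version
```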

16. **Postestimation made easy**

You have to try this. Clear your Stata's estimation results, if
any; type **discard**. Now type **postest**. A little, empty window will
pop up. Move it to where it does not overlap Stata but you can see
it. Now run an estimation command -- any estimation command. You
could type **sysuse auto** and then type **regress mpg weight foreign**.

Isn't that neat? Well, if you are not following along, let us tell
you what just appeared in that little, empty window. This appeared:

> Marginal analysis
> Tests, contrasts, and comparisons of parameter estimates
> Specification, diagnostic, and goodness-of-fit analysis
> Diagnostic and analytic plots
> Predictions
> Other reports
> Manage estimation results

Those are the postestimation things you can do after **regress**. Use a
different estimation command and you will see a different list.
Click on a topic and it expands. That list is tailored, too.
Highlight a detailed element, click on Launch, and you are in the
dialog box for the postestimation feature and it is filled in as
much as it can be.

Enjoy.

17. **New and improved features in margins**

Existing command **margins** is used after estimation. **margins** uses the
fitted results, the data in memory, and a little bit that you type
to produce estimates of marginal effects, marginal means, predictive
margins, population-averaged effects, and least-squares means, and
presents the estimates in tables or graphs.

With **margins**, you can do what-if analyses. What would have been
observed if everyone in the data were males? Females? What would
have happened if the men in the data had their same characteristics
but were relabeled women, and the women had their same
characteristics but were relabeled men? If you can think of a
counterfactual, potential outcome, comparison, or contrast, **margins**
can do it.

We have improved **margins** in Stata 14.

**Works with multiple outcomes simultaneously**

**margins** previously restricted you to working with one outcome
at a time. That restriction is gone, and by default, **margins**
automatically produces its results for all equations,
outcomes, or ordered levels of the fitted model. If you fit a
multivariate regression on two variables, you get **margins**
results for both variables. If you fit an ordinal model, you
get results for each of the ordered levels.

Use the **predict()** option to restrict results to selected
equations, outcomes, or levels if you wish. You may now
specify multiple **predict()** options on the same **margins**
command.
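
As a rough illustration, assuming the auto dataset shipped with
Stata and a model we made up for the purpose, the default and the
restricted forms might look like this:

```stata
sysuse auto, clear
ologit rep78 mpg weight

* Default: average predicted probability for every level of rep78
margins

* Restrict the results to two of the levels by repeating predict()
margins, predict(outcome(1)) predict(outcome(5))
```

The second **margins** command reports two results, one for each
**predict()** option.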

**Integrates over unobserved components after multilevel and SEM**
**models**

Making predictions (even if counterfactual) is difficult with
models that contain random or latent variables, which are,
after all, unobserved. **margins** now integrates over them and
gets the average. Integrating over unobserved components is
the logical counterpart of producing population-averaged
results by averaging over your data.

There is even logic for it when making predictions about
individuals; you are making average predictions for
individuals with the same characteristics. What is the
expected probability of high blood pressure for males, age 50,
weight 190, based on a random-effects logistic regression? To
compute that probability, you cannot simply take the random
effect at its known mean of 0. You must integrate over the
distribution of the random effect. **margins** does this.

**predict has these features, too**

**margins** has the new features just described, but in fact, it
inherited them from new features of **predict**. You can now use
**predict** to obtain predictions integrated over the
distributions of unobserved components, random effects, and
latent variables.

These marginal predictions are produced if you specify new
option **marginal** with **predict** and are available after **gsem**,
**mecloglog**, **meglm**, **melogit**, **menbreg**, **meologit**, **meoprobit**,
**mepoisson**, and **meprobit**.
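
Here is a sketch of what that looks like after **melogit**. We assume
you can reach the example dataset with **webuse** (which needs an
Internet connection); the model and the new variable's name are
merely illustrative.

```stata
webuse bangladesh, clear
melogit c_use urban age || district:

* Predicted probabilities integrated over the district random effect
predict p_marginal, marginal

* margins uses the same marginal prediction by default
margins, at(urban=(0 1))
```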

**Better default statistics after some estimators**

**margins** now uses its own default prediction statistic rather
than the default prediction for the estimator. Sometimes, the
default for the estimator is not statistically appropriate for
use with **margins**, or there is an optional prediction statistic
that is more interesting for marginal analysis. In such
cases, **margins** now uses the most interesting or appropriate
prediction statistic by default. You can still choose any
available statistic by using the **predict()** option. The
previous default is preserved under version control. See item
36 below for the complete list of new defaults.

**margins is faster**

**margins** is now much faster when computing predictive margins
and marginal effects on predicted probabilities after **ologit**,
**oprobit**, and **mlogit**.

It is also much faster when all of a model's *indepvars* are
fixed to constants, such as when the **atmeans** option is
specified.

**margins can now add its results to your data**

**margins** has a new **generate()** option. You supply the stub of a
name, and **margins** fills in the rest. Specify the option, and
**margins** creates new variables containing
observation-by-observation values used to produce each of its
reported results.

This is an often-requested feature and is useful when you use
**margins** to produce a single or a small set of results.
Otherwise, you will be inundated with new variables that are
difficult to interpret.

This new, useful feature is currently "undocumented" in Stata
speak, meaning that the documentation for it is available
solely in **help margins generate**.

**And it is now easier to determine which features are available**

Each estimation command now documents which of the available
predicted statistics is the default statistic for **margins** and
which of the other statistics are appropriate for use with
**margins**.

See **[R] margins**.

18. **Hurdle model estimation**

New estimation command **churdle** fits linear or exponential hurdle
models. Hurdle models allow us to model censored and uncensored
outcomes separately. Uncensored outcomes are assumed to be observed
when a hurdle is cleared. Censored outcomes are a result of not
clearing the hurdle.

Hurdle models come in two- and three-equation forms. The
two-equation form handles either right- or left-censoring. The
three-equation form handles combined censoring.

Consider modeling how much people spend at the movies. We have data
on a cross-section of people and the amount they spent last month.
In our data, many people spent nothing because they did not even go
to the movies; and how much the rest spent varies. In the hurdle
model, we assume that once the decision is made to go to the movies,
the amount spent can be treated independently of the decision to go.
Those are the two equations: one for the decision to go and another
for the amount spent.

See **[R] churdle**.
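
To make the movie example concrete, here is a sketch on simulated
data. Everything in it is an assumption made for illustration: the
variable names, the coefficients, and the choice to let **income**
drive the decision to go and **age** drive the amount spent.

```stata
clear
set seed 14
set obs 2000
generate income = rnormal()
generate age = rnormal()

* Equation 1: the decision to go to the movies
generate byte went = (0.3 + 0.8*income + rnormal() > 0)

* Equation 2: the amount spent, which is 0 for those who did not go
generate spent = went * (20 + 3*age + 2*rnormal())

* Linear hurdle model with a lower limit (the hurdle) at 0
churdle linear spent age, select(income) ll(0)
```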

19. **Censored Poisson estimation**

New command **cpoisson** fits censored Poisson regressions to count
outcomes. These are Poisson models with values that are not
observed if they are below a threshold, or they are above a
threshold, or both. The thresholds can be fixed values, such as 5
and 10, or can be recorded in variables, meaning that thresholds
vary observation by observation.

Postestimation, you can obtain predictions of the number of
uncensored events, the uncensored incidence rate, the uncensored
probability of a particular value or range of values, and the
expected conditional probabilities of a value or range conditional
on being within the censoring limits.

See **[R] cpoisson**.
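
A minimal sketch with a fixed upper threshold follows. The data are
simulated, and the variable names and the threshold of 10 are our own
choices for illustration.

```stata
clear
set seed 14
set obs 1000
generate x = rnormal()
generate y = rpoisson(exp(1 + 0.5*x))

* Counts above 10 are recorded as 10 (right-censoring)
replace y = 10 if y > 10

* Fit the censored Poisson model with an upper limit of 10
cpoisson y x, ul(10)
```

To let the threshold vary observation by observation, you would
specify a variable name in **ul()** or **ll()** instead of a number.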

20. **Support for ICD-10 medical diagnosis codes**

New command **icd10** joins existing commands **icd9** and **icd9p**.

New command **icd10** provides automatic mapping of the World Health
Organization's ICD-10 diagnosis codes for mortality and morbidity,
and it adds some features not previously provided with **icd9** and
**icd9p**. It allows a version control that sets descriptions to those
that were current at the time your data were recorded, it works with
a subset or with all records of your data, and it can be used with
category or subcategory codes.

**icd10** allows you to do the same things you could do with **icd9** and
**icd9p**; namely, you can standardize the format of codes in your data,
confirm that codes are defined, verify that codes are formatted
correctly, and easily create indicators for the presence of
different conditions.

Meanwhile, existing commands **icd9** and **icd9p** (for ICD-9, of course)
provide the same new features, where applicable, provided by **icd10**.
These new features include **if** *exp* and **in** *range* being allowed, and
more.

See **[D] icd10** and **[D] icd9**.

We would like to thank the World Health Organization for making
these codes available to Stata users. See **copyright icd10** for
allowed usage.

21. **Excel reports get better**

One of the more popular features of Stata 13 was the **putexcel**
command for exporting results to Excel. All it did, however, was
allow you to poke numbers and strings into Excel worksheets. Even
so, a Stata blog entry about it was our most popular of the release!

So we have added to **putexcel**. In addition to previous features, you
can now

o insert Stata graphs

o insert text and format it with alignment, boldface, italics,
color, and more

o specify Excel formats, including date formats, currency
formats, and more

o make better tables with cell spanning, border formatting, and
more

o insert Excel formulas

All the things you can do from **putexcel** you can also do from Mata's
**xl()** Excel file I/O class.

See **[P] putexcel** and **[M-5] xl()**.

22. **Manual entries now have Quick starts**

Have you ever looked at a Stata command for the first time and
wanted to see some examples without explanation that do something
interesting? Or have you ever needed a quick refresher on a
command's most common syntaxes?

If so, what you want is now at the top of the command's manual
entry. We call them Quick starts. We show a few examples for
simple commands and sometimes more than a few when the command's
syntax is more complex. If a command involves several steps, those
steps are shown, too.

The Quick starts are located right below the Description, and the
Description now appears below the Title, where it always should have
been.

Quick starts do not appear in the help files, but they are just a
click away. Click on the blue command name in the Title. In the
help files, we keep Syntax near the top so that experts who just
need the facts can see a quick refresher.

23. **Stata in Spanish and Japanese**

It is called localization when software's menus, dialogs, and the
like are translated into other languages. We have completed
localization of Stata for Spanish and Japanese. Manuals and help
files remain in English.

If your computer is set to a specific language, and that language is
Spanish or Japanese, Stata will recognize this and automatically use
that language. To manually change the language, select **Edit >**
**Preferences > User-interface language...** (Windows and Unix), or
select **Stata 14 > Preferences > User-interface language...** (Mac).
Alternatively, you can change the language from the command line:
see **[P] set locale_ui**.

StataCorp translated to Spanish, and StataCorp gratefully
acknowledges the efforts of LightStone Corporation, Stata's official
distributor in Japan, for translating to Japanese.

__1.3.2 What's new in statistics (general)__

Already mentioned as highlights of the release were the following:

Bayesian statistical analysis
Regression models for fractional data
Hurdle model estimation
Censored Poisson estimation
New random-number generators (RNGs)
New and improved features in margins

The following are also new:

24. New commands **ztest** and **ztesti** compare means in one or two samples
using a *z* test and assuming known variances. With two samples,
**ztest** supports both paired and unpaired data. **ztesti** is the
immediate form of **ztest** that allows you to perform a test by
typing summary statistics rather than using a dataset. See **[R]**
**ztest**.
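
For instance, assuming the auto dataset and a known standard
deviation of 6 (a number we made up for illustration), the forms
might be used like this:

```stata
sysuse auto, clear

* One-sample test that mean mpg is 20, known standard deviation 6
ztest mpg == 20, sd(6)

* Two-sample (unpaired) test across the groups defined by foreign
ztest mpg, by(foreign) sd(6)

* Immediate form: 74 observations, mean 21.3, sd 6, tested against 20
ztesti 74 21.3 6 20
```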

25. Almost every estimator now supports factor variables. New to
this list in Stata 14 are **asmprobit**, **asroprobit**, **asclogit**,
**nlogit**, **gmm**, and **mlexp**.

26. Existing estimation command **nlogittree** now reports when any
observations violate the specified nesting structure of the model
and will result in **nlogit** dropping observations or terminating
with an error. See **[R] nlogit**.

27. Existing commands **test** and **testparm** provide new option **df(***#***)** to
specify that the F distribution, rather than the default
chi-squared distribution, be used when performing the Wald test.

28. When used with survey data, existing command **testparm** provides
new option **nosvyadjust**, which specifies that the Wald test is to
be performed without the default adjustment for the design
degrees of freedom.

29. Existing command **cumul** now labels the generated variable. The
label is "ECDF of *varname*".

30. Existing command **ksmirnov** no longer reports the corrected
p-value. The "exact" p-values for the one-sample and two-sample
tests are based on the asymptotic limiting distribution of the
test statistic and involve infinite series. By default, Stata
reports an approximate p-value computed using the first five
terms of the series. The corrected p-value was obtained by
applying an ad hoc correction to the approximate p-value to make
it closer to the exact p-value. In recent simulation studies, we
found that this correction does not perform satisfactorily in all
situations and is thus no longer supported. The old results can
be obtained under version control. For a two-sample test, you
can use the **exact** option to obtain the exact p-value. See **[R]**
**ksmirnov**.

31. Existing command **tabulate** now has new options **rowsort** and
**colsort**, which specify that the rows (columns) of a two-way
tabulation be presented in order of observed frequency. See **[R]**
**tabulate twoway**.

32. Existing estimation commands **mprobit** and **mlogit** with constraints
defined using equation indices rather than the equation names
would apply the constraints without accounting for the base
outcome equation. For example,

. **sysuse auto**
. **constraint 1 [#2]turn = [#2]trunk**
. **mprobit rep78 turn trunk, baseoutcome(1) constraint(1)**

would result in [3]turn = [3]trunk instead of [2]turn = [2]trunk.
Now **mprobit** and **mlogit** account for the base outcome equation when applying
such constraints. The old behavior is preserved under version
control. Either behavior is truly reasonable, but the new
behavior is more consistent with how these commands report their
results.

33. Existing estimation commands **mprobit** and **mlogit** did not respect
constraints that had been defined using value labels associated
with the levels of the outcome variable. For example,

. **label define replab 1 "A" 2 "B" 3 "C" 4 "D" 5 "E"**
. **label values rep78 replab**
. **constraint 1 [2]turn = [2]trunk**
. **mprobit rep78 turn trunk, baseoutcome(1) constraint(1)**

would drop constraint 1. Now **mprobit** and **mlogit** work with
constraints specified in this way.

You can also now refer to the estimated coefficients using the
equation name or the outcome value label in expressions. Using
the above example, the following is now allowed:

. **test [3]turn = [3]trunk**

The old behavior, in which neither of these was allowed, is preserved
under version control.

34. Existing command **nptrend** now displays value labels in its output.
New option **nolabel** specifies that numerical codes be displayed
rather than value labels. See **[R] nptrend**.

35. Existing postestimation commands **margins** and **marginsplot** have
many new features. Some were mentioned in the highlights:

Works with multiple outcomes simultaneously
Integrates over unobserved components after multilevel and SEM
models
Better default statistics after some estimators
margins is faster
margins can now add its results to your data
And it is now easier to determine which features are available

36. **margins** now defaults to prediction statistics different from the
default for **predict** when **predict**'s default is not the most
interesting statistic for marginal analysis or when that default
is not statistically appropriate for marginal analysis.

**margins** also defaults to producing statistics for all equations,
outcomes, or levels when possible. The original behavior is
preserved under version control. See the highlight *Better*
*default statistics after some estimators*.

The following estimators have new defaults:

Command New default statistic
------------------------------------------------------------
**clogit** probability assuming fixed effect is zero
**gsem** expected values for each outcome
**heckoprobit** marginal probabilities for each outcome
**manova** linear predictions for each equation
**meologit** probabilities for each outcome
**meoprobit** probabilities for each outcome
**meqrlogit** linear prediction
**meqrpoisson** linear prediction
**mgarch ccc** linear predictions for each equation
**mgarch dcc** linear predictions for each equation
**mgarch dvech** linear predictions for each equation
**mgarch vcc** linear predictions for each equation
**mlogit** probabilities for each outcome
**mprobit** probabilities for each outcome
**mvreg** linear predictions for each equation
**ologit** probabilities for each outcome
**oprobit** probabilities for each outcome
**reg3** linear predictions for each equation
**rologit** linear prediction
**sem** linear predictions for each observed
endogenous variable
**slogit** probabilities for each outcome
**sureg** linear predictions for each equation
**varbasic** linear predictions for each equation
**var** linear predictions for each equation
**vec** linear predictions for each equation
**xtlogit, fe** probability assuming fixed effect is zero
------------------------------------------------------------

For multilevel model estimators and **gsem**, the default prediction
statistics are not statistically appropriate with **margins**. These
estimators now support marginal predictions, which are highly
interpretable when used with **margins**. These marginal means and
probabilities are now the default statistics produced by **margins**
after **gsem**, **mecloglog**, **meglm**, **melogit**, **menbreg**, **meologit**,
**meoprobit**, **mepoisson**, and **meprobit**.

37. **margins, contrast()** has new suboptions to support multiple
**predict()** options. Also see the highlight *Works with multiple*
*outcomes simultaneously*.

38. **margins** after **mixed**, when the model specification includes
multilevel weights, now uses the product of the multilevel
weights when computing the means and margins. The previous
behavior of using only the observation-level weights is preserved
under version control.

39. Support for multilevel weights has been added in Stata 14 to
several estimation commands: **gsem**, **mecloglog**, **meglm**, **melogit**,
**menbreg**, **meologit**, **meoprobit**, **mepoisson**, and **meprobit**. **margins**
also supports multilevel weights for all of these commands.

**margins** uses the product of the multilevel weights when computing
means and margins.

40. **marginsplot** now supports the new multiple **predict()** options
allowed on **margins**. It also automatically handles the new
default of multiple results that **margins** produces with
multivariate, multinomial, and ordinal estimators. You can
customize how these plot, equation, and outcome dimensions are
graphed using the new directives **_predict**, **_equation**, and
**_outcome** in the existing options **xdimension()**, **plotdimension()**,
**bydimension()**, and **graphdimension()**. See **[R] marginsplot**.

__1.3.3 What's new in statistics (SEM)__

Already mentioned as highlights of the release were the following:

Survival models
Satorra-Bentler scaled chi-squared test
Support for survey data
Beta distribution
Multilevel weights
Prediction improvements

The following are also new:

41. Existing postestimation command **predict** after **sem** now has the
**scores** option for predicting parameter-level scores. See **[SEM]**
**predict after sem**.

42. Existing estimation command **sem** now reports information about
each dependent variable in the header of the estimation table.

43. Existing estimation command **sem**'s starting-values logic for
**startvalues(fixedonly)** and for those built on it now considers
all constraints on the fixed-effects parameters. This improves
convergence for some models.

__1.3.4 What's new in statistics (multilevel modeling)__

Already mentioned as highlights of the release were the following:

Multilevel mixed-effects parametric survival models
Small-sample inference for linear multilevel mixed models
Survey support and multilevel weights for multilevel models

The following are also new:

44. Existing postestimation command **predict** supports new options
after the following me estimators, nearly all multilevel: **meglm**,
**melogit**, **meprobit**, **mecloglog**, **meologit**, **meoprobit**, **mepoisson**, and
**menbreg**.

**predict** supports new options **density** and **distribution** for
computing the density and distribution function of the fitted
model using the current values and the estimated parameters.

**predict** supports new option **scores** for predicting parameter-level
scores.

45. Existing me estimation commands now support **iweight**s in the
fixed-effects and random-effects equations.

__1.3.5 What's new in statistics (treatment effects)__

Already mentioned as highlights of the release were the following:

Treatment effects for survival models
Endogenous treatments
Probability weights
Balance analysis

The following are also new:

46. Existing estimation command **etregress** has new features, an
improvement, and a change.

New option **cfunction** specifies that the model be estimated using
a control-function estimator rather than the previously available
two-step and maximum likelihood estimators. See **[TE] etregress**.

New option **poutcomes** specifies that a potential-outcome model be
fit with separate standard deviation and correlation parameters
in the treatment and control regimes.

**etregress** is now faster.

The labeling of the coefficients has changed such that the
treatment is represented in factor variable notation. The old
labeling is maintained under version control.

See **[TE] etregress**.

__1.3.6 What's new in statistics (longitudinal/panel data)__

Already mentioned as a highlight of the release was the following:

Panel-data survival models

The following are also new:

47. Three estimators add the **vce(robust)** and **vce(cluster ...)** options
to compute standard errors that are robust to distributional
assumptions and correlated data. This new support is provided
for **xthtaylor**, **xtivreg**, and the random-effects estimator of
**xtpoisson**. (**xtpoisson** previously supported the options for
Gaussian-distributed random effects but not for the default gamma
distribution.)

See **[XT] xthtaylor**, **[XT] xtivreg**, and **[XT] xtpoisson**.

48. Existing estimation commands **xtologit** and **xtoprobit** now support
weights -- frequency weights (**fweight**s), sampling weights
(**pweight**s), and importance weights (**iweight**s). See **[XT] xtologit**
and **[XT] xtoprobit**.

49. Existing estimation command **xtreg, fe** is now orders of magnitude
faster when there are many panels, and there always are.

__1.3.7 What's new in statistics (time series)__

Everything new was mentioned in the following highlights:

Markov-switching regression models
Tests for structural breaks in time-series data

__1.3.8 What's new in statistics (survival analysis)__

Already mentioned as highlights of the release were the following:

Multilevel mixed-effects parametric survival models
Small-sample inference for linear multilevel mixed models
Survey support and multilevel weights for multilevel models

The following are also new:

50. Existing command **stcurve** has new option **marginal**, which is a
synonym for option **unconditional**; see **[ST] streg postestimation**.

51. Existing estimation command **stcox** now allows factor variables in
option **tvc()**; see **[ST] stcox**.

52. System variables created by command **stset** -- **_st**, **_d**, **_t0**, **_t**,
and **_origin** -- are now labeled.

53. Variables generated by existing command **sttocc** are now labeled.

54. Existing estimation command **streg**'s option **distribution(gamma)**
was renamed to **distribution(ggamma)**. **gamma** continues to be
supported under version control.

This distribution should always have been designated **ggamma**
because it is one of the generalized gamma distributions. It is
renamed now to avoid confusion with the **gamma** argument of the
new option **distribution(gamma)** allowed on the **mestreg** and
**xtstreg** commands and of the new option **family(gamma)** allowed
on the **gsem** command.

__1.3.9 What's new in statistics (survey data)__

Already mentioned as highlights of the release were the following:

Survey support and multilevel weights for multilevel models
Support for survey data (SEM)

The following are also new:

55. Existing command **svyset** has a new syntax for specifying
stage-level sampling weight variables. New syntax supports
commands such as **gsem** and **meglm** that can fit hierarchical
multilevel models with group-level weights. See **[SVY] svyset**,
and for examples of fitting a multilevel model with stage-level
sampling weights, see examples 5 and 6 in **[ME] meglm**.

56. The existing prefix command **svy jackknife:** with **svyset** replicate
weight variables now uses the multiplier values that were
specified with **svyset** in the **jkrweight()** option, even for unit-valued
multipliers. The old behavior, where **svy jackknife** used the
default delete-1 multiplier instead of unit-valued multipliers, is
preserved under version control.

__1.3.10 What's new in statistics (power and sample size)__

Already mentioned as a highlight of the release was the following:

Power analysis for survival and epidemiological methods

The following are also new:

57. Existing command **power** now displays an estimated target variance
in addition to the effect size when computing effect size for the
analysis of variance and covariance methods. See **[PSS] power**
**oneway**, **[PSS] power twoway**, and **[PSS] power repeated**.

__1.3.11 What's new in statistics (multiple imputation)__

58. Existing command **mi impute pmm** now requires the specification of
the number of nearest neighbors in the **knn()** option. Before, **mi**
**impute pmm** used the default of one nearest neighbor, **knn(1)**.

Recent simulation studies demonstrated that using one nearest
neighbor performed poorly in many of the considered scenarios.
In general, the optimal number of nearest neighbors varies from
one application to another. Thus **mi impute pmm** now requires that
the **knn()** option be specified. See **[MI] mi impute pmm** for
details.

This change will also affect **mi impute monotone** and **mi impute**
**chained** when the **pmm** method is used in the conditional
specifications. The old behavior of the commands is available
under version control.

59. **mi** now requires that the names of imputation and passive
variables not exceed 29 characters. In the wide style, the names
of these variables may be restricted to fewer than 29 characters
depending on the number of imputations. In the flongsep style,
the names of regular variables in addition to the names of
imputation and passive variables also may not exceed 29
characters. These requirements are imposed by the internal
structure of the **mi** command. The affected commands are

**mi convert**
**mi import**
**mi passive**
**mi register**
**mi rename**
**mi set M**

60. **mi** supports new estimation command **fracreg**. See **[R] fracreg**.

__1.3.12 What's new in statistics (multivariate)__

61. **margins** used after existing estimation commands **manova** and **mvreg**
now defaults to reporting linear predictions for all equations.
Previous behavior was to default to reporting linear predictions
for only the first equation.

__1.3.13 What's new in data management__

Already mentioned as highlights of the release were the following:

Unicode support
More than 2 billion observations now allowed
Support for ICD-10 medical diagnosis codes
Excel reports get better

The following are also new:

62. Stata's .dta dataset file format has changed. The change was
unavoidable because of Stata 14's new Unicode features and the
increase in the allowed maximum number of observations. **use**
continues to read old-format files, of course. **save** writes
new-format files. Use **saveold** when sharing datasets with users
of previous versions of Stata.

Few will be interested, but those who are should see **dta** for the
technical specification of the new dataset format.

63. Existing command **saveold** has a new **version()** option and the
corresponding ability to save files not only in the format of the
previous Stata release, but in the formats of older releases,
too. **saveold** can save data in the formats of Stata 11, 12, and
13. See **saveold** in **[D] save**.
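
A short sketch, assuming the auto dataset and a filename we invented;
it writes a file to the current working directory:

```stata
sysuse auto, clear

* Save a copy that users of Stata 12 can read
saveold auto_stata12, version(12) replace
```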

64. Existing commands **icd9** and **icd9p** have new features. This was
mentioned in passing in the highlights of the new **icd10** command.
See **[D] icd9**.

65. Existing command **destring** now handles Unicode. **destring**'s option
**ignore()** has new suboptions **asbytes** and **illegal** in support of
Unicode, but you are unlikely ever to want to specify either of
these suboptions.

New suboption **asbytes** specifies that multibyte sequences be
treated as nothing more than a series of bytes, meaning strings
may contain untranslated Extended ASCII characters. New
suboption **illegal** specifies that multibyte characters that do not
make sense to Unicode are to be ignored.

See **[D] destring**.

66. Existing command **import delimited** has new option
**encoding("***encoding***")**. This option allows **import delimited** to
import into Stata files that contain Extended ASCII strings. The
Extended ASCII characters are translated into Unicode. In
addition, **import delimited**'s performance has been improved. See
**[D] import delimited**.

67. Existing command **generate** has new options **before(***existing_var***)** and
**after(***existing_var***)**, which place the newly created variable before or
after *existing_var*. See **[D] generate**.

68. New command **insobs** inserts new, empty observations into the
dataset. Empty means that the variables in the new observations
contain missing values. See **[D] insobs**.
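
The last two items can be tried together. This is a sketch assuming
the auto dataset; the new variable **gpm** is our own invention.

```stata
sysuse auto, clear

* Create gpm and place it immediately after mpg
generate gpm = 1/mpg, after(mpg)
describe price-rep78

* Add two empty observations at the end of the data
insobs 2

* Add one empty observation before the first observation
insobs 1, before(1)

* auto has 74 observations, so this displays 77
display _N
```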

__1.3.14 What's new in functions__

Already mentioned as highlights of the release were the following:

Uniformly distributed RNGs in specified intervals
New RNGs for distributions

It is also worth mentioning that Stata has a new *Functions Reference*
*Manual* dedicated solely to the documentation of functions.

The following new functions are added to Stata and Mata:

69. New function **strrpos(***s1***,***s2***)** returns the position of the last
occurrence of *s2* in *s1*. That is to say, **strrpos()** searches
backward from the right. See **[FN] String functions**.
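
A quick comparison with existing function **strpos()**, using a
string we made up:

```stata
* First occurrence of "b", searching from the left: displays 2
display strpos("abcabc", "b")

* Last occurrence of "b", searching from the right: displays 5
display strrpos("abcabc", "b")
```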

70. New string functions **u***fcn***()** and **ud***fcn***()** corresponding to each
string function *fcn***()** are added. These functions work in the
Unicode metric as described in highlights.

The following **u***fcn***()**s are added:

New function Corresponding to
------------------------------------------
**ustrlen()** **strlen()**
**usubstr()** **substr()**
**usubinstr()** **subinstr()**
**ustrpos()** **strpos()**
**ustrrpos()** **strrpos()**

**ustrlower()** **strlower()**
**ustrupper()** **strupper()**
**ustrtitle()** **strproper()**

**ustrltrim()** **strltrim()**
**ustrrtrim()** **strrtrim()**
**ustrtrim()** **strtrim()**

**ustrregexm()** **regexm()**
**ustrregexra()** **regexr()**
**ustrregexrf()** **regexr()** [*sic*]
**ustrregexs()** **regexs()**

**ustrreverse()** **strreverse()**
**uchar()** **char()**
**ustrtoname()** **strtoname()**
**ustrword()** **word()**
**ustrwordcount()** **wordcount()**
------------------------------------------

The following **ud***fcn***()**s are added:

New function Corresponding to
------------------------------------------
**udstrlen()** **strlen()**
**udsubstr()** **substr()**
------------------------------------------

See **[FN] String functions**.
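
To see the difference between the byte metric and the Unicode
metric, try the following sketch. It assumes the "é" is typed as a
single precomposed character, which occupies two bytes in UTF-8.

```stata
* Length in bytes: displays 5
display strlen("café")

* Length in Unicode characters: displays 4
display ustrlen("café")

* Length in display columns: displays 4
display udstrlen("café")

* The fourth character, extracted in the Unicode metric
display usubstr("café", 4, 1)
```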

71. New Unicode string functions **ustrcompare()**, **ustrcompareex()**,
**ustrsortkey()**, and **ustrsortkeyex()** compare and sort Unicode
strings in a locale-aware manner, for instance, by generating a
key for use with the **sort** command. See **[U] 12.4.2.5 Sorting**
**strings containing Unicode characters**.

72. New Unicode string functions **ustrfrom()** and **ustrto()** convert
strings between UTF-8 and Extended ASCII encodings. When
converting from Extended ASCII to UTF-8, that is, when using
**ustrfrom()**, you may want to first use new function
**ustrinvalidcnt()**, which counts the number of character sequences
not already UTF-8. A nonzero count indicates that the string
needs conversion.
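The check-then-convert workflow can be sketched as below. The input string is built by hand from a Latin-1 byte, and the encoding name "latin1" and mode 1 are choices made for this example; your data may call for a different encoding.

```stata
* Build "café" as Latin-1 bytes: char(233) is é in that encoding (assumed input)
scalar raw = "caf" + char(233)

* A nonzero count means the string holds bytes that are not valid UTF-8
display ustrinvalidcnt(raw)

* Convert to UTF-8; mode 1 substitutes the Unicode replacement character
* for anything that cannot be converted
display ustrfrom(raw, "latin1", 1)
```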

73. New Unicode string functions **ustrunescape()** and **ustrtohex()**
translate escape sequences to and from Unicode. Escape sequences
look like **\u00e8**, which is the escape sequence for "è". Some
websites write this as U+00E8. If you wanted "è" and did not
know how to type it, you could type **ustrunescape("\u00e8")**. The
function would return **"è"**.

If you wanted to know the escape sequence for "è", you could use
function **ustrtohex("è")** and get back the string **"\u00e8"**.

74. New Unicode string function **tobytes()** returns the byte values of
a Unicode string. For instance, **tobytes("è")** returns
**"\d195\d168"** because the UTF-8 encoded form of "è" is two bytes
long. The byte values are 195 followed by 168, written in decimal
form.
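The three views of "è" described in items 73 and 74 can be displayed side by side. This is only a sketch, and it assumes the do-file is saved in UTF-8.

```stata
display ustrunescape("\u00e8")   // the character è
display ustrtohex("è")           // its escape sequence, \u00e8
display tobytes("è")             // its UTF-8 bytes, \d195\d168
```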

75. New Unicode string functions **ustrleft()** and **ustrright()** are most
easily understood in terms of **usubstr()**. **ustrleft(***s***,** *#***)** is equal
to **usubstr(***s***, 1,** *#***)**, and **ustrright(***s***,** *#***)** is equal to
**usubstr(***s***,** **-***#***, .)**.
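The equivalence can be checked directly. The sample string below is arbitrary.

```stata
local s "déjà vu"

display ustrleft("`s'", 4)     // first four Unicode characters
display ustrright("`s'", 2)    // last two Unicode characters

* Both shortcuts agree with the corresponding usubstr() calls
assert ustrleft("`s'", 4)  == usubstr("`s'", 1, 4)
assert ustrright("`s'", 2) == usubstr("`s'", -2, .)
```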

76. The following new Unicode string functions are rarely used and
highly technical:

**uisdigit()**
**uisletter()**
**ustrfix()**
**ustrnormalize()**
**wordbreaklocale()**
**collatorlocale()**
**collatorversion()**

See **[FN] String functions**.

77. There are many new random-number functions. The new functions
are

**runiform()**
**runiformint()**
**rexponential()**
**rlogistic()**
**rweibull()**
**rweibullph()**

See **[FN] Random-number functions**.
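A brief sketch of drawing from several of these functions follows. The sample size, seed, variable names, and parameter values are arbitrary choices for illustration.

```stata
clear
set obs 1000
set seed 20150402                   // arbitrary seed so the draws are reproducible

generate u    = runiform()          // uniform on (0,1)
generate die  = runiformint(1, 6)   // integers 1 through 6
generate wait = rexponential(2)     // exponential with scale 2
generate e    = rlogistic()         // logistic with mean 0 and scale 1
generate t    = rweibull(1.5, 3)    // Weibull with shape 1.5 and scale 3

summarize u die wait e t
```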

78. A new family of functions computes probabilities and other
quantities of the logistic distribution.

**logistic(***x***)** computes the cumulative distribution function of the
logistic distribution with mean 0 and standard deviation
π/sqrt(3).

**logistic(***s***,***x***)** computes the cumulative distribution function of a
logistic distribution with mean 0, scale *s*, and standard
deviation *s*π/sqrt(3).

**logistic(***m***,***s***,***x***)** computes the cumulative distribution function of
a logistic distribution with mean *m*, scale *s*, and standard
deviation *s*π/sqrt(3).

**logisticden(***x***)** computes the density of the logistic distribution
with mean 0 and standard deviation π/sqrt(3).

**logisticden(***s***,***x***)** computes the density of the logistic
distribution with mean 0, scale *s*, and standard deviation
*s*π/sqrt(3).

**logisticden(***m***,***s***,***x***)** computes the density of the logistic
distribution with mean *m*, scale *s*, and standard deviation
*s*π/sqrt(3).

**logistictail(***x***)** computes the reverse cumulative logistic
distribution with mean 0 and standard deviation π/sqrt(3).

**logistictail(***s***,***x***)** computes the reverse cumulative logistic
distribution with mean 0, scale *s*, and standard deviation
*s*π/sqrt(3).

**logistictail(***m***,***s***,***x***)** computes the reverse cumulative logistic
distribution with mean *m*, scale *s*, and standard deviation
*s*π/sqrt(3).

**invlogistic(***p***)** computes the inverse cumulative logistic
distribution: if **logistic(***x***)** = *p*, then **invlogistic(***p***)** = *x*.

**invlogistic(***s***,***p***)** computes the inverse cumulative logistic
distribution: if **logistic(***s***,***x***)** = *p*, then **invlogistic(***s***,***p***)** = *x*.

**invlogistic(***m***,***s***,***p***)** computes the inverse cumulative logistic
distribution: if **logistic(***m***,***s***,***x***)** = *p*, then **invlogistic(***m***,***s***,***p***)** =
*x*.

**invlogistictail(***p***)** computes the inverse reverse cumulative
logistic distribution: if **logistictail(***x***)** = *p*, then
**invlogistictail(***p***)** = *x*.

**invlogistictail(***s***,***p***)** computes the inverse reverse cumulative
logistic distribution: if **logistictail(***s***,***x***)** = *p*, then
**invlogistictail(***s***,***p***)** = *x*.

**invlogistictail(***m***,***s***,***p***)** computes the inverse reverse cumulative
logistic distribution: if **logistictail(***m***,***s***,***x***)** = *p*, then
**invlogistictail(***m***,***s***,***p***)** = *x*.

**rlogistic()** computes logistic variates with mean 0 and standard
deviation π/sqrt(3).

**rlogistic(***s***)** computes logistic variates with mean 0, scale *s*, and
standard deviation *s*π/sqrt(3).

**rlogistic(***m***,***s***)** computes logistic variates with mean *m*, scale *s*,
and standard deviation *s*π/sqrt(3).

See **[FN] Statistical functions** and **[FN] Random-number functions**.
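The relationships among the members of the family can be illustrated with a few lines. The values of the mean, scale, and evaluation point below are arbitrary.

```stata
local m 2
local s 1.5
local x 3

* The cumulative and reverse cumulative values sum to 1 (up to rounding)
display logistic(`m', `s', `x') + logistictail(`m', `s', `x')

* invlogistic() undoes logistic(), returning x (up to rounding)
display invlogistic(`m', `s', logistic(`m', `s', `x'))

* Standard deviation implied by scale s: s*pi/sqrt(3)
display `s' * c(pi) / sqrt(3)
```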

79. A new family of functions computes probabilities and other
quantities of the Weibull distribution.

**weibull(***a***,***b***,***x***)** computes the cumulative distribution function of a
Weibull distribution with shape *a* and scale *b*. **weibull(***a***,***b***,***x***)** =
**weibull(***a***,***b***,***0***,***x***)**.

**weibull(***a***,***b***,***g***,***x***)** computes the cumulative distribution function of
a Weibull distribution with shape *a*, scale *b*, and location *g*.

**weibullden(***a***,***b***,***x***)** computes the density of the Weibull
distribution with shape *a* and scale *b*. **weibullden(***a***,***b***,***x***)** =
**weibullden(***a***,***b***,***0***,***x***)**.

**weibullden(***a***,***b***,***g***,***x***)** computes the density of the Weibull
distribution with shape *a*, scale *b*, and location *g*.

**weibulltail(***a***,***b***,***x***)** computes the reverse cumulative Weibull
distribution with shape *a* and scale *b*. **weibulltail(***a***,***b***,***x***)** =
**weibulltail(***a***,***b***,***0***,***x***)**.

**weibulltail(***a***,***b***,***g***,***x***)** computes the reverse cumulative Weibull
distribution with shape *a*, scale *b*, and location *g*.

**invweibull(***a***,***b***,***p***)** computes the inverse cumulative Weibull
distribution: if **weibull(***a***,***b***,***x***)** = *p*, then **invweibull(***a***,***b***,***p***)** = *x*.

**invweibull(***a***,***b***,***g***,***p***)** computes the inverse cumulative Weibull
distribution: if **weibull(***a***,***b***,***g***,***x***)** = *p*, then **invweibull(***a***,***b***,***g***,***p***)** =
*x*.

**invweibulltail(***a***,***b***,***p***)** computes the inverse reverse cumulative
Weibull distribution: if **weibulltail(***a***,***b***,***x***)** = *p*, then
**invweibulltail(***a***,***b***,***p***)** = *x*.

**invweibulltail(***a***,***b***,***g***,***p***)** computes the inverse reverse cumulative
Weibull distribution: if **weibulltail(***a***,***b***,***g***,***x***)** = *p*, then
**invweibulltail(***a***,***b***,***g***,***p***)** = *x*.

**rweibull(***a***,***b***)** computes Weibull variates with shape *a* and scale *b*.

**rweibull(***a***,***b***,***g***)** computes Weibull variates with shape *a*, scale *b*,
and location *g*.

See **[FN] Statistical functions** and **[FN] Random-number functions**.
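As a sketch, the cumulative function can be compared with a hand-computed value. The closed form used here, 1 - exp(-(x/b)^a), is the usual shape-and-scale parameterization and is stated as an assumption because the text above does not spell it out; the parameter values are arbitrary.

```stata
local a 2
local b 3
local x 1.5

* Built-in CDF versus the assumed closed form
display weibull(`a', `b', `x')
display 1 - exp(-((`x'/`b')^`a'))

* Omitting the location argument is the same as setting it to 0
display weibull(`a', `b', 0, `x')

* invweibull() recovers x from the cumulative probability (up to rounding)
display invweibull(`a', `b', weibull(`a', `b', `x'))
```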

80. A new family of functions computes probabilities and other
quantities of the Weibull distribution (proportional hazards).

**weibullph(***a***,***b***,***x***)** computes the cumulative distribution function of
a Weibull distribution (proportional hazards) with shape *a* and
scale *b*. **weibullph(***a***,***b***,***x***)** = **weibullph(***a***,***b***,***0***,***x***)**.

**weibullph(***a***,***b***,***g***,***x***)** computes the cumulative distribution function
of a Weibull distribution (proportional hazards) with shape *a*,
scale *b*, and location *g*.

**weibullphden(***a***,***b***,***x***)** computes the density of the Weibull
distribution (proportional hazards) with shape *a* and scale *b*.
**weibullphden(***a***,***b***,***x***)** = **weibullphden(***a***,***b***,***0***,***x***)**.

**weibullphden(***a***,***b***,***g***,***x***)** computes the density of the Weibull
distribution (proportional hazards) with shape *a*, scale *b*, and
location *g*.

**weibullphtail(***a***,***b***,***x***)** computes the reverse cumulative Weibull
distribution (proportional hazards) with shape *a* and scale *b*.
**weibullphtail(***a***,***b***,***x***)** = **weibullphtail(***a***,***b***,***0***,***x***)**.

**weibullphtail(***a***,***b***,***g***,***x***)** computes the reverse cumulative Weibull
distribution (proportional hazards) with shape *a*, scale *b*, and
location *g*.

**invweibullph(***a***,***b***,***p***)** computes the inverse cumulative Weibull
distribution (proportional hazards): if **weibullph(***a***,***b***,***x***)** = *p*,
then **invweibullph(***a***,***b***,***p***)** = *x*.

**invweibullph(***a***,***b***,***g***,***p***)** computes the inverse cumulative Weibull
distribution (proportional hazards): if **weibullph(***a***,***b***,***g***,***x***)** = *p*,
then **invweibullph(***a***,***b***,***g***,***p***)** = *x*.

**invweibullphtail(***a***,***b***,***p***)** computes the inverse reverse cumulative
Weibull distribution (proportional hazards): if
**weibullphtail(***a***,***b***,***x***)** = *p*, then **invweibullphtail(***a***,***b***,***p***)** = *x*.

**invweibullphtail(***a***,***b***,***g***,***p***)** computes the inverse reverse cumulative
Weibull distribution (proportional hazards): if
**weibullphtail(***a***,***b***,***g***,***x***)** = *p*, then **invweibullphtail(***a***,***b***,***g***,***p***)** = *x*.

**rweibullph(***a***,***b***)** computes Weibull (proportional hazards) variates
with shape *a* and scale *b*.

**rweibullph(***a***,***b***,***g***)** computes Weibull (proportional hazards)
variates with shape *a*, scale *b*, and location *g*.

See **[FN] Statistical functions** and **[FN] Random-number functions**.
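The proportional-hazards form differs from the standard Weibull only in how the scale enters. The sketch below assumes the parameterization 1 - exp(-b*x^a), which the text above does not state explicitly; under that assumption, the two families agree when the standard scale is set to b^(-1/a). Parameter values are arbitrary.

```stata
local a 2
local b 0.5
local x 1.2

* Built-in PH CDF versus the assumed closed form
display weibullph(`a', `b', `x')
display 1 - exp(-(`b' * `x'^`a'))

* The same probability from the standard Weibull with scale b^(-1/a)
display weibull(`a', `b'^(-1/`a'), `x')
```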

81. A new family of functions computes probabilities and other
quantities of the exponential distribution.

**exponential(***b***,***x***)** computes the cumulative distribution function of
an exponential distribution with scale *b*.

**exponentialden(***b***,***x***)** computes the density of the exponential
distribution with scale *b*.

**exponentialtail(***b***,***x***)** computes the reverse cumulative exponential
distribution with scale *b*.

**invexponential(***b***,***p***)** computes the inverse cumulative exponential
distribution: if **exponential(***b***,***x***)** = *p*, then **invexponential(***b***,***p***)** =
*x*.

**invexponentialtail(***b***,***p***)** computes the inverse reverse cumulative
exponential distribution: if **exponentialtail(***b***,***x***)** = *p*, then
**invexponentialtail(***b***,***p***)** = *x*.

**rexponential(***b***)** computes exponential variates with scale *b*.

See **[FN] Statistical functions** and **[FN] Random-number functions**.
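A short sketch of the exponential family follows. It assumes the scale parameterization 1 - exp(-x/b), under which the median equals b*ln(2); the values of *b* and *x* are arbitrary.

```stata
local b 4
local x 2.5

* Built-in CDF versus the assumed closed form
display exponential(`b', `x')
display 1 - exp(-`x'/`b')

* The median from the inverse CDF, compared with b*ln(2)
display invexponential(`b', 0.5)
display `b' * ln(2)
```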

82. The following statistical functions are added:

**invnt(***df***,***np***,***p***)** computes the inverse cumulative noncentral
Student's t distribution.

**invnF(***df1***,***df2***,***np***,***p***)** computes the inverse cumulative noncentral F
distribution.

**lnwishartden(***df***,***V***,***X***)** computes the natural logarithm of the
density of the Wishart distribution.

**lniwishartden(***df***,***V***,***X***)** computes the natural logarithm of the
density of the inverse Wishart distribution.

**lnmvnormalden(***M***,***V***,***X***)** computes the natural logarithm of the
multivariate normal density.

**lnigammaden(***a***,***b***,***x***)** computes the natural logarithm of the inverse
gamma density.

See **[FN] Statistical functions**.
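As a small sketch, **invnt()** can be checked against the existing cumulative function **nt()** by a round trip. The degrees of freedom, noncentrality parameter, and evaluation point are arbitrary.

```stata
local df 10
local np 1
local t  2.5

* Cumulative probability at t, then back to t through the new inverse
local p = nt(`df', `np', `t')
display `p'
display invnt(`df', `np', `p')
```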

83. The string functions, random-number functions, and statistical
functions added to Stata were also added to Mata.

See **[M-4] string**, **[M-5] runiform()**, and **[M-5] normal()**.

__1.3.15 What's new in graphics__

84. New commands **graph replay** and **graph close** join improved existing
command **graph drop** to form a useful suite. All accept a graph
name, a list of graph names, **_all**, and graph names with
wildcards.

**graph replay** redisplays graphs.

**graph close** closes graph windows.

**graph drop** drops graphs (and closes their window if they have one
open).

See **[G-2] graph replay**, **[G-2] graph close**, and **[G-2] graph drop**.
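A typical session with the suite might run as follows. The graph names g1 and g2 and the use of the auto dataset are choices made for this sketch.

```stata
sysuse auto, clear

* Create two named graphs held in memory
scatter mpg weight, name(g1, replace)
histogram price, name(g2, replace)

graph close _all     // close every graph window; the graphs stay in memory
graph replay g1      // redisplay one of them
graph drop g*        // drop both graphs by wildcard
```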

85. Existing command **histogram** now allows bin size to be recalculated
in each category when **by** is specified; specify the new option
**binrescale**. See **[R] histogram**.

86. Existing command **twoway kdensity** can now estimate the density one
bandwidth beyond the maximum and minimum values of the dependent
variable; specify new option **boundary**.
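The two options from items 85 and 86 can be tried on the auto dataset. The choice of variables below is only for illustration.

```stata
sysuse auto, clear

* Recalculate bin sizes separately for each by() category
histogram price, by(foreign) binrescale

* Extend the density estimate one bandwidth beyond the observed range
twoway kdensity mpg, boundary
```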

87. New graph commands for the new IRT models are provided:

**irtgraph icc** graphs item characteristic curves.

**irtgraph tcc** graphs test characteristic curves.

**irtgraph iif** graphs the item information function.

**irtgraph tif** graphs the test information function.

See **[IRT] irtgraph icc**, **[IRT] irtgraph tcc**, **[IRT] irtgraph iif**,
and **[IRT] irtgraph tif**.
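The graphs are drawn after an IRT model has been fit. The sketch below assumes the masc1 example dataset (items q1 through q9) can be loaded with **webuse**, and it fits a two-parameter logistic model purely as an example.

```stata
webuse masc1, clear

* Fit a two-parameter logistic IRT model to the nine binary items
irt 2pl q1-q9

irtgraph icc q1      // item characteristic curve for one item
irtgraph tcc         // test characteristic curve
irtgraph iif         // item information functions
irtgraph tif         // test information function
```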

88. **bayesgraph** graphs summaries and convergence diagnostics for
simulated posterior distributions (MCMC samples) of model
parameters and functions of model parameters obtained from new
command **bayesmh**. Graphical summaries include trace plots,
autocorrelation plots, and various distributional plots. See
**[BAYES] bayesgraph**.
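A minimal sketch of the workflow is shown below. The regression, the priors, and the seed are arbitrary choices for illustration and are not recommendations.

```stata
sysuse auto, clear
set seed 14          // arbitrary seed so the MCMC sample is reproducible

* Fit a simple Bayesian linear regression with illustrative priors
bayesmh mpg weight, likelihood(normal({var}))  ///
    prior({mpg:}, normal(0, 10000))            ///
    prior({var}, igamma(0.01, 0.01))

* Trace, autocorrelation, histogram, and density plots for the variance
bayesgraph diagnostics {var}

* Trace plots for all model parameters
bayesgraph trace _all
```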

__1.3.16 What's new in Mata__

89. All the new functions added to Stata have also been added to
Mata. See *1.3.14 What's new in functions* above.

90. Mata's file commands can now read, write, and seek with files
that are longer than 2GB on 64-bit systems. See **[M-5] fopen()**.

91. The default size of the Mata cache has been increased from 400 to
2,000 kilobytes. See **[M-3] mata set**.

92. Existing **xl()** Excel file I/O class has been extended beyond
inserting text to include formatting of text, alignment,
boldface, color, italics, and the like; inserting Stata graphs;
specifying Excel formats including date formats, currency
formats, etc.; cell spanning and table border formatting; and
inserting Excel formulas. See the highlight *Excel reports get*
*better* and see **[M-5] xl()**.

93. New **PdfDocument()** class creates PDF files from scratch in a
programmatic way. See **[M-5] Pdf*()**.
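A minimal sketch of building a one-paragraph PDF follows. The text and the filename example.pdf are placeholders, and the file is written to the current working directory.

```stata
mata:
pdf = PdfDocument()                 // new, empty document

p = PdfParagraph()                  // a paragraph object to hold text
p.addString("Hello from Mata.")
pdf.addParagraph(p)

pdf.save("example.pdf")             // write the file
pdf.close()
end
```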

__1.3.17 What's new in programming__

94. Command **version** has new option **user**. This option causes **version**
to backdate the random-number generators (RNGs). The new RNGs
are a highlight of Stata 14. As we explained, the **version**
command has become more sophisticated. Seemingly like magic,
Stata chooses an RNG according to what the user has specified and
ignores the version numbers specified in intermediary ado-files.
If the user is running under version 14, it does not matter if
the ado-file was written for Stata 12. Any **runiform()** function
in it is given the same interpretation as in Stata 14.

This is because Stata is tracking a second version number along
with the first. The second is known as the user version and is
set only when the **version** command is given interactively or in
do-files. **version** given here sets both version numbers. In your
ado-file, **version** sets only the first version number. If you
want to set the second version number in your ado-file, you add a
second **version** line with option **user**.

Official guidelines are that you should not do this, or at least,
not do this without warning the user that your command does not
honor the implicit RNG setting via the user setting the version
number.

95. **creturn()** reports two new system settings, **c(rng)** and
**c(rng_current)**.

**c(rng)** corresponds to the setting of **set rng**, which will usually
be the string **default** but could be **mt64** or **kiss32**. If it is
**default**, the RNGs in effect are based on **version**, as described
above.

**c(rng_current)** reports the RNGs in effect -- the RNGs that would
be used by **runiform()** if it were given now -- regardless of how
that was set or determined. Its values are **mt64** or **kiss32**.
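The difference between the two settings can be seen by changing **set rng** and displaying both values. This is only a sketch; it assumes it is run from a do-file or interactively under version 14, and it restores the default at the end.

```stata
version 14

* With the default setting, the RNG in effect follows the version
display "`c(rng)'"
display "`c(rng_current)'"

* Explicitly request the older generator and look again
set rng kiss32
display "`c(rng)'"
display "`c(rng_current)'"

* Return to the default behavior
set rng default
```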

96. Stata's dialog programming language now provides a **TREEVIEW** input
control. See *3.6.17 TREEVIEW tree input control* in **[P] dialog**
**programming**.

97. New command **set locale_functions** *localename* resets the default
locale used by new Unicode string functions, which take an
optional locale argument. **set** **locale_functions** is automatically
set to **default**, which means the operating system's recorded
locale. See **[P] set locale_functions**.
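The effect of the default locale can be sketched with a case conversion that is locale sensitive. The Turkish locale "tr" is used here only as an example, and the previous setting is saved and restored.

```stata
* Passing the locale explicitly: Turkish uppercases "i" to dotted "İ"
display ustrupper("istanbul", "tr")

* Making Turkish the default locale for functions that omit the argument
local old `c(locale_functions)'
set locale_functions tr
display ustrupper("istanbul")

* Restore the previous setting
set locale_functions `old'
```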

98. Two new extended macro functions are provided.

**:ustrlen** is the macro equivalent of the new **ustrlen()** function.

**:udstrlen** is the macro equivalent of the new **udstrlen()** function.

See **[P] macro**.
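A short sketch of the two extended macro functions follows. The macro names and the sample string are arbitrary.

```stata
local s "été"

* Length in Unicode characters and in display columns
local chars : ustrlen local s
local cols  : udstrlen local s

display "characters: `chars'   display columns: `cols'"
```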

__1.3.18 What's new in the Stata interface__

Already mentioned as highlights of the release were the following:

Postestimation made easy
Manual entries now have Quick starts
Stata in Spanish and Japanese

The following are also new:

99. The Data Editor has three new features.

You can now use Find in the Data Editor to search.

You can now insert variables or observations in the middle of
your data.

You can print the entire dataset or a selection.

100. The Variable Manager now supports printing. You can print the
entire list of variables or a selection.

101. You can now export graphs and results to PDF files in Stata for
Unix. You could always do this with Windows or Mac.

102. The PDF export engine in Stata for Windows is all new and
provides support for the new Unicode features in Stata.

__1.3.19 What's more__

We have not listed all the changes, but we have listed the important
ones.

Stata is continually being updated. All between-release updates are
available for free over the Internet.

Type **update query** and follow the instructions.

We hope that you enjoy Stata 14.

-------- **previous updates** -----------------------------------------------------

See whatsnew13.

-------------------------------------------------------------------------------