This page contains only historical information and is not about the current
release of Stata.
Please see our Stata 11 page
for information on the current version of Stata.
Data management
-
Stata 10 has new date/time variables, so you can now record values like
14jun2007 09:42:41.106 in one variable. They are called
%tc and %tC variables.
The first is unadjusted for leap seconds; the second is adjusted.
What used to be called “daily variables” are now called
%td variables. This is just a jargon change;
daily (%td) variables continue to work as they
did before—0 means 01jan1960, 1 means 02jan1960, and so on.
%tc and %tC variables
work similarly: 0 means 01jan1960 00:00:00. Here, however, 1 means
01jan1960 00:00:00.001, 1000 means 01jan1960 00:00:01.000, and 02jan1960
08:00:00 is 115,200,000. The underlying values are big—so it is
important that you store them as
doubles—but the
%tc and
%tC formats make the values readable, just as the
%td format makes daily
(%td) values readable.
There are many new functions to go along with this new value type. clock(), for instance, converts strings such as
“02jan1960 08:00:00” (or even “8:00 a.m.,
1/2/1960”) to their numeric equivalents.
dofc() converts a %tc
value (such as 115,200,000, meaning 02jan1960 08:00:00) to its
%td equivalent (namely, 1, meaning 02jan1960).
cofd() does the reverse (the result would be
86,400,000, meaning 02jan1960 00:00:00).
See [D]
dates and times.
-
The previously existing date() function, which
converts strings to %td values, is now smarter.
In addition to being able to convert strings such as
“21aug2005”, “August 21, 2005”, it can convert
“082105”, “08212005”, “210805”, and
“21082005”. See [D]
dates
and times.
-
New command datasignature allows you to sign
datasets and later use that signature to determine whether the data have
changed. An early version of the command was made available during the
Stata 9 release. That command is now called
_datasignature and was used as the building block
for the new, improved datasignature. See [D]
datasignature
and [P]
_datasignature.
-
Existing command clear now clears data and value
labels only. Type
“clear all” to clear everything.
This change will bite you the first few times you type
“clear” expecting it to
“clear all”. The problem was that new
users were surprised when
“clear” by itself cleared everything, whereas
“use filename clear”
loaded new data and value labels but left everything else in place. The
new users were right.
clear now has the following subcommands:
-
clear all clears everything from memory.
-
clear ado clears automatically loaded
ado-file programs.
-
clear programs clears all programs,
automatically loaded or not.
-
clear results clears saved results.
-
clear mata clears Mata functions and objects
from memory.
See [D]
clear.
-
Stata for Unix now supports unixODBC [sic], making it easier to
connect to databases such as Oracle, MySQL, and PostgreSQL; see [D]
odbc.
-
Existing command describe
now allows option
varlist that was previously allowed only by
describe using.
Existing command describe using varlist
now allows option simple that was previously only
allowed by describe. Option
varlist saves the variable names in
r(varlist) and optoin simple
displays the variable name in a compact form. See [D]
describe.
-
Existing command
collapse now supports four additional
stats:
first, the first value;
last, the last value;
firstnm, the first nonmissing value; and
lastnm, the last nonmissing value. See [D]
collapse.
-
Existing command cf (compare files) now provides
a detailed listing of observations that differ when the
verbose option is specified. Setting version to less
than 10.0 restores the earlier behavior. See [D]
cf.
-
Existing command codebook has new option
compact that produces more compact output. See [D]
codebook.
-
Existing command insheet has new option
case that preserves the case of variable names
when importing data; see [D]
insheet.
-
Existing command outsheet has new option
delimiter() that specifies an alternative delimiter; see [D]
outsheet.
-
Existing commands infile and
infix can now read up to 524,275
characters per line; the previous limit was 32,765. See [D]
infile and [D]
infix (fixed format).
-
Existing commands icd9 and
icd9p have now been updated to use the V24 codes; see [D]
icd9.
-
New function itrim() returns the string with
consecutive, internal spaces collapsed to one space; see
String functions in [D]
functions.
-
New functions lnnormal() and
lnnormalden() provide the natural logarithm of
the cumulative standard normal distribution and of the standard normal
density; see
Probability distributions and density functions in [D]
functions.
-
New functions for calculating cumulative densities are now available:
| binomial(n, k, p) |
lower tail of the binomial distribution |
| ibetatail(a, b, x) |
reverse (upper tail) of the cumulative beta
distribution |
| gammaptail(a, x) |
reverse (upper tail) of the cumulative gamma
distribution |
| invgammaptail(am p) |
inverse reverse of the cumulative gamma
distribution |
| invibetatail(a, b, p) |
inverse reverse of the cumulative beta
distribution |
| invbinomialtail(n, k, p) |
inverse of right cumulative binomial |
See Probability distributions and density functions in [D]
functions.
-
Existing function
Binomial(n, k,
p) has been renamed
binomialtail(n,
k, p), thus
making its name consistent with the naming convention for probability
functions. The accuracy of the function has also been improved for very
large values of n. At the other end of the
number line, the function now returns the appropriate 0 or 1 value when
n = 0, rather than returning missing.
Binomial() continues to work as a synonym for
binomialtail().
-
The behavior and accuracy of the following probability functions have been
improved:
-
F(n1,
n2, f) and
Ftail(n1,
n2, f) are
more accurate for small values of n1 and large
values of n2. Also,
F() is more accurate for large
f where n1 and
n2 are less than 1.
-
gammap(a,
x) is more accurate when
a is large and x is near
a.
-
ibeta(a, b,
x) now is more accurate when
x is near
a/(a + b) and
a or b is large.
-
invbinomial(n,
k, p),
invchi2(n,
p),
invchi2tail(n,
p),
invF(n1,
n2, p), and
invgammap(a,
p) are more accurate for small values of
p or for returned values close to zero.
-
invFtail(n1,
n2,
p) and
invibeta(a,
b, p)
are more accurate for small values of
p or for returned values close to zero.
-
invttail(n,
p) is more accurate for small values
of p or for returned values close to zero.
-
ttail(n,
t) is more accurate for exceedingly
large values of n.
-
Existing function invbinomial(n,
k,
p) now returns the probability of a
success on one trial such that the probability of observing
k or fewer successes in
n trials is
p. The previous behavior of
invbinomial() is restored under version control.
-
New function fmtwidth() returns the display width of a
%fmt string; see
Programming functions in [D]
functions.
-
The maximum length of a %fmt
has increased from 12 to 48 characters; see [D]
format.
(This change was necessitated by the new date/time variables).
-
Existing commands
corr2data and
drawnorm now allow singular correlation (or
covariance) structures. New option forcepsd
modifies a matrix to be positive semidefinite and thus to be a proper
covariance matrix. See [D]
corr2data
and [D] drawnorm.
-
Existing command
hexdump analyze now saves the number of
\r\n characters in
r(Windows) rather than in
r(DOS). r(DOS) is still
set when version is less than 10. See [D]
hexdump.
Back to highlights
|
|