NIST StRD certification results using Stata 10
- Explanation
- Summary of results
- Certification results: univariate summary statistics
- Certification results: linear regression
- Certification results: analysis of variance
- Certification results: nonlinear regression
1. Explanation
The National Institute of Standards and Technology (NIST) writes,
In response to industrial concerns about the numerical accuracy of
computations from statistical software, the Statistical Engineering and
Mathematical and Computational Sciences Divisions of NIST’s
Information Technology Laboratory are providing datasets with certified
values for a variety of statistical methods.
These datasets are known as the NIST StRD—Standard Reference Data.
See the NIST StRD web page for detailed descriptions of these datasets and
tests.
The results of running these tests on Stata are presented below.
In reporting comparisons, it is popular to report the LRE—the log
relative error. Let c represent a calculated result and t the
answer supplied by NIST. The formal definition of this comparison is
- LRE = min( 15, -log10(|c-t|/|t|) ) if t != 0
- LRE = min( 15, -log10(|c-t|) ) otherwise.
The result of this calculation is called “Digits of Accuracy” or, more
precisely, “Decimal Digits of Accuracy”; it approximates the number of
leading digits the calculated result shares with the certified value, so
higher values are better. By definition, the LRE cannot exceed 15.
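The definition above can be sketched in a few lines of Python. This is an illustration only, not the code used to produce the results on this page; the function name `lre` and the `cap` parameter are assumptions made for the example, and the exact-match branch is added so that `log10(0)` is never evaluated.

```python
import math

def lre(c, t, cap=15.0):
    """Log relative error of a calculated result c against a certified value t."""
    if c == t:
        # Exact agreement: the error is zero, so report the cap directly
        # rather than taking log10(0).
        return cap
    # Relative error when t is nonzero, absolute error otherwise.
    err = abs(c - t) / abs(t) if t != 0 else abs(c - t)
    return min(cap, -math.log10(err))

print(lre(3.14159, math.pi))  # about 6 digits agree
print(lre(1.0, 1.0))          # exact match, capped at 15
print(lre(0.001, 0.0))        # certified value is zero: absolute error is used
```

Note that the formula as stated has no lower bound, so a badly wrong result yields an LRE near zero or below it.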
2. Summary of results
Results were obtained July 9, 2007, running Stata 10 for Linux (console
version) on a computer with an AMD Opteron processor and the Fedora Core 2
Linux operating system. Stata runs the same numerical code on all platforms,
but results will differ slightly on other platforms because of compiler and
hardware differences.
- Univariate summary statistics:
- Stata completed all tests. Means were estimated with never less than 15
digits of accuracy. Standard deviations averaged 13.3 correct digits,
ranging from 8.3 to 15 digits. The lag-1 autocorrelation averaged 13.8
correct digits, ranging from 10.7 to 15 digits.
- Linear regression:
- Stata completed all tests except one, the Filippelli test.
For the other tests, coefficients averaged 10.2 correct digits and never
had fewer than 6.4 correct digits. Standard errors averaged 13.2 correct
digits (minimum 10.8), and residual sums of squares averaged 14.3 correct
digits (minimum 12.7).
In the Filippelli test, Stata found two coefficients so collinear that it
dropped them from the analysis. Most other statistical software packages
have done the same thing, and most authors have interpreted this result as
acceptable for this test.
- Analysis of variance:
- Stata completed all tests. The F statistic averaged 12.8 correct digits
and never had fewer than 10.2 correct digits.
The above results include a correction made by us to three of the tests.
An error in the construction of these three tests makes ANOVA routines
implemented in binary double precision appear less precise than they are.
The data, as originally presented, are accurate to only a few digits with
the result that F statistics can be calculated only to a few digits. The
correction is described below.
- Nonlinear regression:
- Stata completed all tests. Coefficients averaged 7.8 correct digits and
never had fewer than 4.7 correct digits. Standard errors averaged 5.8
correct digits and never had fewer than 3.3 correct digits. Residual sums
of squares averaged 10.9 correct digits and never had fewer than 3.0
correct digits.
Detailed results for each of the tests are provided below.
3. Certification results: univariate summary statistics
                                    Stata
                              Digits of accuracy
                          --------------------------
                                             lag-1
Test        Difficulty    mean     S.D.    autocorr.
-------------------------------------------------------------------
PiDigits    lower         15.0     15.0      14.9       log do-file
Lottery     lower         15.0     15.0      15.0       log do-file
Lew         lower         15.0     15.0      14.8       log do-file
Mavro       lower         15.0     13.1      13.7       log do-file
Michelson   lower         15.0     13.8      13.4       log do-file
NumAcc-1    lower         15.0     15.0      15.0       log do-file
NumAcc-2    average       15.0     15.0      15.0       log do-file
NumAcc-3    average       15.0      9.5      11.9       log do-file
NumAcc-4    higher        15.0      8.3      10.7       log do-file
-------------------------------------------------------------------
Average                   15.0     13.3      13.8
Minimum                   15.0      8.3      10.7
Maximum                   15.0     15.0      15.0
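
The summary rows can be reproduced from the per-test values in the table. The short Python check below does this for the S.D. and lag-1 autocorrelation columns; the variable names are chosen for this example, and the one-decimal rounding is an assumption about how the averages were reported.

```python
# Per-test LREs copied from the table above, in row order
# (PiDigits, Lottery, Lew, Mavro, Michelson, NumAcc-1 ... NumAcc-4).
columns = {
    "S.D.":      [15.0, 15.0, 15.0, 13.1, 13.8, 15.0, 15.0, 9.5, 8.3],
    "autocorr.": [14.9, 15.0, 14.8, 13.7, 13.4, 15.0, 15.0, 11.9, 10.7],
}

summary = {}
for name, values in columns.items():
    average = round(sum(values) / len(values), 1)  # rounded to one decimal
    summary[name] = (average, min(values), max(values))
    print(name, summary[name])
```

The computed averages, minimums, and maximums match the Average, Minimum, and Maximum rows of the table.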
4. Certification results: linear regression
                                    Stata
                              Digits of accuracy
                          -----------------------
Test        Difficulty   Coef.     S.E.       RSS
-------------------------------------------------------------------
Norris      lower         12.8     13.5      13.3       log do-file
Pontius     lower         11.5     13.0      12.7       log do-file
NoInt-1     average       14.7     15.0      14.9       log do-file
NoInt-2     average       15.0     15.0      14.7       log do-file
Filippelli  higher        no full solution(*)           log do-file
Longley     higher        12.1     12.9      13.2       log do-file
Wampler-1   higher         6.9     15.0      15.0       log do-file
Wampler-2   higher         9.7     15.0      15.0       log do-file
Wampler-3   higher         6.5     10.8      14.1       log do-file
Wampler-4   higher         6.5     10.8      15.0       log do-file
Wampler-5   higher         6.4     10.8      15.0       log do-file
-------------------------------------------------------------------
Average                   10.2     13.2      14.3
Minimum                    6.4     10.8      12.7
Maximum                   15.0     15.0      15.0
-------------------------------------------------------------------
Each test involved multiple independent variables. Reported under Coef. and
S.E. is the minimum LRE for all regressors, including the intercept, if any.
RSS reports the LRE for the residual (error) sums of squares.
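
To illustrate how a single Coef. entry is obtained, the Python sketch below computes the LRE for each coefficient and reports the minimum. The certified and computed values are hypothetical, made up for this example; they are not NIST data or Stata output, and `lre` is an illustrative helper rather than part of any package.

```python
import math

def lre(c, t, cap=15.0):
    """Log relative error of computed c against certified t, capped at 15."""
    if c == t:
        return cap  # exact agreement; avoids log10(0)
    err = abs(c - t) / abs(t) if t != 0 else abs(c - t)
    return min(cap, -math.log10(err))

# Hypothetical certified and computed coefficients (NOT NIST data).
certified = [1.0, 2.0, 3.0]
computed = [1.0, 2.0000001, 3.0]

per_coef = [lre(c, t) for c, t in zip(computed, certified)]
reported = min(per_coef)  # the least accurate coefficient determines the entry
print(per_coef)
print(reported)           # about 7.3, set by the second coefficient
```

Reporting the minimum rather than the mean means that one poorly estimated coefficient is enough to lower a test's reported accuracy.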
(*) Filippelli test: Stata found the variables so collinear that it dropped
two of them—that is, it set two coefficients and standard errors to
zero. The resulting estimates still fit the data well. Most other
statistical software packages have done the same thing, and most authors
have interpreted this result as acceptable for this test. Stata has an
orthpoly command that can do this problem, but it would not occur to
most users to use it, and transforming results back to the metric of the
problem requires an extra statement. However, if that command is used, the
LRE for the coefficients is 8.4 and the LRE for the RSS is 8.5.
5. Certification results: analysis of variance
                                Stata
                         Digits of accuracy
                         ------------------
Test              Difficulty       F
------------------------------------------------------
Si Resistivity    lower         13.1       log do-file
Simon-Lesage 1    lower         14.9       log do-file
Simon-Lesage 2    lower         13.7       log do-file
Simon-Lesage 3    lower         13.1       log do-file
Ag Atomic Wt      average       10.2       log do-file
Simon-Lesage 4    average       10.4       log do-file
Simon-Lesage 5    average       10.2       log do-file
Simon-Lesage 6    average       10.2       log do-file
Simon-Lesage 7    higher         4.4(*)    log do-file
Simon-Lesage 7b   higher        15.0(*)    log do-file
Simon-Lesage 8    higher         4.2(*)    log do-file
Simon-Lesage 8b   higher        15.0(*)    log do-file
Simon-Lesage 9    higher         4.2(*)    log do-file
Simon-Lesage 9b   higher        15.0(*)    log do-file
------------------------------------------------------
Average excluding S-L 7, 8, 9   12.8
Minimum                         10.2
Maximum                         15.0
------------------------------------------------------
(*)
Tests Simon–Lesage 7b through 9b are a variation developed by
Stata on tests Simon–Lesage 7 through 9. To our knowledge, no package
that stores and processes data in binary double precision has ever done
better than 4.6 on these tests, and that is because it is not possible to do
better; the problem is with the test, not the packages being tested. The
difficulty is that the data are made different from what the authors
intended the instant they are stored on a double-precision binary computer.
The test uses y values, such as 1,000,000,000,000.4, but that value
immediately becomes 1,000,000,000,000.40002441... because of how computers
store numbers. We strongly suspect that the answer Stata produces, and the
answers produced by other packages, are correct given the data stored.
Tests Simon–Lesage 7b through 9b are modifications of
Simon–Lesage 7 through 9, the difference being that the data are
multiplied by 10 before being input, so 1,000,000,000,000.4 becomes
10,000,000,000,004, a number that can be stored with perfect accuracy. The
test is then carried through, the question being whether the ANOVA routine
can deal with data that vary only in the trailing digits.
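
The storage effect described above can be observed in any environment that uses binary double precision. The Python sketch below is an illustration only, not the code behind the Stata results; it uses the standard `decimal` module to print the exact value a double holds, and the variable names are chosen for this example.

```python
from decimal import Decimal

original = 1000000000000.4   # a y value as used in Simon-Lesage 7 through 9
rescaled = 10000000000004.0  # the same value multiplied by 10, as in 7b through 9b

# Decimal(float) converts the stored binary value exactly, without rounding.
print(Decimal(original))  # 1000000000000.4000244140625
print(Decimal(rescaled))  # 10000000000004
```

The first value picks up an error in the fifth decimal place as soon as it is stored, whereas the rescaled value is an integer small enough to be held exactly.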
6. Certification results: nonlinear regression
                                   Stata
                             Digits of accuracy
                          ------------------------
Test          Difficulty    Coef.    S.E.     RSS
---------------------------------------------------------------
Misra 1a      lower           9.4     6.4    10.5   log do-file
Chwirut 2     lower           8.0     6.3    11.2   log do-file
Chwirut 1     lower           7.6     6.3    11.4   log do-file
Lanczos 3     lower           7.2     6.0    10.6   log do-file
Gauss 1       lower           8.5     6.3    11.6   log do-file
Gauss 2       lower           8.2     5.9    10.6   log do-file
Daniel Wood   lower           8.6     6.2    11.7   log do-file
Misra 1b      lower           9.9     6.5    11.3   log do-file
Kirby 2       average         8.0     6.3    11.6   log do-file
Hahn 1        average         7.1     5.1    10.6   log do-file
Nelson        average         7.1     5.2    10.9   log do-file
MGH 17        average        (7.0)   (6.1)  (11.5)  log do-file
Lanczos 1     average        10.6     3.3     3.0   log do-file
Lanczos 2     average         7.9     5.4    10.1   log do-file
Gauss 3       average         8.2     5.5    11.0   log do-file
Misra 1c      average         9.7     6.5    11.1   log do-file
Misra 1d      average         9.3     6.5    11.2   log do-file
Roszman 1     average         7.4     6.4    12.2   log do-file
ENSO          average         4.7     5.3    11.3   log do-file
MGH 09        higher         (7.0)   (6.5)  (11.6)  log do-file
Thurber       higher          6.5     5.4    11.3   log do-file
BoxBOD        higher          7.3     6.7    10.4   log do-file
Ratkowsky 2   higher          7.6     6.0    11.8   log do-file
MGH 10        higher         (7.7)   (4.7)  (11.4)  log do-file
Eckerle4      higher         (8.3)   (6.4)  (10.7)  log do-file
Ratkowsky 3   higher         (6.0)   (5.0)  (11.4)  log do-file
Bennett 5     higher          6.4     5.9    11.0   log do-file
---------------------------------------------------------------
Average                       7.8     5.8    10.9
Minimum                       4.7     3.3     3.0
Maximum                      10.6     6.7    12.2
---------------------------------------------------------------
Parentheses indicate that convergence could not be achieved with the first
set of starting values and that the second set had to be used.
Each test involved multiple independent variables. Reported under Coef. and
S.E. is the minimum LRE for all regressors, including the intercept, if any.
RSS reports the LRE for the residual (error) sums of squares.
For results obtained using Stata 9 on May 2, 2005,
click here.