The National Institute of Standards and Technology (NIST) writes:
“In response to industrial concerns about the numerical accuracy of computations from statistical software, the Statistical Engineering and Mathematical and Computational Sciences Divisions of NIST’s Information Technology Laboratory are providing datasets with certified values for a variety of statistical methods.”
These datasets are known as the NIST StRD (Statistical Reference Datasets). See the NIST StRD web page for detailed descriptions of these datasets and tests.
Below are the results of running these tests in Stata.
In reporting comparisons, it is common to report the LRE, the log relative error. Let c represent a calculated result and t the certified answer supplied by NIST. For t ≠ 0, the formal definition of this comparison is

$$\mathrm{LRE} = -\log_{10}\left(\frac{|c - t|}{|t|}\right)$$

The result of this calculation is then called “Digits of Accuracy” or, more precisely, “Decimal Digits of Accuracy”; it approximately counts the number of significant digits c has in common with the true value t (higher values are better). Note that LRE cannot exceed 15.
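A minimal Python sketch of the LRE calculation follows. The function name `lre` and its `cap` parameter are illustrative assumptions; the cap of 15 mirrors the note above, and using the absolute error when t is zero is an assumed convention, not something stated on this page.

```python
import math


def lre(c: float, t: float, cap: float = 15.0) -> float:
    """Log relative error of a computed value c against a certified value t.

    Illustrative sketch: the result is capped at `cap` (15 by default), and
    when t == 0 the absolute error |c| is used instead (an assumed convention).
    """
    if c == t:
        return cap  # exact agreement: the error is zero and -log10(0) is infinite
    if t == 0.0:
        err = abs(c)
    else:
        err = abs(c - t) / abs(t)
    return min(cap, -math.log10(err))


# A computed value that agrees with the certified value to roughly 3 digits.
print(round(lre(1.0012, 1.0), 2))  # 2.92
```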
Results were obtained on August 23, 2018, running Stata/SE 15 for Linux (console version). The computer ran the CentOS 6.6 Linux distribution on an eight-core Intel i7 microprocessor. Although Stata runs the same numerical code on all platforms, results will differ slightly on other platforms because of compiler and hardware differences. See results obtained using Stata 14 on April 1, 2015.
Detailed results for each of the tests are provided below.
Univariate summary statistics

                        Stata
                        Digits of accuracy
                        --------------------------
                                             lag-1
Test         Difficulty     mean    S.D. autocorr.
--------------------------------------------------
PiDigits     lower          15.0    15.0      14.9
Lottery      lower          15.0    15.0      15.0
Lew          lower          15.0    15.0      14.8
Mavro        lower          15.0    13.1      13.7
Michelson    lower          15.0    13.8      13.4
NumAcc-1     lower          15.0    15.0      15.0
NumAcc-2     average        15.0    15.0      15.0
NumAcc-3     average        15.0     9.5      11.9
NumAcc-4     higher         15.0     8.3      10.7
--------------------------------------------------
Average                     15.0    13.3      13.8
Minimum                     15.0     8.3      10.7
Maximum                     15.0    15.0      15.0
Linear regression

                        Stata
                        Digits of accuracy
                        ------------------------
Test         Difficulty    Coef.    S.E.     RSS
------------------------------------------------
Norris       lower          12.8    13.5    13.3
Pontius      lower          11.5    13.0    12.7
NoInt-1      average        14.7    15.0    14.9
NoInt-2      average        15.0    15.0    14.7
Filippelli   higher         no full solution (*)
Longley      higher         12.1    12.9    13.2
Wampler-1    higher          6.9    15.0    15.0
Wampler-2    higher         10.4    15.0    15.0
Wampler-3    higher          6.5    10.8    14.1
Wampler-4    higher          6.5    10.8    15.0
Wampler-5    higher          6.4    10.8    15.0
------------------------------------------------
Average                     10.3    13.2    14.3
Minimum                      6.4    10.8    12.7
Maximum                     15.0    15.0    15.0
------------------------------------------------
Most tests involve several regressors. Reported under Coef. and S.E. is the minimum LRE across all coefficients, including the intercept, if any. RSS reports the LRE for the residual (error) sum of squares.
(*) Filippelli test: Stata found the variables so collinear that it dropped two of them; that is, it set two coefficients and their standard errors to zero. The resulting estimates still fit the data well. Most other statistical software packages have done the same, and most authors have interpreted this result as acceptable for this test. The Filippelli model is a degree-10 polynomial in x, so the powers of x are highly collinear. Stata's orthpoly command, which generates orthogonal polynomials, can solve this problem, but it would not occur to most users to use it, and transforming the results back to the metric of the original problem requires an extra statement. If orthpoly is used, the LRE is 8.4 for the coefficients and 8.5 for the RSS.
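The Coef. and S.E. columns summarize each test by its least accurate estimate. A minimal Python sketch of that aggregation follows. The names `lre` and `min_lre` are hypothetical, this `lre` assumes nonzero certified values, and the numbers in the usage lines are made up rather than NIST certified values or Stata output.

```python
import math
from typing import Sequence


def lre(c: float, t: float, cap: float = 15.0) -> float:
    """Log relative error capped at `cap`; assumes the certified value t != 0."""
    if c == t:
        return cap
    return min(cap, -math.log10(abs(c - t) / abs(t)))


def min_lre(computed: Sequence[float], certified: Sequence[float]) -> float:
    """Smallest LRE over all coefficients (intercept included, if any)."""
    if len(computed) != len(certified):
        raise ValueError("computed and certified vectors differ in length")
    return min(lre(c, t) for c, t in zip(computed, certified))


# Hypothetical two-coefficient fit: the intercept matches exactly and the
# slope agrees to about 6 digits, so the reported Coef. LRE is about 6.
print(round(min_lre([2.0, 3.000003], [2.0, 3.0]), 6))
```

Taking the minimum means a single poorly estimated coefficient determines the reported score, which is the conservative choice for a table like the one above.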
Analysis of variance

                               Stata
                  Digits of accuracy
                  ------------------
Test               Difficulty      F
----------------------------------------
Si Resistivity     lower        13.1
Simon-Lesage 1     lower        15.0
Simon-Lesage 2     lower        13.6
Simon-Lesage 3     lower        12.8
Ag Atomic Wt       average      10.2
Simon-Lesage 4     average      10.4
Simon-Lesage 5     average      10.2
Simon-Lesage 6     average      10.2
Simon-Lesage 7     higher        4.4 (*)
Simon-Lesage 7b    higher       15.0 (*)
Simon-Lesage 8     higher        4.3 (*)
Simon-Lesage 8b    higher       15.0 (*)
Simon-Lesage 9     higher        4.2 (*)
Simon-Lesage 9b    higher       15.0 (*)
----------------------------------------
Average excluding S-L 7, 8, 9   12.8
Minimum                         10.2
Maximum                         15.0
----------------------------------------
(*) Tests Simon–Lesage 7b through 9b are variations, developed by Stata, of tests Simon–Lesage 7 through 9. To our knowledge, no package that stores and processes data in binary double precision has ever scored better than 4.6 on tests 7 through 9, and that is because it is not possible to do better; the problem is with the test, not with the packages being tested. The difficulty is that the data become different from what the authors intended the instant they are stored on a computer that uses binary double precision. The test uses y values such as 1,000,000,000,000.4, but that value immediately becomes 1,000,000,000,000.40002441... because of how computers store numbers. We strongly suspect that the answer Stata produces, and the answers produced by other packages, are correct given the data as stored.
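The storage effect described above can be checked directly. The sketch below is in Python rather than Stata; it relies on Python's `float` being an IEEE 754 binary double and on `decimal.Decimal(float)` converting the stored double exactly, so it shows the value any software storing the datum as a binary double actually works with.

```python
from decimal import Decimal

# A y value as written in the Simon-Lesage 7 through 9 test data.
y = 1000000000000.4

# Decimal(float) reproduces the stored binary double exactly, digit for digit.
print(Decimal(y))  # 1000000000000.4000244140625

# Scaled by 10, as in tests 7b through 9b, the value is an integer below 2**53,
# so a double represents it exactly.
y_scaled = 10000000000004
print(float(y_scaled) == y_scaled)  # True
print(Decimal(float(y_scaled)))     # 10000000000004
```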
Tests Simon–Lesage 7b through 9b differ from Simon–Lesage 7 through 9 in that the data are multiplied by 10 before being input, so 1,000,000,000,000.4 becomes 10,000,000,000,004, a number that can be stored with perfect accuracy. The test is then carried through as before, the question being whether the ANOVA routine can handle data that vary only in the trailing digits.
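One way an ANOVA routine can cope with data that vary only in their trailing digits is to build the sums of squares from deviations about the means (a two-pass method) instead of from raw squared values, which at this magnitude lose the trailing digits before the subtraction happens. The Python sketch below computes a one-way ANOVA F statistic that way. It illustrates the technique under that assumption and is not a description of how Stata's anova command is implemented; the function name and sample data are hypothetical.

```python
import math
from typing import List, Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic computed from deviations about the means.

    Subtracting the means first removes any large common offset before
    squaring, so only the small trailing-digit differences are squared.
    """
    values: List[float] = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = math.fsum(values) / n  # fsum: correctly rounded sum

    ss_between = 0.0  # between-group sum of squares
    ss_within = 0.0   # within-group (residual) sum of squares
    for g in groups:
        group_mean = math.fsum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += math.fsum((x - group_mean) ** 2 for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


base = 10**13  # magnitude similar to the scaled data in tests 7b through 9b
small = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
large = [[float(base + d) for d in (1, 2, 3)], [float(base + d) for d in (4, 5, 6)]]
print(one_way_anova_f(small), one_way_anova_f(large))  # 13.5 13.5
```

Both data sets differ only by a constant offset, so a sound routine should report the same F for each.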
Nonlinear regression

                        Stata
                        Digits of accuracy
                        ------------------------
Test         Difficulty    Coef.    S.E.     RSS
------------------------------------------------
Misra 1a     lower           9.4     6.4    10.5
Chwirut 2    lower           8.0     6.3    11.2
Chwirut 1    lower           7.6     6.3    11.4
Lanczos 3    lower           7.2     6.0    10.6
Gauss 1      lower           8.5     6.3    11.6
Gauss 2      lower           8.2     5.9    10.6
Daniel Wood  lower           8.6     6.2    11.7
Misra 1b     lower           9.9     6.5    11.3
Kirby 2      average         8.0     6.3    11.6
Hahn 1       average         7.1     5.1    10.6
Nelson       average         7.1     5.2    10.9
MGH 17       average       (7.0)   (6.1)  (11.5)
Lanczos 1    average        10.6     3.3     3.0
Lanczos 2    average         7.9     5.4    10.1
Gauss 3      average         8.2     5.5    11.0
Misra 1c     average         9.7     6.5    11.1
Misra 1d     average         9.3     6.5    11.2
Roszman 1    average         7.4     6.4    12.2
ENSO         average         4.7     5.3    11.3
MGH 09       higher        (7.0)   (6.5)  (11.6)
Thurber      higher          6.5     5.4    11.3
BoxBOD       higher          7.3     6.7    10.4
Ratkowsky 2  higher          7.6     6.0    11.8
MGH 10       higher        (7.7)   (4.7)  (11.4)
Eckerle4     higher        (8.3)   (6.4)  (10.7)
Ratkowsky 3  higher        (6.0)   (5.0)  (11.4)
Bennett 5    higher          6.4     5.9    11.0
------------------------------------------------
Average                      7.8     5.8    10.9
Minimum                      4.7     3.3     3.0
Maximum                     10.6     6.7    12.2
------------------------------------------------
Parentheses indicate that convergence could not be achieved with the first set of starting values and that the second set had to be used.
Each model has several parameters. Reported under Coef. and S.E. is the minimum LRE across all parameter estimates. RSS reports the LRE for the residual (error) sum of squares.
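NIST supplies two sets of starting values for each nonlinear problem, and the parenthesized entries above mark fits that needed the second set. The Python sketch below illustrates that retry logic with a plain Gauss–Newton iteration for the Misra 1a model form, y = b1*(1 - exp(-b2*x)). It is a hypothetical illustration, not Stata's nl algorithm: the function names, tolerances, and synthetic data are assumptions, and the 2x2 normal equations are solved by Cramer's rule for brevity, where a QR-based solve would be more accurate.

```python
import math
from typing import Optional, Sequence, Tuple

Params = Tuple[float, float]


def gauss_newton_misra1a(x: Sequence[float], y: Sequence[float], start: Params,
                         max_iter: int = 100, tol: float = 1e-12) -> Optional[Params]:
    """Fit y = b1 * (1 - exp(-b2 * x)) by Gauss-Newton; return None on failure."""
    b1, b2 = start
    for _ in range(max_iter):
        # Accumulate J'J (a11, a12, a22) and J'r (g1, g2), with r = y - f(x; b).
        a11 = a12 = a22 = g1 = g2 = 0.0
        try:
            for xi, yi in zip(x, y):
                e = math.exp(-b2 * xi)
                d1 = 1.0 - e          # df/db1
                d2 = b1 * xi * e      # df/db2
                r = yi - b1 * d1
                a11 += d1 * d1
                a12 += d1 * d2
                a22 += d2 * d2
                g1 += d1 * r
                g2 += d2 * r
        except OverflowError:
            return None  # iterates ran off to where exp() overflows
        det = a11 * a22 - a12 * a12
        if det == 0.0 or not math.isfinite(det):
            return None  # singular or non-finite normal equations
        # Gauss-Newton step from the 2x2 normal equations (Cramer's rule).
        s1 = (a22 * g1 - a12 * g2) / det
        s2 = (a11 * g2 - a12 * g1) / det
        b1, b2 = b1 + s1, b2 + s2
        if abs(s1) <= tol * (1.0 + abs(b1)) and abs(s2) <= tol * (1.0 + abs(b2)):
            return b1, b2
    return None  # no convergence within max_iter


def fit_with_fallback(x: Sequence[float], y: Sequence[float],
                      starts: Sequence[Params]) -> Tuple[Params, int]:
    """Try each set of starting values in order; return estimates and the set used."""
    for number, start in enumerate(starts, start=1):
        estimates = gauss_newton_misra1a(x, y, start)
        if estimates is not None:
            return estimates, number
    raise RuntimeError("no set of starting values converged")


# Synthetic, noise-free data from b1 = 2, b2 = 0.5 (not the NIST Misra 1a data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0 * (1.0 - math.exp(-0.5 * xi)) for xi in x]
# The first start has b2 = 0, where df/db1 is identically zero, so it fails.
(b1, b2), used = fit_with_fallback(x, y, [(1.0, 0.0), (1.5, 0.4)])
print(used, round(b1, 6), round(b2, 6))
```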
See results obtained using Stata 14 on April 1, 2015.
See results obtained using Stata 13 on June 17, 2013.
See results obtained using Stata 11 on September 30, 2009.