The National Institute of Standards and Technology (NIST) writes:
"In response to industrial concerns about the numerical accuracy of computations from statistical software, the Statistical Engineering and Mathematical and Computational Sciences Divisions of NIST’s Information Technology Laboratory are providing datasets with certified values for a variety of statistical methods."
These datasets are known as the NIST StRD (Statistical Reference Datasets). See the NIST StRD web page for detailed descriptions of these datasets and tests.
Below are presented the results of running these tests on Stata.
In reporting comparisons, it is popular to report the LRE, the log relative error. Let c represent a calculated result and t the answer supplied by NIST. The formal definition of this comparison is

$$\mathrm{LRE} = -\log_{10}\left(\frac{|c - t|}{|t|}\right)$$
The result of this calculation is then called “Digits of Accuracy” or, more precisely, “Decimal Digits of Accuracy”; it counts the number of digits in common with the true value (higher values are obviously better). Note that LRE cannot exceed 15.
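As a minimal sketch of how such a comparison could be computed, the Python function below takes a calculated result c and a certified value t and returns the log relative error, capped at 15. The function name `lre`, the `cap` parameter, the fallback to the absolute error when t is zero, and the floor at 0 are assumptions made for illustration; they are not taken from NIST or from Stata's do-files.

```python
import math

def lre(c: float, t: float, cap: float = 15.0) -> float:
    """Log relative error of a calculated value c against a certified value t."""
    if c == t:
        # Exact agreement: the relative error is zero, so report the ceiling.
        return cap
    if t == 0.0:
        # Assumption: use the absolute error when the certified value is zero.
        err = abs(c)
    else:
        err = abs(c - t) / abs(t)
    # Assumption: clamp to the range [0, cap] so the result reads as a digit count.
    return max(0.0, min(cap, -math.log10(err)))

# A value that agrees with pi to about six digits.
print(round(lre(3.14159, 3.14159265358979), 1))  # 6.1
```

A result that matches the certified value in every stored digit returns the cap of 15, which is how a table entry of 15.0 would arise under this sketch.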
Results were obtained on May 19, 2019, running Stata/SE 16 for Linux (console version). The computer ran the CentOS 7 Linux distribution on an eight-core Intel i7 microprocessor. Results will differ slightly on other platforms because of compiler and hardware differences; Stata runs the same numerical code on all platforms. See results obtained using Stata 15 on August 23, 2018.
Detailed results for each of the tests are provided below.
                                  Stata
                           Digits of accuracy
                        -----------------------
                                         lag-1
Test       Difficulty   mean    S.D.   autocorr.
------------------------------------------------
PiDigits   lower        15.0    15.0     14.9
Lottery    lower        15.0    15.0     15.0
Lew        lower        15.0    15.0     14.8
Mavro      lower        15.0    13.1     13.7
Michelson  lower        15.0    13.8     13.4
NumAcc-1   lower        15.0    15.0     15.0
NumAcc-2   average      15.0    15.0     15.0
NumAcc-3   average      15.0     9.5     11.9
NumAcc-4   higher       15.0     8.3     10.7
------------------------------------------------
Average                 15.0    13.3     13.8
Minimum                 15.0     8.3     10.7
Maximum                 15.0    15.0     15.0
                                  Stata
                           Digits of accuracy
                         ----------------------
Test        Difficulty   Coef.    S.E.    RSS
-----------------------------------------------
Norris      lower        12.8     13.5    13.3
Pontius     lower        11.5     13.0    12.7
NoInt-1     average      14.7     15.0    14.9
NoInt-2     average      15.0     15.0    14.7
Filippelli  higher       no full solution(*)
Longley     higher       12.1     12.9    13.2
Wampler-1   higher        6.9     15.0    15.0
Wampler-2   higher       10.4     15.0    15.0
Wampler-3   higher        6.5     10.8    14.1
Wampler-4   higher        6.5     10.8    15.0
Wampler-5   higher        6.4     10.8    15.0
-----------------------------------------------
Average                  10.3     13.2    14.3
Minimum                   6.4     10.8    12.7
Maximum                  15.0     15.0    15.0
-----------------------------------------------
Each test involved multiple independent variables. Reported under Coef. and S.E. is the minimum LRE for all regressors, including the intercept, if any. RSS reports the LRE for the residual (error) sums of squares.
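The "minimum LRE for all regressors" summary can be sketched as follows. The certified and computed coefficient values here are made up purely for illustration and are not NIST data; the helper `lre` is likewise a hypothetical name, defined inline so the snippet is self-contained.

```python
import math

def lre(c: float, t: float, cap: float = 15.0) -> float:
    """Log relative error, capped at `cap`; assumes t is nonzero."""
    if c == t:
        return cap
    return min(cap, -math.log10(abs(c - t) / abs(t)))

# Hypothetical certified and computed coefficients (illustration only).
certified = [2.0, -0.5, 100.0]
computed = [2.0000000001, -0.5, 100.00001]

# One LRE per coefficient; the table would report the worst (smallest) one.
per_coef = [lre(c, t) for c, t in zip(computed, certified)]
worst = min(per_coef)
print(round(worst, 1))  # 7.0
```

Reporting the minimum is the conservative choice: a single poorly estimated coefficient determines the table entry, however accurate the others are.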
(*) Filippelli test: Stata found the variables so collinear that it dropped two of them—that is, it set two coefficients and standard errors to zero. The resulting estimates still fit the data well. Most other statistical software packages have done the same thing, and most authors have interpreted this result as acceptable for this test. Stata has an orthpoly command that can do this problem, but it would not occur to most users to use it, and transforming results back to the metric of the problem requires an extra statement. However, if that command is used, the LRE for the coefficients is 8.4 and the LRE for the RSS is 8.5.
                                        Stata
                                 Digits of accuracy
                                 ------------------
Test              Difficulty             F
---------------------------------------------------
Si Resistivity    lower                13.1
Simon-Lesage 1    lower                15.0
Simon-Lesage 2    lower                13.6
Simon-Lesage 3    lower                12.8
Ag Atomic Wt      average              10.2
Simon-Lesage 4    average              10.4
Simon-Lesage 5    average              10.2
Simon-Lesage 6    average              10.2
Simon-Lesage 7    higher                4.4(*)
             7b   higher               15.0(*)
Simon-Lesage 8    higher                4.3(*)
             8b   higher               15.0(*)
Simon-Lesage 9    higher                4.2(*)
             9b   higher               15.0(*)
---------------------------------------------------
Average excluding S-L 7, 8, 9          12.8
Minimum                                10.2
Maximum                                15.0
---------------------------------------------------
(*) Tests Simon–Lesage 7b through 9b are a variation developed by Stata on tests Simon–Lesage 7 through 9. To our knowledge, no package that stores and processes data in binary double precision has ever done better than 4.6 on these tests, and that is because it is not possible to do better; the problem is with the test, not the packages being tested. The difficulty is that the data are made different from what the authors intended the instant they are stored on a double-precision binary computer. The test uses y values, such as 1,000,000,000,000.4, but that value immediately becomes 1,000,000,000,000.40002441... because of how computers store numbers. We strongly suspect that the answer Stata produces, and the answers produced by other packages, are correct given the data stored.
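The storage effect described here can be checked in any language that uses IEEE 754 binary doubles. The short Python sketch below is only an illustration of the representation issue; it is not part of the Stata test files. It prints the exact decimal expansion of the double nearest to 1,000,000,000,000.4.

```python
from decimal import Decimal

# The literal is rounded to the nearest binary double when it is parsed.
y = 1_000_000_000_000.4

# Decimal(float) converts the stored bits exactly, with no further rounding.
stored = Decimal(y)
print(stored)  # 1000000000000.4000244140625

# The stored value differs from the intended decimal value.
intended = Decimal("1000000000000.4")
print(stored - intended)  # 0.0000244140625
```

Near 10^12, adjacent doubles are 2^-13 (about 0.000122) apart, so 0.4 cannot be hit exactly and the nearest representable fraction is 3277/8192 = 0.4000244140625.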
Tests Simon–Lesage 7b through 9b are modifications of Simon–Lesage 7 through 9, the difference being that the data are multiplied by 10 before being input, so 1,000,000,000,000.4 becomes 10,000,000,000,004, a number that can be stored with perfect accuracy. The test is then carried through, the question being whether the ANOVA routine can deal with data that vary only in the trailing digits.
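The sketch below illustrates both points in Python: the scaled value is an integer below 2^53, so a double holds it exactly, and data that differ only in the trailing digits still demand a numerically careful sum-of-squares calculation. The three y values are hypothetical, not the NIST data, and neither formula is claimed to be the algorithm Stata uses; the two are shown only to contrast a stable approach with an unstable one.

```python
from decimal import Decimal

# Scaled value from the text: an integer below 2**53, stored without error.
scaled = 10_000_000_000_004.0
print(Decimal(scaled) == Decimal("10000000000004"))  # True

# Hypothetical responses that differ only in the last digit (not NIST data).
y = [10_000_000_000_002.0, 10_000_000_000_004.0, 10_000_000_000_006.0]
n = len(y)
mean = sum(y) / n

# Two-pass approach: subtract the mean first, then square the small deviations.
two_pass = sum((v - mean) ** 2 for v in y)

# One-pass textbook formula: the squares are near 1e26, so rounding error in
# them is far larger than the true sum of squared deviations.
one_pass = sum(v * v for v in y) - n * mean * mean

print(two_pass)  # 8.0
print(one_pass)  # not 8.0; the signal is lost to cancellation
```

The deviations from the mean are -2, 0, and 2, so the correct sum of squares is 8. The two-pass form recovers it exactly, while the one-pass form cannot, because each squared value is already rounded to a multiple of 2^34.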
                                    Stata
                             Digits of accuracy
                          ------------------------
Test         Difficulty   Coef.     S.E.     RSS
---------------------------------------------------
Misra 1a     lower         9.4      6.4     10.5
Chwirut 2    lower         8.0      6.3     11.2
Chwirut 1    lower         7.6      6.3     11.4
Lanczos 3    lower         7.2      6.0     10.6
Gauss 1      lower         8.5      6.3     11.6
Gauss 2      lower         8.2      5.9     10.6
Daniel Wood  lower         8.6      6.2     11.7
Misra 1b     lower         9.9      6.5     11.3
Kirby 2      average       8.0      6.3     11.6
Hahn 1       average       7.1      5.1     10.6
Nelson       average       7.1      5.2     10.9
MGH 17       average      (7.0)    (6.1)   (11.5)
Lanczos 1    average      10.6      3.3      3.0
Lanczos 2    average       7.9      5.4     10.1
Gauss 3      average       8.2      5.5     11.0
Misra 1c     average       9.7      6.5     11.1
Misra 1d     average       9.3      6.5     11.2
Roszman 1    average       7.4      6.4     12.2
ENSO         average       4.7      5.3     11.3
MGH 09       higher       (7.0)    (6.5)   (11.6)
Thurber      higher        6.5      5.4     11.3
BoxBOD       higher        7.3      6.7     10.4
Ratkowsky 2  higher        7.6      6.0     11.8
MGH 10       higher       (7.7)    (4.7)   (11.4)
Eckerle4     higher       (8.3)    (6.4)   (10.7)
Ratkowsky 3  higher       (6.0)    (5.0)   (11.4)
Bennett 5    higher        6.4      5.9     11.0
---------------------------------------------------
Average                    7.8      5.8     10.9
Minimum                    4.7      3.3      3.0
Maximum                   10.6      6.7     12.2
---------------------------------------------------
Parentheses indicate that convergence could not be achieved with the first set of starting values and that the second set had to be used.
Each test involved multiple independent variables. Reported under Coef. and S.E. is the minimum LRE for all regressors, including the intercept, if any. RSS reports the LRE for the residual (error) sums of squares.
See results obtained using Stata 15 on August 23, 2018.
See results obtained using Stata 14 on April 1, 2015.
See results obtained using Stata 13 on June 17, 2013.
See results obtained using Stata 11 on September 30, 2009.