
1. Explanation

The National Institute of Standards and Technology (NIST) writes,

In response to industrial concerns about the numerical accuracy of computations from statistical software, the Statistical Engineering and Mathematical and Computational Sciences Divisions of NIST’s Information Technology Laboratory are providing datasets with certified values for a variety of statistical methods.

These datasets are known as the NIST StRD—Standard Reference Data. See the NIST StRD web page for detailed descriptions of these datasets and tests.

Below are presented the results of running these tests on Stata.

In reporting comparisons, it is popular to report the LRE—the log relative error. Let c represent a calculated result and t the answer supplied by NIST. The formal definition of this comparison is

• LRE = min( 15, -log10(|c-t|/|t|) ) if t != 0
• LRE = min( 15, -log10(|c-t|) ) otherwise.

The result of this calculation is then called “Digits of Accuracy” or, more precisely, “Decimal Digits of Accuracy”; it counts the number of digits in common with the true value (higher values are obviously better). Note that LRE cannot exceed 15.
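The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used to produce the results on this page; the function name `lre` and the `cap` parameter are assumptions introduced here, and exact agreement is treated as reaching the 15-digit cap because -log10(0) is infinite.

```python
import math

def lre(c: float, t: float, cap: float = 15.0) -> float:
    """Log relative error of a computed value c against a certified value t."""
    if c == t:
        # Exact agreement: -log10(0) is infinite, so the cap applies.
        return cap
    err = abs(c - t)
    if t != 0:
        err /= abs(t)  # relative error when the certified value is nonzero
    return min(cap, -math.log10(err))

print(lre(3.14159, math.pi))  # about 6.07: roughly six digits agree
print(lre(0.001, 0.0))        # absolute error is used because t is zero
print(lre(2.5, 2.5))          # 15.0
```

Dividing by |t| rather than t keeps the argument of the logarithm positive when the certified value is negative.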

2. Summary of results

Results were obtained September 30, 2009, running Stata/IC 11 for Linux (console version) based on the executable file released to users on August 26, 2009 and the ado-file update released to users on September 14, 2009. The computer ran the Fedora Core 8 Linux distribution on a dual-core AMD Opteron microprocessor. Results will differ slightly on other platforms because of compiler and hardware differences; Stata runs the same numerical code on all platforms.

Univariate summary statistics:
Stata completed all tests. Means were estimated with never fewer than 15 digits of accuracy. Standard deviations averaged 13.3 correct digits, ranging from 8.3 to 15 digits. The lag-1 autocorrelation averaged 13.8 correct digits, ranging from 10.7 to 15 digits.
Linear regression:
Stata completed all tests except one, the Filippelli test.

For the other tests, coefficients averaged 10.3 correct digits and never had fewer than 6.4 correct digits. Standard errors averaged 13.2 correct digits (minimum 10.8), and residual sums of squares averaged 14.3 correct digits (minimum 12.7).

In the Filippelli test, Stata found two coefficients so collinear that it dropped them from the analysis. Most other statistical software packages have done the same thing, and most authors have interpreted this result as acceptable for this test.
Analysis of variance:
Stata completed all tests. The F statistic averaged 12.8 correct digits and never had fewer than 10.2 correct digits.

The above results include a correction made by us to three of the tests. An error in the construction of these three tests makes ANOVA routines implemented in binary double precision appear less precise than they are. The data, as originally presented, are accurate to only a few digits with the result that F statistics can be calculated only to a few digits. The correction is described below.
Nonlinear regression:
Stata completed all tests. Coefficients averaged 7.8 correct digits and never had fewer than 4.7 correct digits. Standard errors averaged 5.8 correct digits and never had fewer than 3.3 correct digits. Residual sums of squares averaged 10.9 correct digits and never had fewer than 3.0 correct digits.

Detailed results for each of the tests are provided below.

3. Certification results: univariate summary statistics

                                         Stata
                                   Digits of accuracy
                                -----------------------
                                                 lag-1
Test           Difficulty       mean    S.D.   autocorr.
--------------------------------------------------------
PiDigits       lower            15.0    15.0       14.9    log   do-file
Lottery        lower            15.0    15.0       15.0    log   do-file
Lew            lower            15.0    15.0       14.8    log   do-file
Mavro          lower            15.0    13.1       13.7    log   do-file
Michelson      lower            15.0    13.8       13.4    log   do-file
NumAcc-1       lower            15.0    15.0       15.0    log   do-file
NumAcc-2       average          15.0    15.0       15.0    log   do-file
NumAcc-3       average          15.0     9.5       11.9    log   do-file
NumAcc-4       higher           15.0     8.3       10.7    log   do-file
--------------------------------------------------------
Average                         15.0    13.3       13.8
Minimum                         15.0     8.3       10.7
Maximum                         15.0    15.0       15.0


4. Certification results: linear regression

                                         Stata
                                   Digits of accuracy
                                -----------------------
Test           Difficulty       Coef.   S.E.       RSS
-----------------------------------------------------------
Norris         lower            12.8    13.5       13.3    log   do-file
Pontius        lower            11.5    13.0       12.7    log   do-file
NoInt-1        average          14.7    15.0       14.9    log   do-file
NoInt-2        average          15.0    15.0       14.7    log   do-file
Filippelli     higher             no full solution(*)      log   do-file
Longley        higher           12.1    12.9       13.2    log   do-file
Wampler-1      higher            6.9    15.0       15.0    log   do-file
Wampler-2      higher           10.4    15.0       15.0    log   do-file
Wampler-3      higher            6.5    10.8       14.1    log   do-file
Wampler-4      higher            6.5    10.8       15.0    log   do-file
Wampler-5      higher            6.4    10.8       15.0    log   do-file
-----------------------------------------------------------
Average                         10.3    13.2       14.3
Minimum                          6.4    10.8       12.7
Maximum                         15.0    15.0       15.0
-----------------------------------------------------------


Each test involved multiple independent variables. Reported under Coef. and S.E. is the minimum LRE for all regressors, including the intercept, if any. RSS reports the LRE for the residual (error) sums of squares.
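Reporting the minimum LRE means that each table entry reflects the worst-estimated parameter in that test. The sketch below shows that reduction; the helper names `lre` and `min_lre` are assumptions made for illustration, and the numbers are hypothetical values, not taken from any NIST dataset.

```python
import math

def lre(c, t, cap=15.0):
    """Log relative error of computed c against certified t, capped at `cap`."""
    if c == t:
        return cap
    err = abs(c - t) / abs(t) if t != 0 else abs(c - t)
    return min(cap, -math.log10(err))

def min_lre(computed, certified):
    """Smallest LRE over all parameters, that is, the worst-estimated one."""
    return min(lre(computed[name], certified[name]) for name in certified)

# Hypothetical values for illustration only; not from any NIST dataset.
certified = {"_cons": 1.0, "x": 2.0}
computed = {"_cons": 1.0, "x": 2.0000002}

# The intercept matches exactly (LRE 15), so the slope determines the result.
print(min_lre(computed, certified))  # close to 7
```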

(*) Filippelli test: Stata found the variables so collinear that it dropped two of them—that is, it set two coefficients and standard errors to zero. The resulting estimates still fit the data well. Most other statistical software packages have done the same thing, and most authors have interpreted this result as acceptable for this test. Stata has an orthpoly command that can do this problem, but it would not occur to most users to use it, and transforming results back to the metric of the problem requires an extra statement. However, if that command is used, the LRE for the coefficients is 8.4 and the LRE for the RSS is 8.5.

5. Certification results: analysis of variance

                                        Stata
                                  Digits of accuracy
                                  ------------------
Test               Difficulty             F
----------------------------------------------------
Si Resistivity     lower                 13.1          log   do-file
Simon-Lesage 1     lower                 15.0          log   do-file
Simon-Lesage 2     lower                 13.6          log   do-file
Simon-Lesage 3     lower                 12.8          log   do-file
Ag Atomic Wt       average               10.2          log   do-file
Simon-Lesage 4     average               10.4          log   do-file
Simon-Lesage 5     average               10.2          log   do-file
Simon-Lesage 6     average               10.2          log   do-file
Simon-Lesage 7     higher                 4.4(*)       log   do-file
             7b    higher                15.0(*)       log   do-file
Simon-Lesage 8     higher                 4.2(*)       log   do-file
             8b    higher                15.0(*)       log   do-file
Simon-Lesage 9     higher                 4.2(*)       log   do-file
             9b    higher                15.0(*)       log   do-file
----------------------------------------------------
Average excluding S-L 7, 8, 9            12.8
Minimum                                  10.2
Maximum                                  15.0
----------------------------------------------------


(*) Tests Simon–Lesage 7b through 9b are variations developed by Stata on tests Simon–Lesage 7 through 9. To our knowledge, no package that stores and processes data in binary double precision has ever done better than 4.6 on these tests, because it is not possible to do better; the problem is with the test, not the packages being tested. The difficulty is that the data differ from what the authors intended the instant they are stored on a double-precision binary computer. The test uses y values such as 1,000,000,000,000.4, but that value immediately becomes 1,000,000,000,000.40002441... because binary floating point cannot represent it exactly. We strongly suspect that the answer Stata produces, and the answers produced by other packages, are correct given the data as stored.

Tests Simon–Lesage 7b through 9b are modifications of Simon–Lesage 7 through 9, the difference being that the data are multiplied by 10 before being input, so 1,000,000,000,000.4 becomes 10,000,000,000,004, a number that can be stored with perfect accuracy. The test is then carried through, the question being whether the ANOVA routine can deal with data that varies only in the trailing digits.
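The storage effect described above can be checked directly in any language that uses IEEE 754 binary doubles. The following Python sketch is an illustration only and is not part of the Stata test suite; the variable names are assumptions. It relies on the fact that `Decimal(float)` expands the stored binary value exactly.

```python
from decimal import Decimal

original = 1000000000000.4   # y value as written in Simon-Lesage 7 through 9
scaled = 10000000000004.0    # the same value multiplied by 10 (7b through 9b)

# Decimal(float) shows the exact value held in the double, with no rounding.
print(Decimal(original))  # 1000000000000.4000244140625
print(Decimal(scaled))    # 10000000000004

# Distance between the stored value and the intended decimal value.
stored_error = Decimal(original) - Decimal("1000000000000.4")
print(stored_error)       # 0.0000244140625
```

The scaled value is an integer well below 2^53, which is why a double holds it exactly.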

6. Certification results: nonlinear regression

                                        Stata
                                 Digits of accuracy
                               ----------------------
Test           Difficulty      Coef.    S.E.     RSS
----------------------------------------------------
Misra 1a       lower            9.4     6.4     10.5     log   do-file
Chwirut 2      lower            8.0     6.3     11.2     log   do-file
Chwirut 1      lower            7.6     6.3     11.4     log   do-file
Lanczos 3      lower            7.2     6.0     10.6     log   do-file
Gauss 1        lower            8.5     6.3     11.6     log   do-file
Gauss 2        lower            8.2     5.9     10.6     log   do-file
Daniel Wood    lower            8.6     6.2     11.7     log   do-file
Misra 1b       lower            9.9     6.5     11.3     log   do-file
Kirby 2        average          8.0     6.3     11.6     log   do-file
Hahn 1         average          7.1     5.1     10.6     log   do-file
Nelson         average          7.1     5.2     10.9     log   do-file
MGH 17         average         (7.0)   (6.1)   (11.5)    log   do-file
Lanczos 1      average         10.6     3.3      3.0     log   do-file
Lanczos 2      average          7.9     5.4     10.1     log   do-file
Gauss 3        average          8.2     5.5     11.0     log   do-file
Misra 1c       average          9.7     6.5     11.1     log   do-file
Misra 1d       average          9.3     6.5     11.2     log   do-file
Roszman 1      average          7.4     6.4     12.2     log   do-file
ENSO           average          4.7     5.3     11.3     log   do-file
MGH 09         higher          (7.0)   (6.5)   (11.6)    log   do-file
Thurber        higher           6.5     5.4     11.3     log   do-file
BoxBOD         higher           7.3     6.7     10.4     log   do-file
Ratkowsky 2    higher           7.6     6.0     11.8     log   do-file
MGH 10         higher          (7.7)   (4.7)   (11.4)    log   do-file
Eckerle4       higher          (8.3)   (6.4)   (10.7)    log   do-file
Ratkowsky 3    higher          (6.0)   (5.0)   (11.4)    log   do-file
Bennett 5      higher           6.4     5.9     11.0     log   do-file
----------------------------------------------------
Average                         7.8     5.8     10.9
Minimum                         4.7     3.3      3.0
Maximum                        10.6     6.7     12.2
----------------------------------------------------


Parentheses indicate that convergence could not be achieved with the first set of starting values and that the second set had to be used.

Each test involved multiple parameters. Reported under Coef. and S.E. is the minimum LRE over all estimated parameters. RSS reports the LRE for the residual (error) sum of squares.