Certification results

NIST StRD certification results using Stata 15

Explanation
Summary of results
Certification results: univariate summary statistics
Certification results: linear regression
Certification results: analysis of variance
Certification results: nonlinear regression

1. Explanation

The National Institute of Standards and Technology (NIST) writes,

In response to industrial concerns about the numerical accuracy of computations from statistical software, the Statistical Engineering and Mathematical and Computational Sciences Divisions of NIST’s Information Technology Laboratory are providing datasets with certified values for a variety of statistical methods.

These datasets are known as the NIST StRD—Standard Reference Data. See the NIST StRD web page for detailed descriptions of these datasets and tests.

Below are presented the results of running these tests on Stata.

In reporting comparisons, it is popular to report the LRE—the log relative error. Let c represent a calculated result and t the answer supplied by NIST. The formal definition of this comparison is

LRE = min( 15, -log10(|c-t|/|t|) ) if t != 0
LRE = min( 15, -log10(|c-t|) ) otherwise.

The result of this calculation is then called “Digits of Accuracy” or, more precisely, “Decimal Digits of Accuracy”; it approximates the number of leading digits that the calculated result shares with the certified value, so higher values are better. Note that LRE cannot exceed 15.
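
As a rough illustration of the definition above, the following Python sketch computes the LRE for a calculated value c and a certified value t. The function name lre and the example inputs are assumptions made for this illustration; they come from neither the NIST materials nor Stata. An exact match is treated as 15 digits here, because the logarithm of a zero error is undefined.

```python
import math


def lre(c, t):
    """Log relative error of calculated value c against certified value t.

    Hypothetical helper for illustration; capped at 15 as in the definition.
    """
    if c == t:
        # Zero error: log10(0) is undefined, so report the cap.
        return 15.0
    if t != 0:
        err = abs(c - t) / abs(t)   # relative error
    else:
        err = abs(c - t)            # absolute error when the certified value is 0
    return min(15.0, -math.log10(err))


# Illustrative inputs only (not NIST data).
print(round(lre(3.14159, math.pi), 1))   # about 6 digits agree
print(round(lre(0.001, 0.0), 1))         # certified value is zero
print(lre(2.5, 2.5))                     # exact match hits the cap
```

The sketch follows the formula as written and does not clamp the result at zero, so a calculated value that is wrong in its leading digit yields a negative LRE.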

2. Summary of results

Results were obtained on August 23, 2018, running Stata/SE 15 for Linux (console version). The computer ran the CentOS 6.6 Linux distribution on an eight-core Intel i7 microprocessor. Results will differ slightly on other platforms because of compiler and hardware differences; Stata runs the same numerical code on all platforms. See results obtained using Stata 14 on April 1, 2015.

Univariate summary statistics: Stata completed all tests. Means were estimated with never fewer than 15 digits of accuracy. Standard deviations averaged 13.3 correct digits, ranging from 8.3 to 15 digits. The lag-1 autocorrelation averaged 13.8 correct digits, ranging from 10.7 to 15 digits.

Linear regression: Stata completed all tests except one, the Filippelli test.

For the other tests, coefficients averaged 10.3 correct digits and never had fewer than 6.4 correct digits. Standard errors averaged 13.2 correct digits (minimum 10.8), and residual sums of squares averaged 14.3 correct digits (minimum 12.7).

In the Filippelli test, Stata found two coefficients so collinear that it dropped them from the analysis. Most other statistical software packages have done the same thing, and most authors have interpreted this result as acceptable for this test.

Analysis of variance: Stata completed all tests. The F statistic averaged 12.8 correct digits and never had fewer than 10.2 correct digits.

The above results include a correction made by us to three of the tests. An error in the construction of these three tests makes ANOVA routines implemented in binary double precision appear less precise than they are. The data, as originally presented, are accurate to only a few digits with the result that F statistics can be calculated only to a few digits. The correction is described below.

Nonlinear regression: Stata completed all tests. Coefficients averaged 7.8 correct digits and never had fewer than 4.7 correct digits. Standard errors averaged 5.8 correct digits and never had fewer than 3.3 correct digits. Residual sums of squares averaged 10.9 correct digits and never had fewer than 3.0 correct digits.

Detailed results for each of the tests are provided below.

3. Certification results: univariate summary statistics

                                                 Stata
                                           Digits of accuracy
                                        -----------------------
                                                        lag-1
        Test           Difficulty       mean    S.D.   autocorr.
        --------------------------------------------------------
        PiDigits       lower            15.0    15.0       14.9    log   do-file
        Lottery        lower            15.0    15.0       15.0    log   do-file
        Lew            lower            15.0    15.0       14.8    log   do-file
        Mavro          lower            15.0    13.1       13.7    log   do-file
        Michelson      lower            15.0    13.8       13.4    log   do-file
        NumAcc-1       lower            15.0    15.0       15.0    log   do-file
        NumAcc-2       average          15.0    15.0       15.0    log   do-file
        NumAcc-3       average          15.0     9.5       11.9    log   do-file
        NumAcc-4       higher           15.0     8.3       10.7    log   do-file
        --------------------------------------------------------
        Average                         15.0    13.3       13.8
        Minimum                         15.0     8.3       10.7
        Maximum                         15.0    15.0       15.0

4. Certification results: linear regression

                                                 Stata
                                           Digits of accuracy
                                        -----------------------
        Test           Difficulty       Coef.   S.E.        RSS
        -------------------------------------------------------
        Norris         lower            12.8    13.5       13.3    log   do-file
        Pontius        lower            11.5    13.0       12.7    log   do-file
        NoInt-1        average          14.7    15.0       14.9    log   do-file
        NoInt-2        average          15.0    15.0       14.7    log   do-file
        Filippelli     higher             no full solution(*)      log   do-file
        Longley        higher           12.1    12.9       13.2    log   do-file
        Wampler-1      higher            6.9    15.0       15.0    log   do-file
        Wampler-2      higher           10.4    15.0       15.0    log   do-file
        Wampler-3      higher            6.5    10.8       14.1    log   do-file
        Wampler-4      higher            6.5    10.8       15.0    log   do-file
        Wampler-5      higher            6.4    10.8       15.0    log   do-file
        -----------------------------------------------------------
        Average                         10.3    13.2       14.3
        Minimum                          6.4    10.8       12.7
        Maximum                         15.0    15.0       15.0
        -----------------------------------------------------------

Each test involved multiple independent variables. Reported under Coef. and S.E. is the minimum LRE for all regressors, including the intercept, if any. RSS reports the LRE for the residual (error) sums of squares.
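
The reduction described above (one reported number per test, taken as the worst case over all regressors) can be sketched in Python as follows. The coefficient values are made up for illustration and are not NIST certified values; the helper names lre and min_lre are likewise assumptions.

```python
import math


def lre(c, t):
    """Log relative error, capped at 15 (hypothetical helper)."""
    if c == t:
        return 15.0
    err = abs(c - t) / abs(t) if t != 0 else abs(c - t)
    return min(15.0, -math.log10(err))


def min_lre(computed, certified):
    """Smallest LRE over paired computed/certified values."""
    return min(lre(c, t) for c, t in zip(computed, certified))


# Made-up values: an intercept and two slopes, NOT NIST data.
certified = [1.0, -2.5, 0.125]
computed = [1.0 + 1e-13, -2.5 * (1 + 1e-9), 0.125]

per_coefficient = [lre(c, t) for c, t in zip(computed, certified)]
reported = min_lre(computed, certified)

print([round(x, 1) for x in per_coefficient])
print(round(reported, 1))   # the least accurate coefficient determines the entry
```

Taking the minimum rather than the mean means a single poorly estimated coefficient sets the reported figure for the whole test.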

(*) Filippelli test: Stata found the variables so collinear that it dropped two of them—that is, it set two coefficients and standard errors to zero. The resulting estimates still fit the data well. Most other statistical software packages have done the same thing, and most authors have interpreted this result as acceptable for this test. Stata has an orthpoly command that can do this problem, but it would not occur to most users to use it, and transforming results back to the metric of the problem requires an extra statement. However, if that command is used, the LRE for the coefficients is 8.4 and the LRE for the RSS is 8.5.

5. Certification results: analysis of variance

                                                Stata
                                          Digits of accuracy
                                          ------------------
        Test               Difficulty             F              
        ----------------------------------------------------
        Si Resistivity     lower                 13.1          log   do-file
        Simon-Lesage 1     lower                 15.0          log   do-file
        Simon-Lesage 2     lower                 13.6          log   do-file
        Simon-Lesage 3     lower                 12.8          log   do-file
        Ag Atomic Wt       average               10.2          log   do-file
        Simon-Lesage 4     average               10.4          log   do-file
        Simon-Lesage 5     average               10.2          log   do-file
        Simon-Lesage 6     average               10.2          log   do-file
        Simon-Lesage 7     higher                 4.4(*)       log   do-file
                     7b    higher                15.0(*)       log   do-file
        Simon-Lesage 8     higher                 4.3(*)       log   do-file
                     8b    higher                15.0(*)       log   do-file
        Simon-Lesage 9     higher                 4.2(*)       log   do-file
                     9b    higher                15.0(*)       log   do-file
        ----------------------------------------------------
        Average excluding S-L 7, 8, 9            12.8
        Minimum                                  10.2
        Maximum                                  15.0
        ----------------------------------------------------
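
The summary rows of this table can be reproduced from the F column with the short Python sketch below. It rests on one assumption, which the arithmetic supports: that the modified tests 7b, 8b, and 9b are included in the summary while the original Simon-Lesage 7, 8, and 9 are excluded. The variable names are illustrative.

```python
# F-statistic LREs copied from the table above.
f_lre = {
    "Si Resistivity": 13.1,
    "Simon-Lesage 1": 15.0,
    "Simon-Lesage 2": 13.6,
    "Simon-Lesage 3": 12.8,
    "Ag Atomic Wt": 10.2,
    "Simon-Lesage 4": 10.4,
    "Simon-Lesage 5": 10.2,
    "Simon-Lesage 6": 10.2,
    "Simon-Lesage 7": 4.4,
    "Simon-Lesage 7b": 15.0,
    "Simon-Lesage 8": 4.3,
    "Simon-Lesage 8b": 15.0,
    "Simon-Lesage 9": 4.2,
    "Simon-Lesage 9b": 15.0,
}

# Drop the three original tests; keep their "b" replacements.
excluded = {"Simon-Lesage 7", "Simon-Lesage 8", "Simon-Lesage 9"}
kept = [v for name, v in f_lre.items() if name not in excluded]

average = sum(kept) / len(kept)
minimum = min(kept)
maximum = max(kept)

print(round(average, 1), minimum, maximum)
```

With the three original tests excluded and the three b variants kept, the eleven remaining values give the average, minimum, and maximum shown in the table.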

(*) Tests Simon–Lesage 7b through 9b are a variation developed by Stata on tests Simon–Lesage 7 through 9. To our knowledge, no package that stores and processes data in binary double precision has ever done better than 4.6 on these tests, and that is because it is not possible to do better; the problem is with the test, not the packages being tested. The difficulty is that the data are made different from what the authors intended the instant they are stored on a double-precision binary computer. The test uses y values, such as 1,000,000,000,000.4, but that value immediately becomes 1,000,000,000,000.40002441... because of how computers store numbers. We strongly suspect that the answer Stata produces, and the answers produced by other packages, are correct given the data stored.

Tests Simon–Lesage 7b through 9b are modifications of Simon–Lesage 7 through 9, the difference being that the data are multiplied by 10 before being input, so 1,000,000,000,000.4 becomes 10,000,000,000,004, a number that can be stored with perfect accuracy. The test is then carried through, the question being whether the ANOVA routine can deal with data that vary only in the trailing digits.
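
The storage effect described in this note can be checked on any system that uses IEEE 754 binary double precision. The Python sketch below is only an illustration, not part of the Stata test suite. It uses the standard decimal and fractions modules to show the exact value a double holds for 1,000,000,000,000.4 and for the rescaled 10,000,000,000,004; the variable names are assumptions.

```python
from decimal import Decimal
from fractions import Fraction

original = 1000000000000.4    # y value as written in the original tests
scaled = 10000000000004.0     # the same value multiplied by 10

# Decimal(float) converts the stored binary value exactly, with no rounding.
stored_original = Decimal(original)
stored_scaled = Decimal(scaled)

print(stored_original)   # 1000000000000.4000244140625
print(stored_scaled)     # 10000000000004

# Exact rational comparison against the intended decimal values.
original_is_exact = Fraction(original) == Fraction("1000000000000.4")
scaled_is_exact = Fraction(scaled) == Fraction("10000000000004")

# Size of the perturbation introduced by storing the original value.
storage_error = Fraction(original) - Fraction("1000000000000.4")

print(original_is_exact, scaled_is_exact)
print(float(storage_error))
```

The stored original differs from the intended value by about 2.4e-05, while the observations differ from one another only in the trailing digits. That is plausibly why only about four digits of the F statistic can be recovered from the unscaled data. The scaled value is an integer below 2^53 and is therefore stored without error.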

6. Certification results: nonlinear regression

                                              Stata
                                        Digits of accuracy
                                      ----------------------
        Test           Difficulty      Coef.   S.E.      RSS
        ----------------------------------------------------
        Misra 1a       lower            9.4     6.4     10.5     log   do-file
        Chwirut 2      lower            8.0     6.3     11.2     log   do-file
        Chwirut 1      lower            7.6     6.3     11.4     log   do-file
        Lanczos 3      lower            7.2     6.0     10.6     log   do-file
        Gauss 1        lower            8.5     6.3     11.6     log   do-file
        Gauss 2        lower            8.2     5.9     10.6     log   do-file
        Daniel Wood    lower            8.6     6.2     11.7     log   do-file
        Misra 1b       lower            9.9     6.5     11.3     log   do-file
        Kirby 2        average          8.0     6.3     11.6     log   do-file
        Hahn 1         average          7.1     5.1     10.6     log   do-file
        Nelson         average          7.1     5.2     10.9     log   do-file
        MGH 17         average         (7.0)   (6.1)   (11.5)    log   do-file
        Lanczos 1      average         10.6     3.3      3.0     log   do-file
        Lanczos 2      average          7.9     5.4     10.1     log   do-file
        Gauss 3        average          8.2     5.5     11.0     log   do-file
        Misra 1c       average          9.7     6.5     11.1     log   do-file
        Misra 1d       average          9.3     6.5     11.2     log   do-file
        Roszman 1      average          7.4     6.4     12.2     log   do-file
        ENSO           average          4.7     5.3     11.3     log   do-file
        MGH 09         higher          (7.0)   (6.5)   (11.6)    log   do-file
        Thurber        higher           6.5     5.4     11.3     log   do-file
        BoxBOD         higher           7.3     6.7     10.4     log   do-file
        Ratkowsky 2    higher           7.6     6.0     11.8     log   do-file
        MGH 10         higher          (7.7)   (4.7)   (11.4)    log   do-file
        Eckerle4       higher          (8.3)   (6.4)   (10.7)    log   do-file
        Ratkowsky 3    higher          (6.0)   (5.0)   (11.4)    log   do-file
        Bennett 5      higher           6.4     5.9     11.0     log   do-file
        ----------------------------------------------------
        Average                         7.8     5.8     10.9
        Minimum                         4.7     3.3      3.0
        Maximum                        10.6     6.7     12.2
        ----------------------------------------------------