- **sort** is much faster.
- **collapse** is much, much faster.
- MKL-powered Mata functions and operators are faster.
- Mixed models are faster.
- **import delimited** is now parallelized in Stata/MP.

I just got my Stata 17 and I see huge improvements. I am using AMD Ryzen 7 4800H with 40GbRam. On my Stata 16.1 MP8, my data with 44.7 Million obs used to take 30 seconds to sort; now on stata 17 MP8, it is taking 16.1 seconds to sort. I also ran [a] few other commands where I manipulate the data. Stata 16.1 used to take 8 minutes to complete that task; now Stata 17 takes 3 minutes to complete similar tasks. I am not sure how Intel will perform with the new "Intel Math Kernel Library (MKL)" update. However, I am very happy with the new update.

— Ahmed Khan

Ph.D. scholar at University of Waikato School of Accounting, Finance, and Economics

Stata values accuracy and speed. There is often a tradeoff between the two, but Stata strives to give users the best of both worlds. We are continuously optimizing and improving our routines to utilize modern computing power and algorithms so that Stata runs even faster.

In Stata 17, we updated the algorithms behind **sort** and **collapse** to make these commands faster. Much faster. Because the **sort** command is used by many other Stata commands, these commands, too, are faster. **sort** is somewhere between 1.5 and 6 times faster, as is shown in Table 1, below. For example, with 10 million observations and 20 variables, timings dropped to close to 3 seconds in Stata/SE 17 from close to 20 seconds in Stata/SE 16!

Table 1. Mean timings in seconds

Observations | Edition | Stata 17 | Stata 16 | Speedup |
---|---|---|---|---|
10,000 | SE | 0.08 | 0.35 | 4.42 |
10,000 | MP4 | 0.07 | 0.14 | 2.02 |
10,000 | MP8 | 0.06 | 0.10 | 1.79 |
100,000 | SE | 0.14 | 0.54 | 3.75 |
100,000 | MP4 | 0.10 | 0.23 | 2.36 |
100,000 | MP8 | 0.08 | 0.16 | 1.97 |
1,000,000 | SE | 0.25 | 0.77 | 3.14 |
1,000,000 | MP4 | 0.16 | 0.44 | 2.83 |
1,000,000 | MP8 | 0.14 | 0.32 | 2.54 |
10,000,000 | SE | 3.34 | 19.76 | 5.92 |
10,000,000 | MP4 | 2.06 | 6.90 | 3.35 |
10,000,000 | MP8 | 1.89 | 5.50 | 2.91 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM

The **collapse** command creates a dataset of summary statistics and is one of the most commonly used data management commands. As the size of the data grows, so necessarily does the runtime. In Stata 17, depending on dataset size, **collapse** sees speedups of between 6 and 13 times for computation of a simple mean and between 40 and 70 times for computation of statistics like medians and standard deviations. Table 2 shows the results for collapsing a dataset with 10,000,000 observations and varying numbers of collapsed variables for the case where we compute medians and standard deviations.

Table 2. Timings in seconds

Variables | Edition | Stata 17 | Stata 16 | Speedup |
---|---|---|---|---|
10 | SE | .3412143 | 13.96871 | 40.96581 |
10 | MP4 | .23 | 16.39493 | 71.29675 |
10 | MP8 | .2091429 | 13.41664 | 64.17162 |
100 | SE | .3068571 | 13.86514 | 45.1849 |
100 | MP4 | .2205714 | 16.06886 | 72.86166 |
100 | MP8 | .196 | 13.41314 | 68.43816 |
1,000 | SE | .3437143 | 13.994 | 40.73298 |
1,000 | MP4 | .2277143 | 16.34614 | 71.79339 |
1,000 | MP8 | .2117143 | 13.39286 | 63.26852 |
10,000 | SE | .3392857 | 13.92886 | 41.09007 |
10,000 | MP4 | .2287143 | 16.149 | 70.61243 |
10,000 | MP8 | .207 | 13.36543 | 64.58582 |
100,000 | SE | .3177143 | 13.97943 | 44.03442 |
100,000 | MP4 | .224 | 16.22057 | 72.43024 |
100,000 | MP8 | .1944286 | 13.38586 | 68.85059 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM

For Stata 17, we also attained speed improvements for estimation. The Linear Algebra Package (LAPACK) underlying many of Mata's functions and operators is now powered by Intel Math Kernel Library (MKL). How much faster is the new MKL? Multiplying a 5,000-by-5,000 real matrix in Stata/SE with a real matrix of the same dimension takes about 14 seconds using MKL in Stata 17 compared with about 71 seconds in Stata 16.

Timing of multiplication of two real matrices in seconds:

Edition | Size | MKL | non-MKL |
---|---|---|---|
MP8 | 5,000 by 5,000 | 2.55 | 10.26 |
MP8 | 10,000 by 10,000 | 17.28 | 85.60 |
MP4 | 5,000 by 5,000 | 3.62 | 15.95 |
MP4 | 10,000 by 10,000 | 28.22 | 127.24 |
SE | 5,000 by 5,000 | 13.64 | 70.61 |
SE | 10,000 by 10,000 | 108.33 | 566.99 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
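As a rough illustration, the speedup implied by each row of the matrix multiplication table can be worked out by dividing the non-MKL timing by the MKL timing. The Python sketch below only does that arithmetic on the published timings. The names `timings` and `mkl_speedups` are illustrative and are not part of Stata or Mata.

```python
# Matrix multiplication timings in seconds, copied from the table above.
# Keys are (edition, matrix size); values are (MKL, non-MKL).
timings = {
    ("MP8", "5,000 by 5,000"): (2.55, 10.26),
    ("MP8", "10,000 by 10,000"): (17.28, 85.60),
    ("MP4", "5,000 by 5,000"): (3.62, 15.95),
    ("MP4", "10,000 by 10,000"): (28.22, 127.24),
    ("SE", "5,000 by 5,000"): (13.64, 70.61),
    ("SE", "10,000 by 10,000"): (108.33, 566.99),
}

# Speedup = non-MKL time / MKL time, rounded to two decimals.
mkl_speedups = {
    key: round(non_mkl / mkl, 2) for key, (mkl, non_mkl) in timings.items()
}

for (edition, size), factor in mkl_speedups.items():
    print(f"{edition:<4}{size:<18}{factor:.2f}x")
```

By this calculation, MKL multiplies these matrices roughly four to five times faster in every edition tested.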

Timing of **cholesky()** in seconds:

Edition | Size | MKL | non-MKL |
---|---|---|---|
MP8 | 5,000 by 5,000 | 0.42 | 16.69 |
MP8 | 10,000 by 10,000 | 2.91 | 133.60 |
MP4 | 5,000 by 5,000 | 0.69 | 16.69 |
MP4 | 10,000 by 10,000 | 5.03 | 133.70 |
SE | 5,000 by 5,000 | 2.41 | 18.62 |
SE | 10,000 by 10,000 | 16.66 | 133.63 |

Timings run in Windows 10 on a computer with an i9-9900KS processor at 4.00GHz and 64GB RAM
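One way to read the **cholesky()** table is to compare editions within each column. The Python sketch below, offered only as an illustration using the published timings, divides the Stata/SE timing by the Stata/MP8 timing for each matrix size. The names `cholesky_timings` and `se_over_mp8` are hypothetical and are not part of Stata or Mata.

```python
# cholesky() timings in seconds, copied from the table above.
# Keys are (edition, matrix dimension); values are (MKL, non-MKL).
cholesky_timings = {
    ("MP8", 5000): (0.42, 16.69),
    ("MP8", 10000): (2.91, 133.60),
    ("SE", 5000): (2.41, 18.62),
    ("SE", 10000): (16.66, 133.63),
}


def se_over_mp8(size):
    """Return (MKL ratio, non-MKL ratio) of SE time to MP8 time for one size."""
    se_mkl, se_non_mkl = cholesky_timings[("SE", size)]
    mp_mkl, mp_non_mkl = cholesky_timings[("MP8", size)]
    return round(se_mkl / mp_mkl, 2), round(se_non_mkl / mp_non_mkl, 2)


for size in (5000, 10000):
    mkl_ratio, non_mkl_ratio = se_over_mp8(size)
    print(f"{size} by {size}: MKL {mkl_ratio:.2f}x, non-MKL {non_mkl_ratio:.2f}x")
```

In these figures, the non-MKL timings differ by about 12% or less between SE and MP8, whereas with MKL the MP8 timings are between five and six times shorter than the SE timings.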

LAPACK is used in computations by many estimation commands, so they are automatically faster too.

The **import delimited** command for importing data from CSV and other delimited text files is now parallelized in Stata/MP. It imports large datasets up to four times faster in Stata 17.

Last, but not least, the **mixed** command for fitting multilevel mixed-effects models is faster. In our timings, models with 10,000 panels, 10 time periods, and 5 random slope parameters run 2 to 3 times faster in Stata 17 than in Stata 16. Similar speed improvements occurred for different numbers of panels, time periods, and slope coefficients.

We continuously look for ways to make Stata faster. We actively investigate, code, and test new algorithms in data management and estimation routines, and we will keep you informed of the latest developments.