[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
vwiggins@stata.com (Vince Wiggins, StataCorp) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: How much can we trust Stata's non-linear solver(s)? |

Date |
Thu, 28 Aug 2003 09:57:46 -0500 |

+------+ |Part 1| +------+ Joachim Wagner <wagner@uni-lueneburg.de> has read McCullough and Vinod's American Economic Review (AER) article and is now concerned about the results he gets from statistical software. There haven't been many followup questions or responses, so perhaps everyone on the list just knows that Stata is doing things right, or perhaps there is just not much interest. As someone who does optimization for a living, we find the issues interesting and have quite a bit to say, so here is most of Joachim's question and a rather long response, > at the weekend I read a paper by McCullough and Vinod on "Verifying > the solution from a nonlinear solver: A case study" in the June 2003 > issue of the American Economic Review. For those of you who are not > in economics, this is an absolute top-journal in our discipline, and > the authors are considered to be top-experts in the field. I was > shocked (really, believe me!) to learn how much can go wrong and > goes wrong when one tries to find the ML solution of such day-to-day > problems as the parameters of a probit model using a PC and several > programs. The authors give examples, and they argue that "the > researcher's job is not done when the program reports convergence - > it is only beginning" (p. 876), and ask us to examine the gradient, > inspect the solution path, evaluate the Hessian, including an > eigensystem analysis, and profile the likelihood. Uff. Do I really > have to learn how to do all this? Do I have to use three or more > different packages to compare the solutions? Shall I demand this > when I sit down to write my next referee report later this week? > Must I be aware that the next report I receive will demand me to do > all this? > > Unfortunately, the authors do not name horses and riders (as we say > in Germany - in German, of course), so I have no idea whether Stata > is among the programs one can trust or not. [...] Joachim goes on to ask > From a slightly different perspective, would it be a good idea to > implement the steps suggested by the authors for a > "post-convergence" check in Stata ? > [...] ---------------------- The VERY short answers ---------------------- There are really two main issues taken up by McCullough and Vinod: 1) You may not be able to trust such simple estimators as probit because a forthcoming article by Stokes (2003) got different results from 7 different software packages. Of particular concern, the model and data were taken from a simple example in G.S. Madalla's Introduction to Econometrics text where the reported result was also "wrong". (Note, this is from the data in table 8.4 in the 3rd edition) 2) You should extensively evaluate any maximum likelihood (ML) result because the authors could not reproduce the results from one user-written ML estimator of voter participation, the results of which were published in an earlier AER paper. They tried replication with several packages (though we infer not Stata). Four packages were unable to maximize the likelihood, a 5th was able to maximize to a likelihood that agreed to 11 significant digits with the 6th, and author's "gold standard" result, was obtained using automated analytic derivatives and TSP. They go on to conclude that while the likelihood could be maximized using analytic derivatives, the combination of data and model had insufficient information to draw inferences, mainly because the condition number of the Hessian was too large and because two of the profile likelihoods were deemed insufficiently quadratic in shape. The authors suggest 4 steps to evaluate an estimate. There was a larger issue about replication of results in the social sciences and the difficulty in obtaining the replication data and programs, but that is beyond this post. Short answer to 1. ------------------ Stata performs flawlessly on the probit example. We call the "problem" in this data "perfect prediction". It is treated correctly in Stata and has been for 12 years, it is discussed over 3 pages in [R] probit, and there are even options in -predict- after -probit- for handling such cases -- see option -rules-. Extending this answer to other official estimators -------------------------------------------------- Great care has been taken to ensure that Stata's official estimators have good optimization behaviour. A few of the things that are routinely done include: identify and correct problems in the data, get good starting values, transform the metric in which parameters are estimated, maximize concentrated likelihoods, use alternate optimization methods or alternate model realizations. For almost all official estimators this is enough to ensure that your estimates are stable and robust to virtually all datasets. Even so, optimization can be hard and some likelihoods are particularly difficult. For official estimators with difficult likelihoods, e.g. -heckman-, -arch-, -xtprobit-, -boxcox-, the difficulty is discussed in the manual entry and suggestions are made or tools are provided to help diagnose when convergence may not have been achieved, and to evaluate the stability of the estimates (e.g. -quadchk-). Answer to 2 (OK, not that short) -------------------------------- Writing ML estimators can be tricky and we encourage all those who write estimators to look carefully at their results. The authors' four suggested checks strike me as a fine beginning with two and almost three of these easily automated by specifying options to -ml-. More importantly, always simulate some data with known coefficients and run your estimator on it so you can check all of your computations. When we wrote the user-programmed ML estimator (estimating the parameters of a model of voter participation) in Stata (using method lf) and turned our estimator loose on the data graciously provided by Bruce McCullough, it marched straight to a solution in 8 iterations. Given the difficulties encountered by the authors, we expected there to be many more iterations, some with "not concave" messages and possibly some missing or very large standard errors in the regression table, the latter indicative of a poorly conditioned covariance matrix. We saw none of these things. The log-likelihood of our estimates agreed with those reported as the "gold standard" by the authors to "only" 8 significant digits. This did not surprise or bother us -- we benchmark ML estimators on about 15 different platforms where there are differences in operating systems, math libraries, and CPUs. We would not be happy with 8 digits, but that is because we control the algorithms. The theoretically best that can be hoped for is to agree to 15 digits (sometimes 16), but that is never achieved. Even changing from an Intel Celeron to an Intel Pentium will often lead to numbers that agree at "only" the 13th or 14th decimal point. With the current problem we are talking about different computers, different operating systems (we ran under Linux), different math libraries, and different optimization algorithms. The important question is do the solutions support the same statistical inferences and that we did agree with the "gold standard"? Both packages reported from 4 to 8 significant digits for the coefficient and standard error (SE) estimates. When rounded to the number of digits reported, all of SEs agreed exactly and all but two of the coefficients agreed exactly, with the two coefficients still agreeing to 5 significant digits. McCullough and Vinod do not report SEs for any of the estimators because their eigenvalue analysis of the hessian could not clearly establish that the VCE was of full rank. They are particularly concerned with the condition number of the hessian and what it implies about the accuracy of the eigenvalues and by inference about the rank of the VCE. While a smaller condition number would be nice, our results indicate that the parameter and SE estimates for this model are just fine, which casts some doubt on the utility of using the condition number when deciding whether you have sufficient information for inference. Two results are particularly telling, B1. You can easily increase or decrease the condition number by scaling one or more variables in the model. We tested scalings that decreased the condition number (this should be better) down to 2.4e5 from the original scalings 6.5e9 and also up to 5.2e10. In all cases the coefficients remained the same to the number of digits shown in output, of course they were scaled. More importantly, the z-statistics and associated p-values remained the same regardless of the scaling and implied condition number. B2. We bootstrapped our ML estimator using 1000 replicates and importantly had no trouble with convergence in any of the 1000 replicates. Bootstrapped standard errors and confidence intervals were generally in agreement with their asymptotic analogues reported by the ML estimator. They were different, and in two cases marginally significant regressors became marginally insignificant, but on the whole they were similar. McCullough and Vinod make 4 specific suggestions: C1. Examine the gradient -- is it zero? C2. Inspect the solution path (i.e. trace) -- does it exhibit the expected rate of convergence? <that is to say quadratic> C3. Evaluate the Hessian, including an eigensystem analysis -- is the Hessian negative definite? Is it well-conditioned? C4. Profile the likelihood to assess the adequacy of the quadratic approximation. These are all good suggestions, though there is no clear-cut way to establish a well-conditioned hessian. That was the point of item B1 above. There is also some overlap. Suggestions C2 and C4 both have something to say about the shape of the hessian in the neighborhood of the solution, though C4 is more specific. We claimed earlier that two and almost three of these suggestions can be automated by Stata. Specifying the -nrtolerance(#)- option on most official estimators or on -ml maximize- for user-written estimators specifies that the hessian scaled gradient -1 tol = gH g' be less than #. This implies that the hessian must be positive definite when convergence is declared, because the full inverse is required to compute the tolerance. It also implies that the gradient is 0 within our ability to define 0 using the shape of the likelihood -- the hessian. We have been considering making -nrtolerance()- the default tolerance for Stata's official estimators. A good value for # is 1e-6. Alternately, specifying the -gradient- option on most estimators or on -ml maximize- will cause -ml- to display the gradient at each iteration. Suggestion C3 can be obtained for most official estimators and all user-programmed estimators by typing -ml graph- after estimation. A graph tracing the last 20 iterations of the log likelihood is show. These values are also available in the matrix e(ilog). The profile likelihood, suggestion C4, currently requires coding a loop over the values of the coefficient to be profiled and using either the -constraint()- or -offset()- options to lock one of the coefficient values. Profile likelihoods are just one of many ways to evaluate the validity of asymptoticly justified statistics, and one of the least direct. More direct options include bootstrapping when evaluating performance on specific samples or Monte Carlo simulation when evaluating an estimator in general. What do we suggest? If you are using an official estimator: The asymptotic properties of the SEs and CIs reported by official estimators have already been established, otherwise Stata would not report them. If you are concerned with the small-sample performance of an official estimator on your specific data, bootstrap and, if you are worried about asymmetry, look particularly at the percentile CI. If you are programming your own estimator: Simulate some data from you statistical model and use that to check your estimator. This isn't really related to convergence, though you could use it to establish correct coverage for the null, but you can be much surer that you have coded you likelihood correctly. Use the -nrtolerance()- option and look carefully at your iteration log. If you have many "not concave" messages near the end of you iteration log, you may have insufficient information to estimate the model. If you are worried that you data is insufficient to support asymptotic statistics, bootstrap. If you are failing to converge, look into rescaling some of the variables. <See Part 2 that is in another post> -- Vince -- David vwiggins@stata.com ddrukker@stata.com * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**[no subject]** - Next by Date:
**RE: st: SAS to Stata do instruction** - Previous by thread:
**[no subject]** - Next by thread:
**Re: st: How much can we trust Stata's non-linear solver(s)?** - Index(es):

© Copyright 1996–2017 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |