Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: non-central chi-square? |
Date | Sat, 20 Apr 2013 09:43:02 -0500 |
Dirk's solution suffers from a couple of issues having to do with Stata's modus operandi (no functions can be defined in Stata), as well as computational accuracy problem. The first issue is that while there are functions like sin(x) or ln(x) or nchi2(df, x) in Stata, the user cannot define their own functions. The above functions are written into the executable core. With pdf/cdf functions, one is sometimes interested in one specific number (such as critical level), but more often, one needs to compute this on a whole vector (I needed this for my optimization problem that the non-central chi-square was a part of). Starting with version 9 (was it 9 or 10 though?), users have been able to define their functions in Mata, which is an appropriate compiler environment. The second issue is more subtle but more important: calculation of the derivative with fixed step is rarely optimal in terms of accuracy. The first chapter of the [MLE] book, in any edition, goes into a long length of explaining how to compute the derivatives with any reasonable accuracy. Basically, if you compute the difference 1.023456 - 1.010101 = 0.013355, then instead of 7-digit accuracy (float), you have only 5-digit accuracy (and the remaining digits will be filled with essentially random crap). On the other hand, if you take generous enough differences that would retain more digits, you start running into the curvature problems, as now your calculation will be more heavily biased (roughly as [second derivative]*[step size]). So there has to be a balance somewhere, and typically it is achieved at about half digits of precision. In other words, the derivatives of expressions that are represented numerically as doubles (as Stata scalars are) can only be computed with float accuracy. To get a double accuracy, you need to have quad precision, and nchi2 is not available in quad precision. When that goes into the derivative of the likelihood evaluator, you are left with like 4 digits of precision, and that's simply useless. My recollection is that I coded what I needed in Mata in the end, accumulating the terms from the smallest to largest, and had my likelihood evaluator call Mata. Even once I did this, I still had issues as the non-centrality parameter has to be non-negative, and Stata's -moptimize- and -ml- only work well in the interior of parameter space, far enough from the boundaries -- otherwise, it keeps saying "missing values encountered". While the optimizer is terrific at what it does, I really hope to see a constrained optimizer some day, too. (Yeah, I am aware of the tricks like taking logs or squares, but they only sweep the problem under the carpet, as then instead of the missing values in the likelihood error, I am getting (backed up) nearby values are approximate, indicating a major loss of precision). -- Stas Kolenikov, PhD, PStat (SSC) -- Senior Survey Statistician, Abt SRBI -- Opinions stated in this email are mine only, and do not reflect the position of my employer -- http://stas.kolenikov.name On Sat, Apr 20, 2013 at 7:23 AM, Dirk Enzmann <dirk.enzmann@uni-hamburg.de> wrote: > About 2.5 years ago, Stas Kolenikov wrote: > > http://www.stata.com/statalist/archive/2010-11/msg00781.html > > I could not find an answer or solution, therefore I tried the following (it > is not the solution Stas asked for, but - if correct, which I hope so - it > is a quick and not too dirty way around the problem): > > * --- start Stata code -------------------------- > > program define chi2den, rclass > syntax, df(integer) x(real) [ ncp(integer 0) ] > tempname chi2d > sca `chi2d' = (nchi2(`df',`ncp',(1000*`x'+0.05)/1000) - /// > nchi2(`df',`ncp',(1000*`x'-0.05)/1000))*10000 > if `x'==0 & `df'==2 sca `chi2d' = 2*`chi2d' > di `chi2d' > return scalar density = `chi2d' > return scalar ncp = `ncp' > return scalar x = `x' > return scalar df = `df' > end > > * Example (chi² = 4.06, df = 2, ncp = 3): > chi2den, df(2) x(4.06) ncp(3) > > * --- end Stata code ---------------------------- > > If you are looking for the density of a Chi² value of the central Chi² > distribution simply omit the non-centrality parameter option -ncp()-. > > Surely, this can be impoved (but not by me). > > You can use a similar strategy to create density plots of the (noncentral) > Chi² distribution: > > * --- start Stata code -------------------------- > > cap program drop plotchi2den > program define plotchi2den > syntax, df(integer) range(numlist min=2 max=2) /// > [ ncp(integer 0) n(integer 300) * ] > tempname nchi > if `n' > 399 set matsize `=`n'+1' > local from : word 1 of `range' > local to : word 2 of `range' > local interv = (`to'-`from')/`n' > matrix `nchi' = J(`=`n'+1',2,.) > forval i = 0/`n' { > matrix `nchi'[`=`i'+1',1] = /// > (nchi2(`df',`ncp',(`n'*`i'*`interv'+0.5)/`n') - /// > nchi2(`df',`ncp',(`n'*`i'*`interv'-0.5)/`n'))*`n' > if `i'==0 & `df' == 2 matrix `nchi'[1,1] = 2*`nchi'[1,1] > matrix `nchi'[`=`i'+1',2] = `i'*`interv' > } > _matplot `nchi', recast(line) `options' > end > > * Example df = 2, ncp = 3, range(chi²) = 0 - 25: > plotchi2den, df(2) ncp(3) range(0 25) /// > title("Non-Central Chi{superscript: 2} Distribution") /// > ytitle("Density") ylab(, angle(0)) /// > xtitle("Chi{superscript: 2}") /// > note("{it:df} = 2, {it:ncp} = 3") > > * --- end Stata code ---------------------------- > > Of coure, an official (Stata) function -nchi2den- would be better. > > Dirk > > > ======================================== > Dr. Dirk Enzmann > Institute of Criminal Sciences > Dept. of Criminology > Rothenbaumchaussee 33 > D-20148 Hamburg > Germany > > phone: +49-(0)40-42838.7498 (office) > +49-(0)40-42838.4591 (Mrs Billon) > fax: +49-(0)40-42838.2344 > email: dirk.enzmann@uni-hamburg.de > http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html > ======================================== > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/