Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: non-central chi-square?

From	Stas Kolenikov <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: non-central chi-square?
Date	Sat, 20 Apr 2013 09:43:02 -0500

Dirk's solution suffers from a couple of issues having to do with
Stata's modus operandi (no functions can be defined in Stata), as well
as computational accuracy problem.

The first issue is that while there are functions like sin(x) or ln(x)
or nchi2(df, x) in Stata, the user cannot define their own functions.
The above functions are written into the executable core. With pdf/cdf
functions, one is sometimes interested in one specific number (such as
critical level), but more often, one needs to compute this on a whole
vector (I needed this for my optimization problem that the non-central
chi-square was a part of). Starting with version 9 (was it 9 or 10
though?), users have been able to define their functions in Mata,
which is an appropriate compiler environment.

The second issue is more subtle but more important: calculation of the
derivative with fixed step is rarely optimal in terms of accuracy. The
first chapter of the [MLE] book, in any edition, goes into a long
length of explaining how to compute the derivatives with any
reasonable accuracy. Basically, if you compute the difference 1.023456
- 1.010101 = 0.013355, then instead of 7-digit accuracy (float), you
have only 5-digit accuracy (and the remaining digits will be filled
with essentially random crap). On the other hand, if you take generous
enough differences that would retain more digits, you start running
into the curvature problems, as now your calculation will be more
heavily biased (roughly as [second derivative]*[step size]). So there
has to be a balance somewhere, and typically it is achieved at about
half digits of precision. In other words, the derivatives of
expressions that are represented numerically as doubles (as Stata
scalars are) can only be computed with float accuracy. To get a double
accuracy, you need to have quad precision, and nchi2 is not available
in quad precision. When that goes into the derivative of the
likelihood evaluator, you are left with like 4 digits of precision,
and that's simply useless.

My recollection is that I coded what I needed in Mata in the end,
accumulating the terms from the smallest to largest, and had my
likelihood evaluator call Mata. Even once I did this, I still had
issues as the non-centrality parameter has to be non-negative, and
Stata's -moptimize- and -ml- only work well in the interior of
parameter space, far enough from the boundaries -- otherwise, it keeps
saying "missing values encountered". While the optimizer is terrific
at what it does, I really hope to see a constrained optimizer some
day, too. (Yeah, I am aware of the tricks like taking logs or squares,
but they only sweep the problem under the carpet, as then instead of
the missing values in the likelihood error, I am getting (backed up)
nearby values are approximate, indicating a major loss of precision).

-- Stas Kolenikov, PhD, PStat (SSC)
-- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name



On Sat, Apr 20, 2013 at 7:23 AM, Dirk Enzmann
<[email protected]> wrote:
> About 2.5 years ago, Stas Kolenikov wrote:
>
> http://www.stata.com/statalist/archive/2010-11/msg00781.html
>
> I could not find an answer or solution, therefore I tried the following (it
> is not the solution Stas asked for, but - if correct, which I hope so - it
> is a quick and not too dirty way around the problem):
>
> * --- start Stata code --------------------------
>
> program define chi2den, rclass
>   syntax, df(integer) x(real) [ ncp(integer 0) ]
>   tempname chi2d
>   sca `chi2d' = (nchi2(`df',`ncp',(1000*`x'+0.05)/1000) - ///
>                  nchi2(`df',`ncp',(1000*`x'-0.05)/1000))*10000
>   if `x'==0 & `df'==2 sca `chi2d' = 2*`chi2d'
>   di `chi2d'
>   return scalar density = `chi2d'
>   return scalar ncp = `ncp'
>   return scalar x = `x'
>   return scalar df = `df'
> end
>
> * Example (chi² = 4.06, df = 2, ncp = 3):
> chi2den, df(2) x(4.06) ncp(3)
>
> * --- end Stata code ----------------------------
>
> If you are looking for the density of a Chi² value of the central Chi²
> distribution simply omit the non-centrality parameter option -ncp()-.
>
> Surely, this can be impoved (but not by me).
>
> You can use a similar strategy to create density plots of the (noncentral)
> Chi² distribution:
>
> * --- start Stata code --------------------------
>
> cap program drop plotchi2den
> program define plotchi2den
>   syntax, df(integer) range(numlist min=2 max=2) ///
>           [ ncp(integer 0) n(integer 300) * ]
>   tempname nchi
>   if `n' > 399 set matsize `=`n'+1'
>   local from : word 1 of `range'
>   local to : word 2 of `range'
>   local interv = (`to'-`from')/`n'
>   matrix `nchi' = J(`=`n'+1',2,.)
>   forval i = 0/`n' {
>     matrix `nchi'[`=`i'+1',1] = ///
>            (nchi2(`df',`ncp',(`n'*`i'*`interv'+0.5)/`n') - ///
>             nchi2(`df',`ncp',(`n'*`i'*`interv'-0.5)/`n'))*`n'
>     if `i'==0 & `df' == 2 matrix `nchi'[1,1] = 2*`nchi'[1,1]
>     matrix `nchi'[`=`i'+1',2] = `i'*`interv'
>   }
>   _matplot `nchi', recast(line) `options'
> end
>
> * Example df = 2, ncp = 3, range(chi²) = 0 - 25:
> plotchi2den, df(2) ncp(3) range(0 25) ///
>    title("Non-Central Chi{superscript: 2} Distribution") ///
>    ytitle("Density") ylab(, angle(0)) ///
>    xtitle("Chi{superscript: 2}") ///
>    note("{it:df} = 2, {it:ncp} = 3")
>
> * --- end Stata code ----------------------------
>
> Of coure, an official (Stata) function -nchi2den- would be better.
>
> Dirk
>
>
> ========================================
> Dr. Dirk Enzmann
> Institute of Criminal Sciences
> Dept. of Criminology
> Rothenbaumchaussee 33
> D-20148 Hamburg
> Germany
>
> phone: +49-(0)40-42838.7498 (office)
>        +49-(0)40-42838.4591 (Mrs Billon)
> fax:   +49-(0)40-42838.2344
> email: [email protected]
> http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
> ========================================
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- Re: st: non-central chi-square?
  - From: Dirk Enzmann <[email protected]>

Prev by Date: R: st: R: Treating ind. vars in logit
Next by Date: st: Does -strpos()- return the second occurrence of a string?
Previous by thread: Re: st: non-central chi-square?
Next by thread: st: Treating ind. vars in logit
Index(es):
- Date
- Thread