The Stata listserver

st: RE: chitesti -- warning -- expected


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: chitesti -- warning -- expected
Date   Wed, 10 Mar 2004 23:02:02 -0000

Good question. 

-chitesti- and its sibling -chitest- are in a package 
-tab_chi- on SSC. The latest public versions of -chitesti- 
and -chitest- are 2.0.0, both from July 2003. 

The immediate command -chitesti- in fact calls -chitest- 
(with a secret handshake indicating keyboard input). 

What happens internally is that the observed and 
expected frequencies are put in -float- variables. 
A -float- holds only about 7 significant decimal 
digits, which is not enough for all the digits your 
problem needs. I make the expected
frequencies 

406694.3598 and 29766.6402

which add to 436461 exactly by virtue of 0.9318 + 
0.0682 equalling 1. However, stored as -float- they 
become 406694.375 and 29766.640625 (near 406694, 
adjacent -float- values are 0.03125 apart), and their 
total is 436461.015625. Of course everything is done in binary
and we are just seeing the decimal representation here. 
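To see the rounding outside Stata, here is a minimal Python sketch. It assumes that a Stata -float- is a 4-byte IEEE single-precision value and emulates one with the standard `struct` module; the helper `to_float32` is hypothetical, not part of Stata. The total is taken in ordinary Python (double) arithmetic, as it must be in Stata too, since 436461.015625 is not itself representable as a -float-.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE single-precision value (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]


n = 436461
exp1 = 0.9318 * n  # about 406694.3598, held as a double
exp2 = 0.0682 * n  # about 29766.6402, held as a double

f1 = to_float32(exp1)  # what a -float- can hold
f2 = to_float32(exp2)
total = f1 + f2        # sum taken in double precision

print(f1, f2)          # 406694.375 29766.640625
print(total)           # 436461.015625
print(total - n)       # 0.015625
```

This also answers the second question in the original message: 406694.375 is simply the -float- nearest to 406694.3598.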

Here is that difference in hexadecimal: 

. di %21x 436461.015625 
+1.aa3b410000000X+012

. di %21x 436461
+1.aa3b400000000X+012

So near, and yet so far! 
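Python's `float.hex()` gives the same view of a double as Stata's %21x format, with one difference in notation: Python writes the binary exponent in decimal (`p+18`), whereas the Stata output above writes it in hexadecimal (`X+012`, since 0x12 = 18). A minimal sketch:

```python
total = 436461.015625

# float.hex() shows the mantissa and binary exponent of a double,
# much like Stata's %21x (Python's exponent is decimal, Stata's is hex).
print(total.hex())                # 0x1.aa3b410000000p+18
print(float(436461).hex())        # 0x1.aa3b400000000p+18
print(total - 436461 == 2 ** -6)  # True: the two differ by 2^-6
```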

Now -chitest- squawks if the sum of observed and 
the sum of expected differ by more than 0.01 and the 
difference here of 0.015625 qualifies. 

The absolute difference criterion of 0.01 was just 
plucked out of the air when -chitest- was first 
written several years ago. For numbers as big as yours 
a relative difference criterion would presumably 
make more sense. 
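A minimal Python sketch of the two kinds of check. The function names and the relative tolerance of 1e-6 are hypothetical choices for illustration, not what -chitest- does; only the absolute cutoff of 0.01 comes from the description above. `math.isclose` flags a gap larger than `rel_tol` times the larger of the two magnitudes.

```python
import math


def totals_differ_absolute(obs_total: float, exp_total: float, tol: float = 0.01) -> bool:
    """Absolute criterion: flag any gap larger than tol (0.01, as described for -chitest-)."""
    return abs(obs_total - exp_total) > tol


def totals_differ_relative(obs_total: float, exp_total: float, rel_tol: float = 1e-6) -> bool:
    """Hypothetical relative criterion: flag a gap larger than rel_tol times the bigger total."""
    return not math.isclose(obs_total, exp_total, rel_tol=rel_tol)


obs_total = 314795 + 121666   # 436461
exp_total = 436461.015625     # total of the -float- expected frequencies

print(totals_differ_absolute(obs_total, exp_total))  # True: 0.015625 > 0.01
print(totals_differ_relative(obs_total, exp_total))  # False: gap is about 3.6e-8 of the total
```

With totals this large, the relative check stays quiet about rounding debris while still catching a real mismatch such as 10 against 10.5.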

Why then is -chitest- telling you both that the 
numbers are the same and that they are 
different? "Same" comes from the display
statement, here equivalent to 

. di %8.0g 436461.015625 
  436461 

That format was chosen so that integers display 
as integers wherever possible, without irritating 
extra ".00000" or whatever. Here, however, it 
hides the fractional part. 

"Different" comes from the numbers held 
in memory, which differ by 0.015625. 
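The same split between what is displayed and what is stored can be reproduced in Python. This is only a rough analogue: Python's %g format with 6 significant digits is not Stata's %8.0g, but it likewise rounds the fraction away.

```python
total = 436461.015625

shown = "%.6g" % total  # general format, 6 significant digits
print(shown)            # 436461: the fraction is rounded away
print(total == 436461)  # False: the value in memory still differs
```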

I just rewrote -chitest- and -chitesti- to use
doubles throughout. The results are better: 

. chitesti 314795 121666 \ 0.9318*436461 0.0682*436461

observed frequencies from keyboard; expected frequencies from keyboard

         Pearson chi2(1) =  3.0e+05   Pr =  0.000
likelihood-ratio chi2(1) =  1.8e+05   Pr =  0.000

  +-----------------------------------------------+
  | observed     expected    obs - exp    Pearson |
  |-----------------------------------------------|
  |   314795   406694.360   -91899.360   -144.105 |
  |   121666    29766.640    91899.360    532.657 |
  +-----------------------------------------------+
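For reference, here is a minimal Python sketch of the quantities in that table, computed throughout in double precision (Python's `float`). It assumes the textbook definitions: Pearson chi2 as the sum of (O - E)^2 / E, likelihood-ratio chi2 as 2 times the sum of O ln(O / E), and Pearson residuals as (O - E) / sqrt(E). It is not the code of -chitest-.

```python
import math

observed = [314795, 121666]
n = sum(observed)                    # 436461
expected = [0.9318 * n, 0.0682 * n]  # kept as doubles throughout

# Pearson statistic: sum of (O - E)^2 / E
pearson = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Likelihood-ratio statistic: 2 * sum of O * ln(O / E)
lr = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
# Pearson residuals: (O - E) / sqrt(E)
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

print(f"Pearson chi2(1) = {pearson:.1f}")
print(f"likelihood-ratio chi2(1) = {lr:.1f}")
for o, e, r in zip(observed, expected, residuals):
    print(f"{o:8d} {e:12.3f} {o - e:12.3f} {r:9.3f}")
```

The printed columns should match the table above to the decimals shown.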

In short, this is a salutary lesson in precision. The
program author should perhaps read e.g. 

http://www.stata.com/support/faqs/data/mod.html

The defence, if there is one, is that the author 
grew up in a small house in a small country and still
thinks that using -double- where -float- apparently 
will do fine is profligate use of space. 

Incidentally, the chi-square test shows a P-value
indistinguishable from 0. 
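A quick check of that remark, as a Python sketch assuming one degree of freedom: a chi2(1) variable is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(x / 2)). For a statistic of about 3.0e+05 this underflows to exactly 0 in double precision. The helper name `chi2_1_upper_tail` is hypothetical.

```python
import math


def chi2_1_upper_tail(x: float) -> float:
    # chi2(1) = Z**2 with Z standard normal, so
    # P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))


print(chi2_1_upper_tail(3.84))      # about 0.05, the familiar 5% point
print(chi2_1_upper_tail(304489.6))  # 0.0
```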

Nick 
[email protected] 

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of 
> Benoit Dulong
> Sent: 10 March 2004 21:53
> To: statalist
> Subject: st: chitesti -- warning -- expected
> 
> 
> The command
> chitesti 314795 121666 \ 0.9318*436461 0.0682*436461
> produced
> 
> Chi-square test:
> observed frequencies from keyboard
> expected frequencies from keyboard
> 
> Warning: totals of observed and expected differ
>               total
> observed     436461
> expected     436461
> 
>          Pearson chi2(1) = 304489.6035   Pr =  0.000
> likelihood-ratio chi2(1) = 181321.5938   Pr =  0.000
> 
>                                    residuals
>       observed    expected     classic     Pearson
>   1.    314795  406694.375  -91899.375    -144.105
>   2.    121666   29766.641   91899.359     532.657
> 
> ------------------------------------------------------
> 
> QUESTION-1.
> I do not understand the warning because
> observed and expected do NOT differ?
> 
> QUESTION-2
> Expected (1) should be 436461*0.9318 = 406694.3598,
> not 406694.375?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC