# Re: st: chi2 - use alternative expected values

- From: mcross@exemail.com.au
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: chi2 - use alternative expected values
- Date: Sun, 8 Dec 2013 03:42:05 +1100

```
Hi Nick,

Just quickly (it's late here).

Your suspicion that I had flipped the columns was correct: the
proportions for col 1, 2 and 3 should be 0.614, 0.338 and 0.048.

The following shows what I'm after...

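clear
* observed table; in Stata 8 -tabi- leaves row, col and pop in memory
* (later releases need the -replace- option on -tabi- for that)
tabi 41 30 7 \ 124 62 10 , chi2 expected
* P-value from the usual test of independence
scalar pval_1 = r(p)
* hypothesised column proportions, columns in the corrected order
bysort row : gen prop = .614 if col == 1
bysort row : replace prop = .338 if col == 2
bysort row : replace prop = .048 if col == 3
* expected frequency = proportion x row total
bysort row : egen rowtot = sum(pop)
gen MyExp = prop * rowtot
* Pearson chi-square: sum of (O - E)^2 / E over the six cells
gen O_E = pop - MyExp
gen O_E2 = O_E^2
gen X2 = O_E2/MyExp
egen chi2 = sum(X2)
* P-value on 2 df using the alternative expected values, then the default one
su chi2
di chi2tail(2,`r(mean)')
di pval_1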

Thanks and apologies.

Mike.

> For stuff like this, the best advice is normally to use Mata as a
> calculator. But Mata was introduced in Stata 9. Let's go with Mata
> anyway, for folks on 9 up, and then give Mike an alternative.
>
> Firing up Mata we have a matrix of frequencies
>
> : f = (41, 30, 7 \ 124, 62, 10)
>
> and a vector of column proportions
>
> : p = (0.048, 0.338, 0.614)
>
> so we can get a matrix of expected frequencies
>
> : fhat = rowsum(f) * p
>
> and Pearson chi-square statistic
>
> : sum((f - fhat):^2  :/ fhat)
>   1903.354724
>
> I like to look at so-called Pearson residuals (to the best of my
> knowledge, first used by Tukey)
>
> : (f - fhat)  :/ sqrt(fhat)
>                   1              2              3
>     +----------------------------------------------+
>   1 |    19.2543253    .7081385267   -5.908903061  |
>   2 |   37.35989483   -.5219130093   -10.05857601  |
>     +----------------------------------------------+
>
> The massive chi-square statistic goes with col 1 much more and col 3
> much less than expected (unless Mike flipped columns) and the P-value
> on 2 df is negligible:
>
> : chi2tail(2, sum((f - fhat):^2  :/ fhat))
>   0
>
> : strofreal(chi2tail(2, sum((f - fhat):^2  :/ fhat)), "%21x")
>   +0.0000000000000X-3ff
>
> : end
>
> Mike could do that with Stata's matrix language, although installing
> Jeroen Weesie's -matsum- from STB would also be a good idea. But
> friendlier is the ancient but still serviceable -chitesti- from
> -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
> -chitesti- the correct df. If presented with a vector of 6 observed
> and another vector of 6 expected, -chitesti- will think 5 df, so we
> must override that by subtracting 3.
>
> chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048 196*0.338 196*0.614, nfit(3) sep(0)
>
> observed frequencies from keyboard; expected frequencies from keyboard
>
>          Pearson chi2(2) =  1.9e+03   Pr =  0.000
> likelihood-ratio chi2(2) = 758.6395   Pr =  0.000
>
>   +---------------------------------------------------+
>   | observed   expected   notes   obs - exp   Pearson |
>   |---------------------------------------------------|
>   |       41      3.744   *          37.256    19.254 |
>   |       30     26.364               3.636     0.708 |
>   |        7     47.892             -40.892    -5.909 |
>   |      124      9.408             114.592    37.360 |
>   |       62     66.248              -4.248    -0.522 |
>   |       10    120.344            -110.344   -10.059 |
>   +---------------------------------------------------+
>
> *  1 <= expected < 5
>
> . ret li
>
> scalars:
>                   r(k) =  6
>                  r(df) =  2
>                r(chi2) =  1903.354724254806
>                   r(p) =  0
>             r(chi2_lr) =  758.6394519065682
>                r(p_lr) =  1.8345778320e-165
>               r(emean) =  45.66666666666666
>
> Confirmation that the P-value is negligible. Massive rejection, as
> inspection of the original frequencies would suggest.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 7 December 2013 08:17,  <mcross@exemail.com.au> wrote:
>> Hi Folks,
>>
>> A version 8 user, here.
>>
>> Consider the following...
>>
>> tabi 41 30 7 \ 124 62 10 , chi2 expected
>> list
>>
>> Here Stata calculates expected values for each cell, based on the
>> frequency of my observed values (i.e. row_total x col_total /
>> grand_total).
>>
>> However, I have alternative expected values that I'd like to use (I know
>> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
>> respectively).
>>
>> Can I get Stata to use alternative expected values for the chi2
>> calculation?
>>
>> Cheers,
>>
>> Mike.
```
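The two messages can be tied together with a minimal sketch, assuming Stata 9 or later (Mata is not available in Mike's version 8): Nick's Mata calculation rerun with the column proportions in Mike's corrected order. The name `X2` is introduced here only for illustration, and the 2 df simply follows what both posts use.

```stata
mata:
// observed frequencies, as given in the thread
f = (41, 30, 7 \ 124, 62, 10)

// hypothesised column proportions in the corrected order (col 1, 2, 3)
p = (0.614, 0.338, 0.048)

// expected frequencies: each row total times the column proportions
fhat = rowsum(f) * p
fhat

// Pearson residuals, cell by cell
(f - fhat) :/ sqrt(fhat)

// Pearson chi-square statistic and its P-value on 2 df
X2 = sum((f - fhat):^2 :/ fhat)
X2
chi2tail(2, X2)
end
```

Worked by hand from the same numbers, the statistic comes out near 4.75, with a tail probability of about 0.093 on 2 df. This is the quantity that Mike's step-by-step `gen`/`egen` calculation above computes.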