Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: chi2 - use alternative expected values

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: chi2 - use alternative expected values
Date	Sat, 7 Dec 2013 10:08:26 +0000

For stuff like this, the best advice is normally to use Mata as a
calculator. But Mata was introduced in Stata 9, so it is not available
to Mike on version 8. Let's go with Mata anyway, for folks on 9 up,
and then give Mike an alternative.

Firing up Mata we have a matrix of frequencies

: f = (41, 30, 7 \ 124, 62, 10)

and a vector of column proportions

: p = (0.048, 0.338, 0.614)

so we can get a matrix of expected frequencies

: fhat = rowsum(f) * p

and the Pearson chi-square statistic

: sum((f - fhat):^2  :/ fhat)
  1903.354724
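
For anyone without Mata to hand, the same arithmetic can be
cross-checked outside Stata. What follows is a minimal Python sketch,
not part of the original calculation. It uses only the frequencies and
column proportions given above, and the variable names simply mirror
the Mata ones.

```python
# Observed frequencies (2 rows x 3 columns) and the hypothesised column proportions.
f = [[41, 30, 7], [124, 62, 10]]
p = [0.048, 0.338, 0.614]

# Expected frequencies: each row total times the column proportions,
# the analogue of rowsum(f) * p in Mata.
fhat = [[sum(row) * pj for pj in p] for row in f]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = sum(
    (obs - exp) ** 2 / exp
    for row_f, row_fhat in zip(f, fhat)
    for obs, exp in zip(row_f, row_fhat)
)

print(fhat)
print(chi2)  # about 1903.35, in line with the Mata result
```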

I like to look at so-called Pearson residuals (to the best of my
knowledge, first used by Tukey)

: (f - fhat)  :/ sqrt(fhat)
                  1              2              3
    +----------------------------------------------+
  1 |    19.2543253    .7081385267   -5.908903061  |
  2 |   37.35989483   -.5219130093   -10.05857601  |
    +----------------------------------------------+
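
The residuals can be checked in the same way. This is again only a
Python sketch under the same assumptions (same frequencies, same
column proportions); the squared residuals should add up to the
Pearson statistic.

```python
from math import sqrt

f = [[41, 30, 7], [124, 62, 10]]
p = [0.048, 0.338, 0.614]

# Expected frequencies: row total times column proportion.
fhat = [[sum(row) * pj for pj in p] for row in f]

# Pearson residual for each cell: (observed - expected) / sqrt(expected).
resid = [
    [(obs - exp) / sqrt(exp) for obs, exp in zip(row_f, row_fhat)]
    for row_f, row_fhat in zip(f, fhat)
]

for row in resid:
    print(["%.4f" % r for r in row])

# The squared residuals sum to the Pearson chi-square statistic.
print(sum(r ** 2 for row in resid for r in row))
```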

The massive chi-square statistic goes with col 1 much more and col 3
much less than expected (unless Mike flipped columns) and the P-value
on 2 df is negligible:

: chi2tail(2, sum((f - fhat):^2  :/ fhat))
  0

: strofreal(chi2tail(2, sum((f - fhat):^2  :/ fhat)), "%21x")
  +0.0000000000000X-3ff

: end
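
The 0 reported above looks like underflow rather than a probability
that is exactly zero. For 2 df the chi-square upper tail probability
has the closed form exp(-x/2), so its order of magnitude can be
recovered on the log scale. A minimal Python sketch, taking the 2 df
used above as given:

```python
from math import exp, log

chi2 = 1903.354724  # Pearson statistic as displayed by Mata above

# For 2 df, P(X > x) = exp(-x / 2).
p_value = exp(-chi2 / 2)  # too small for double precision, so this is 0.0

# The same quantity on the log10 scale, which does not underflow.
log10_p = -chi2 / (2 * log(10))

print(p_value)
print(log10_p)  # about -413.3, i.e. P is roughly 10^-413
```

That is far below the smallest positive double (about 4.9e-324), which
would explain why the hexadecimal display shows an exact zero.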

Mike could do that with Stata's matrix language, although installing
Jeroen Weesie's -matsum- from STB would also be a good idea. But
friendlier is the ancient but still serviceable -chitesti- from
-tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
-chitesti- the correct df. If presented with a vector of 6 observed
and another vector of 6 expected, -chitesti- will think 5 df, so we
must override that by subtracting 3 with the nfit(3) option.

chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048 196*0.338 196*0.614, nfit(3) sep(0)

observed frequencies from keyboard; expected frequencies from keyboard

         Pearson chi2(2) =  1.9e+03   Pr =  0.000
likelihood-ratio chi2(2) = 758.6395   Pr =  0.000

  +---------------------------------------------------+
  | observed   expected   notes   obs - exp   Pearson |
  |---------------------------------------------------|
  |       41      3.744   *          37.256    19.254 |
  |       30     26.364               3.636     0.708 |
  |        7     47.892             -40.892    -5.909 |
  |      124      9.408             114.592    37.360 |
  |       62     66.248              -4.248    -0.522 |
  |       10    120.344            -110.344   -10.059 |
  +---------------------------------------------------+

*  1 <= expected < 5
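
The raveling and the df override in the -chitesti- call above amount
to the following. This is only an illustrative Python sketch; the
names observed, expected, k and nfit are mine, and the df rule
k - 1 - nfit is inferred from the description above (5 df by default,
minus 3).

```python
f = [[41, 30, 7], [124, 62, 10]]
p = [0.048, 0.338, 0.614]

# Ravel row by row, in the order typed into -chitesti-.
observed = [obs for row in f for obs in row]
expected = [sum(row) * pj for row in f for pj in p]

k = len(observed)  # 6 cells
nfit = 3           # the value passed as nfit(3) above
df = k - 1 - nfit  # 5 by default, 2 after the override

for obs, exp in zip(observed, expected):
    print(obs, round(exp, 3), round(obs - exp, 3))
print(df)
```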

. ret li

scalars:
                  r(k) =  6
                 r(df) =  2
               r(chi2) =  1903.354724254806
                  r(p) =  0
            r(chi2_lr) =  758.6394519065682
               r(p_lr) =  1.8345778320e-165
              r(emean) =  45.66666666666666
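
The likelihood-ratio results can be cross-checked too. The sketch
below is Python, not Stata, and assumes that -chitesti- reports the
usual statistic 2 * sum(observed * ln(observed / expected)); with the
2 df used above, the tail probability is exp(-x/2).

```python
from math import exp, log

f = [[41, 30, 7], [124, 62, 10]]
p = [0.048, 0.338, 0.614]

# Likelihood-ratio statistic: 2 * sum of observed * ln(observed / expected),
# with expected = row total * column proportion.
chi2_lr = 2 * sum(
    obs * log(obs / (sum(row) * pj))
    for row in f
    for obs, pj in zip(row, p)
)

# With 2 df the upper tail probability is exp(-x / 2).
p_lr = exp(-chi2_lr / 2)

print(chi2_lr)  # about 758.64
print(p_lr)     # about 1.83e-165
```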

Confirmation that the P-value is negligible. Massive rejection, as
inspection of the original frequencies would suggest.

Nick
[email protected]


On 7 December 2013 08:17,  <[email protected]> wrote:
> Hi Folks,
>
> A version 8 user, here.
>
> Consider the following...
>
> tabi 41 30 7 \ 124 62 10 , chi2 expected
> list
>
> Here Stata calculates expected values for each cell, based on the
> frequency of my observed values (i.e. row_total x col_total /
> grand_total).
>
> However, I have alternative expected values that I'd like to use (I know
> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
> respectively).
>
> Can I get Stata to use alternative expected values for the chi2 calculation?
>
> Cheers,
>
> Mike.
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/