Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: chi2 - use alternative expected values

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: chi2 - use alternative expected values
Date	Sat, 7 Dec 2013 18:05:19 +0000

To see the sum

di r(sum)

would be needed.
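Putting the pieces of this thread together, a minimal sketch for Stata 8 (no Mata) might look like this. It assumes Mike's variable names (row, col and pop, as created by -tabi- with -replace-) and the corrected column proportions .614, .338 and .048. It uses -egen, sum()- because -egen, total()- only arrived in Stata 9.

```stata
* Pearson chi-square against hypothesised column proportions
clear
* -replace- leaves the table in memory as variables row, col and pop
tabi 41 30 7 \ 124 62 10, replace
* hypothesised proportions for columns 1, 2 and 3
gen prop = .614 if col == 1
replace prop = .338 if col == 2
replace prop = .048 if col == 3
* expected frequency = row total * column proportion
bysort row : egen rowtot = sum(pop)
gen MyExp = prop * rowtot
* Pearson contribution (O - E)^2 / E for each cell
gen X2 = (pop - MyExp)^2 / MyExp
* -summarize- saves the sum in r(sum) but never displays it
su X2, meanonly
di r(sum)
```

With these proportions the sum should come out at about 4.75. r(sum) can then be passed to chi2tail() as in the earlier messages.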
Nick
[email protected]


On 7 December 2013 17:53, Nick Cox <[email protected]> wrote:
> You'll get the same answer as when the column probabilities are
> corrected in my code.
>
> Given your definitions
>
> egen chi2 = sum(X2)
> su chi2
> di chi2tail(2,`r(mean)')
>
> can be simplified to
>
> su X2
> di chi2tail(2, r(sum))
>
> In other words, putting the sum of a variable into another variable
> and then taking the mean of that is unnecessary, because -summarize-
> already saves the sum in r(sum), although it never displays it.
>
> Nick
> [email protected]
>
>
> On 7 December 2013 16:42,  <[email protected]> wrote:
>> Hi Nick,
>>
>> Just quickly (it's late here).
>>
>> Your suspicions of me flipping the columns were correct.
>>
>> The following explains what I'm on about...
>>
>> clear
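>> * -replace- leaves the table in memory as variables row, col and pop
>> tabi 41 30 7 \ 124 62 10 , chi2 expected replace
>> scalar pval_1 = r(p)
>> * hypothesised proportions for columns 1, 2 and 3
>> gen prop = .614 if col == 1
>> replace prop = .338 if col == 2
>> replace prop = .048 if col == 3
>> * expected frequency = row total * column proportion
>> bysort row : egen rowtot = sum(pop)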
>> gen MyExp = prop * rowtot
>> gen O_E = pop - MyExp
>> gen O_E2 = O_E^2
>> gen X2 = O_E2/MyExp
>> egen chi2 = sum(X2)
>> su chi2
>> di chi2tail(2,`r(mean)')
>> di pval_1
>>
>> I'll investigate your solutions.
>>
>> Thanks and apologies.
>>
>> Mike.
>>
>>> For stuff like this, the best advice is normally to use Mata as a
>>> calculator. But Mata was introduced in Stata 9. Let's go with Mata
>>> anyway, for folks on Stata 9 and up, and then give Mike an alternative.
>>>
>>> Firing up Mata, we have a matrix of frequencies
>>>
>>> : f = (41, 30, 7 \ 124, 62, 10)
>>>
>>> and a vector of column proportions
>>>
>>> : p = (0.048, 0.338, 0.614)
>>>
>>> so we can get a matrix of expected frequencies
>>>
>>> : fhat = rowsum(f) * p
>>>
>>> and Pearson chi-square statistic
>>>
>>> : sum((f - fhat):^2  :/ fhat)
>>>   1903.354724
>>>
>>> I like to look at so-called Pearson residuals (to the best of my
>>> knowledge, first used by Tukey)
>>>
>>> : (f - fhat)  :/ sqrt(fhat)
>>>                   1              2              3
>>>     +----------------------------------------------+
>>>   1 |    19.2543253    .7081385267   -5.908903061  |
>>>   2 |   37.35989483   -.5219130093   -10.05857601  |
>>>     +----------------------------------------------+
>>>
>>> The massive chi-square statistic goes with col 1 much more and col 3
>>> much less than expected (unless Mike flipped columns), and the P-value
>>> on 2 df is negligible:
>>>
>>> : chi2tail(2, sum((f - fhat):^2  :/ fhat))
>>>   0
>>>
>>> : strofreal(chi2tail(2, sum((f - fhat):^2  :/ fhat)), "%21x")
>>>   +0.0000000000000X-3ff
>>>
>>> : end
>>>
>>> Mike could do that with Stata's matrix language, although installing
>>> Jeroen Weesie's -matsum- from STB would also be a good idea. But
>>> friendlier is the ancient but still serviceable -chitesti- from
>>> -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
>>> -chitesti- the correct df. Presented with a vector of 6 observed
>>> and another vector of 6 expected frequencies, -chitesti- will assume
>>> 5 df, so we override that with -nfit(3)-, which subtracts 3.
>>>
>>> chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048 196*0.338 196*0.614, nfit(3) sep(0)
>>>
>>> observed frequencies from keyboard; expected frequencies from keyboard
>>>
>>>          Pearson chi2(2) =  1.9e+03   Pr =  0.000
>>> likelihood-ratio chi2(2) = 758.6395   Pr =  0.000
>>>
>>>   +---------------------------------------------------+
>>>   | observed   expected   notes   obs - exp   Pearson |
>>>   |---------------------------------------------------|
>>>   |       41      3.744   *          37.256    19.254 |
>>>   |       30     26.364               3.636     0.708 |
>>>   |        7     47.892             -40.892    -5.909 |
>>>   |      124      9.408             114.592    37.360 |
>>>   |       62     66.248              -4.248    -0.522 |
>>>   |       10    120.344            -110.344   -10.059 |
>>>   +---------------------------------------------------+
>>>
>>> *  1 <= expected < 5
>>>
>>> . ret li
>>>
>>> scalars:
>>>                   r(k) =  6
>>>                  r(df) =  2
>>>                r(chi2) =  1903.354724254806
>>>                   r(p) =  0
>>>             r(chi2_lr) =  758.6394519065682
>>>                r(p_lr) =  1.8345778320e-165
>>>               r(emean) =  45.66666666666666
>>>
>>> Confirmation that the P-value is negligible. Massive rejection, as
>>> inspection of the original frequencies would suggest.
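>>>
>>> For anyone on Stata 8, the matrix-language route mentioned above can be
>>> sketched as follows, with no Mata and no user-written programs. It is a
>>> minimal sketch: the names f, p, rs, fhat and X2mat are arbitrary, and p
>>> is in the order given in Mike's original post. The scalar is read back
>>> through scalar() so that a variable with the same name cannot shadow it.
>>>
>>> ```stata
>>> * observed frequencies and hypothesised column proportions
>>> matrix f = (41, 30, 7 \ 124, 62, 10)
>>> matrix p = (0.048, 0.338, 0.614)
>>> * row totals: f times a column of ones (2 x 1)
>>> matrix rs = f * J(3, 1, 1)
>>> * expected frequencies: outer product of row totals and p (2 x 3)
>>> matrix fhat = rs * p
>>> * Pearson chi-square, accumulated cell by cell
>>> scalar X2mat = 0
>>> forvalues i = 1/2 {
>>>     forvalues j = 1/3 {
>>>         scalar X2mat = scalar(X2mat) + (f[`i',`j'] - fhat[`i',`j'])^2 / fhat[`i',`j']
>>>     }
>>> }
>>> di scalar(X2mat)
>>> ```
>>>
>>> This should reproduce the 1903.35 from Mata; reversing the order of the
>>> elements of p gives the statistic for the flipped columns.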
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 7 December 2013 08:17,  <[email protected]> wrote:
>>>> Hi Folks,
>>>>
>>>> A version 8 user here.
>>>>
>>>> Consider the following...
>>>>
>>>> tabi 41 30 7 \ 124 62 10 , chi2 expected replace
>>>> list
>>>>
>>>> Here Stata calculates expected values for each cell, based on the
>>>> frequency of my observed values (i.e. row_total x col_total /
>>>> grand_total).
>>>>
>>>> However, I have alternative expected values that I'd like to use (I know
>>>> that the proportions in cols 1, 2 and 3 should be 0.048, 0.338 and 0.614,
>>>> respectively).
>>>>
>>>> Can I get Stata to use alternative expected values for the chi2
>>>> calculation?
>>>>
>>>> Cheers,
>>>>
>>>> Mike.
>>>>
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>
>>
