Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: chi2 - use alternative expected values
From: Nick Cox <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: chi2 - use alternative expected values
Date: Sat, 7 Dec 2013 17:53:38 +0000
Once the column proportions in my code are put in the corrected order,
it gives the same answer as your code.
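
For anyone on Stata 9 or later, here is a minimal sketch of that corrected calculation in Mata. It assumes the intended order of the column proportions is the one in Mike's code below (0.614, 0.338, 0.048 for columns 1, 2 and 3); the names f, p, fhat and X2 are just illustrative.

```stata
mata:
// observed frequencies, as in the -tabi- call
f = (41, 30, 7 \ 124, 62, 10)

// hypothesised column proportions in the corrected order (an assumption
// taken from Mike's code: col 1 = .614, col 2 = .338, col 3 = .048)
p = (0.614, 0.338, 0.048)

// expected frequencies: each row total times the column proportions
fhat = rowsum(f) * p

// Pearson chi-square statistic and its P-value on 2 df
X2 = sum((f - fhat):^2 :/ fhat)
X2
chi2tail(2, X2)
end
```

With the proportions this way round, the expected frequencies are much closer to the observed ones, so the statistic is far smaller than the 1903.35 from the flipped order.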
Given your definitions
egen chi2 = sum(X2)
su chi2
di chi2tail(2,`r(mean)')
can be simplified to
su X2
di chi2tail(2, r(sum))
In other words, there is no need to put the sum of a variable into
another variable and then take the mean of that: -summarize- leaves
the sum in r(sum), although it never displays it.
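
As a self-contained sketch of the shorter route, the following sets up the same six cells with -input- (rather than relying on -tabi- to leave data in memory) and reads the statistic straight from r(sum). The variable names follow Mike's code; -egen, total()- is the Stata 9+ name for what Stata 8 calls -egen, sum()-, so Stata 8 users would need to substitute that.

```stata
clear
input byte row byte col int pop
1 1 41
1 2 30
1 3 7
2 1 124
2 2 62
2 3 10
end

* hypothesised column proportions, in the order used in Mike's code
generate double prop = cond(col == 1, .614, cond(col == 2, .338, .048))

* row totals (on Stata 8, use sum() in place of total())
bysort row: egen rowtot = total(pop)

* expected frequencies and each cell's contribution to chi-square
generate double MyExp = prop * rowtot
generate double X2 = (pop - MyExp)^2 / MyExp

* -summarize- stores the sum in r(sum) without displaying it
quietly summarize X2
display "chi-square = " r(sum)
display "P-value    = " chi2tail(2, r(sum))
```

The 2 df here matches what is used elsewhere in this thread; the extra variable chi2 and the macro reference `r(mean)' are not needed.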
Nick
[email protected]
On 7 December 2013 16:42, <[email protected]> wrote:
> Hi Nick,
>
> Just quickly (it's late here).
>
> Your suspicions of me flipping the columns were correct.
>
> The following explains what I'm on about...
>
> clear
> tabi 41 30 7 \ 124 62 10 , chi2 expected
> scalar pval_1 = r(p)
> bysort row : gen prop = .614 if col == 1
> bysort row : replace prop = .338 if col == 2
> bysort row : replace prop = .048 if col == 3
> bysort row : egen rowtot = sum(pop)
> gen MyExp = prop * rowtot
> gen O_E = pop - MyExp
> gen O_E2 = O_E^2
> gen X2 = O_E2/MyExp
> egen chi2 = sum(X2)
> su chi2
> di chi2tail(2,`r(mean)')
> di pval_1
>
> I'll investigate your solutions.
>
> Thanks and apologies.
>
> Mike.
>
>> For stuff like this, the best advice is normally to use Mata as a
>> calculator. But Mata was introduced in Stata 9. Let's go with Mata
>> anyway, for folks on 9 up, and then give Mike an alternative.
>>
>> Firing up Mata we have a matrix of frequencies
>>
>> : f = (41, 30, 7 \ 124, 62, 10)
>>
>> and a vector of column proportions
>>
>> : p = (0.048, 0.338, 0.614)
>>
>> so we can get a matrix of expected frequencies
>>
>> : fhat = rowsum(f) * p
>>
>> and Pearson chi-square statistic
>>
>> : sum((f - fhat):^2 :/ fhat)
>> 1903.354724
>>
>> I like to look at so-called Pearson residuals (to the best of my
>> knowledge, first used by Tukey)
>>
>> : (f - fhat) :/ sqrt(fhat)
>>                    1             2             3
>>     +----------------------------------------------+
>>   1 |     19.2543253   .7081385267  -5.908903061   |
>>   2 |    37.35989483  -.5219130093  -10.05857601   |
>>     +----------------------------------------------+
>>
>> The massive chi-square statistic goes with col 1 much more and col 2
>> much less than expected (unless Mike flipped columns) and the P-value
>> on 2 df is negligible:
>>
>> : chi2tail(2, sum((f - fhat):^2 :/ fhat))
>> 0
>>
>> : strofreal(chi2tail(2, sum((f - fhat):^2 :/ fhat)), "%21x")
>> +0.0000000000000X-3ff
>>
>> : end
>>
>> Mike could do that with Stata's matrix language, although installing
>> Jeroen Weesie's -matsum- from STB would also be a good idea. But
>> friendlier is the ancient but still serviceable -chitesti- from
>> -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
>> -chitesti- the correct df. If presented with a vector of 6 observed
>> and another vector of 6 expected, -chitesti- will think 5 df, so we
>> must override that by subtracting 3.
>>
>> chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048 196*0.338 196*0.614, nfit(3) sep(0)
>>
>> observed frequencies from keyboard; expected frequencies from keyboard
>>
>> Pearson chi2(2) = 1.9e+03 Pr = 0.000
>> likelihood-ratio chi2(2) = 758.6395 Pr = 0.000
>>
>> +---------------------------------------------------+
>> | observed   expected   notes   obs - exp   Pearson |
>> |---------------------------------------------------|
>> |       41      3.744     *        37.256    19.254 |
>> |       30     26.364               3.636     0.708 |
>> |        7     47.892             -40.892    -5.909 |
>> |      124      9.408             114.592    37.360 |
>> |       62     66.248              -4.248    -0.522 |
>> |       10    120.344            -110.344   -10.059 |
>> +---------------------------------------------------+
>>
>> * 1 <= expected < 5
>>
>> . ret li
>>
>> scalars:
>> r(k) = 6
>> r(df) = 2
>> r(chi2) = 1903.354724254806
>> r(p) = 0
>> r(chi2_lr) = 758.6394519065682
>> r(p_lr) = 1.8345778320e-165
>> r(emean) = 45.66666666666666
>>
>> Confirmation that the P-value is negligible. Massive rejection, as
>> inspection of the original frequencies would suggest.
>>
>> Nick
>> [email protected]
>>
>>
>> On 7 December 2013 08:17, <[email protected]> wrote:
>>> Hi Folks,
>>>
>>> A version 8 user, here.
>>>
>>> Consider the following...
>>>
>>> tabi 41 30 7 \ 124 62 10 , chi2 expected
>>> list
>>>
>>> Here Stata calculates expected values for each cell, based on the
>>> frequency of my observed values (i.e. row_total x col_total /
>>> grand_total).
>>>
>>> However, I have alternative expected values that I'd like to use (I know
>>> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
>>> respectively).
>>>
>>> Can I get Stata to use alternative expected values for the chi2
>>> calculation?
>>>
>>> Cheers,
>>>
>>> Mike.
>>>
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/