Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: chi2 - use alternative expected values
Date: Sat, 7 Dec 2013 18:05:19 +0000

To see the sum

di r(sum)

would be needed.

Nick
njcoxstata@gmail.com

On 7 December 2013 17:53, Nick Cox <njcoxstata@gmail.com> wrote:
> You'll get the same answer as when the column probabilities are
> corrected in my code.
>
> Given your definitions
>
> egen chi2 = sum(X2)
> su chi2
> di chi2tail(2,`r(mean)')
>
> can be simplified to
>
> su X2
> di chi2tail(2, r(sum))
>
> In other words, putting the sum of a variable into another variable
> and then taking the mean of that is not needed, when -summarize-
> yields the sum directly, although never as a displayed result.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 7 December 2013 16:42, <mcross@exemail.com.au> wrote:
>> Hi Nick,
>>
>> Just quickly (it's late here).
>>
>> Your suspicions of me flipping the columns were correct.
>>
>> The following explains what I'm on about...
>>
>> clear
>> tabi 41 30 7 \ 124 62 10 , chi2 expected
>> scalar pval_1 = r(p)
>> bysort row : gen prop = .614 if col == 1
>> bysort row : replace prop = .338 if col == 2
>> bysort row : replace prop = .048 if col == 3
>> bysort row : egen rowtot = sum(pop)
>> gen MyExp = prop * rowtot
>> gen O_E = pop - MyExp
>> gen O_E2 = O_E^2
>> gen X2 = O_E2/MyExp
>> egen chi2 = sum(X2)
>> su chi2
>> di chi2tail(2,`r(mean)')
>> di pval_1
>>
>> I'll investigate your solutions.
>>
>> Thanks and apologies.
>>
>> Mike.
>>
>>> For stuff like this, the best advice is normally to use Mata as a
>>> calculator. But Mata was introduced in Stata 9. Let's go with Mata,
>>> any way, for folks on 9 up and then give Mike an alternative.
>>>
>>> Firing up Mata we have a matrix of frequencies
>>>
>>> : f = (41, 30, 7 \ 124, 62, 10)
>>>
>>> and a vector of column proportions
>>>
>>> : p = (0.048, 0.338, 0.614)
>>>
>>> so we can get a matrix of expected frequencies
>>>
>>> : fhat = rowsum(f) * p
>>>
>>> and Pearson chi-square statistic
>>>
>>> : sum((f - fhat):^2 :/ fhat)
>>> 1903.354724
>>>
>>> I like to look at so-called Pearson residuals (to the best of my
>>> knowledge, first used by Tukey)
>>>
>>> : (f - fhat) :/ sqrt(fhat)
>>>                   1              2              3
>>>     +----------------------------------------------+
>>>   1 |    19.2543253    .7081385267   -5.908903061  |
>>>   2 |   37.35989483   -.5219130093   -10.05857601  |
>>>     +----------------------------------------------+
>>>
>>> The massive chi-square statistic goes with col 1 much more and col 3
>>> much less than expected (unless Mike flipped columns) and the P-value
>>> on 2 df is negligible:
>>>
>>> : chi2tail(2, sum((f - fhat):^2 :/ fhat))
>>> 0
>>>
>>> : strofreal(chi2tail(2, sum((f - fhat):^2 :/ fhat)), "%21x")
>>> +0.0000000000000X-3ff
>>>
>>> : end
>>>
>>> Mike could do that with Stata's matrix language, although installing
>>> Jeroen Weesie's -matsum- from STB would also be a good idea. But
>>> friendlier is the ancient but still serviceable -chitesti- from
>>> -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
>>> -chitesti- the correct df. If presented with a vector of 6 observed
>>> and another vector of 6 expected, -chitesti- will think 5 df, so we
>>> must override that by subtracting 3.
>>>
>>> chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048
>>> 196*0.338 196*0.614, nfit(3) sep(0)
>>>
>>> observed frequencies from keyboard; expected frequencies from keyboard
>>>
>>>          Pearson chi2(2) =  1.9e+03   Pr = 0.000
>>> likelihood-ratio chi2(2) = 758.6395   Pr = 0.000
>>>
>>>   +---------------------------------------------------+
>>>   | observed   expected   notes   obs - exp   Pearson |
>>>   |---------------------------------------------------|
>>>   |       41      3.744   *          37.256    19.254 |
>>>   |       30     26.364               3.636     0.708 |
>>>   |        7     47.892             -40.892    -5.909 |
>>>   |      124      9.408             114.592    37.360 |
>>>   |       62     66.248              -4.248    -0.522 |
>>>   |       10    120.344            -110.344   -10.059 |
>>>   +---------------------------------------------------+
>>>
>>> * 1 <= expected < 5
>>>
>>> . ret li
>>>
>>> scalars:
>>>          r(k) =  6
>>>         r(df) =  2
>>>       r(chi2) =  1903.354724254806
>>>          r(p) =  0
>>>    r(chi2_lr) =  758.6394519065682
>>>       r(p_lr) =  1.8345778320e-165
>>>      r(emean) =  45.66666666666666
>>>
>>> Confirmation that the P-value is negligible. Massive rejection, as
>>> inspection of the original frequencies would suggest.
>>>
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 7 December 2013 08:17, <mcross@exemail.com.au> wrote:
>>>> Hi Folks,
>>>>
>>>> A version 8 user, here.
>>>>
>>>> Consider the following...
>>>>
>>>> tabi 41 30 7 \ 124 62 10 , chi2 expected
>>>> list
>>>>
>>>> Here Stata calculates expected values for each cell, based on the
>>>> frequency of my observed values (i.e. row_total x col_total /
>>>> grand_total).
>>>>
>>>> However, I have alternative expected values that I'd like to use (I know
>>>> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
>>>> respectively).
>>>>
>>>> Can I get Stata to use alternative expected values for the chi2
>>>> calculation?
>>>>
>>>> Cheers,
>>>>
>>>> Mike.
>>>>
>>>>
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
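
For reference, the thread says that correcting the column probabilities in the Mata code gives the same answer as Mike's -generate- approach. A minimal sketch of that corrected calculation follows, assuming Stata 9 or later (Mata is not available in version 8) and assuming the column order .614, .338, .048 from Mike's follow-up. The name X2 is only an illustrative choice, and 2 df is simply the df used throughout the thread.

```stata
* Sketch only: the Mata calculation from the thread, with the column
* proportions in the corrected order. Needs Stata 9 or later.
mata:
// observed frequencies, as given in the thread
f = (41, 30, 7 \ 124, 62, 10)
// column proportions in the corrected order (assumption: Mike's follow-up)
p = (0.614, 0.338, 0.048)
// expected frequencies: row totals times column proportions
fhat = rowsum(f) * p
// Pearson chi-square statistic; X2 is an illustrative name
X2 = sum((f - fhat):^2 :/ fhat)
X2
// P-value on 2 df, the df used in the thread
chi2tail(2, X2)
// Pearson residuals, cell by cell
(f - fhat) :/ sqrt(fhat)
end
```

By hand arithmetic the statistic should come out at roughly 4.7, with a P-value of about 0.09 on 2 df, which is nowhere near the massive rejection obtained with the flipped proportions. Treat those figures as approximate and check them against your own run.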
