You'll get the same answer as when the column probabilities are
corrected in my code.

Given your definitions,

egen chi2 = sum(X2)
su chi2
di chi2tail(2,`r(mean)')

can be simplified to

su X2
di chi2tail(2, r(sum))

In other words, putting the sum of a variable into another variable
and then taking the mean of that is not needed, when -summarize-
yields the sum directly, although never as a displayed result.

Nick
njcoxstata@gmail.com

On 7 December 2013 16:42, <mcross@exemail.com.au> wrote:
> Hi Nick,
>
> Just quickly (it's late here).
>
> Your suspicions of me flipping the columns were correct.
>
> The following explains what I'm on about...
>
> clear
> tabi 41 30 7 \ 124 62 10 , chi2 expected
> scalar pval_1 = r(p)
> bysort row : gen prop = .614 if col == 1
> bysort row : replace prop = .338 if col == 2
> bysort row : replace prop = .048 if col == 3
> bysort row : egen rowtot = sum(pop)
> gen MyExp = prop * rowtot
> gen O_E = pop - MyExp
> gen O_E2 = O_E^2
> gen X2 = O_E2/MyExp
> egen chi2 = sum(X2)
> su chi2
> di chi2tail(2,`r(mean)')
> di pval_1
>
> I'll investigate your solutions.
>
> Thanks and apologies.
>
> Mike.
>
>> For stuff like this, the best advice is normally to use Mata as a
>> calculator. But Mata was introduced in Stata 9. Let's go with Mata,
>> anyway, for folks on 9 up and then give Mike an alternative.
>>
>> Firing up Mata we have a matrix of frequencies
>>
>> : f = (41, 30, 7 \ 124, 62, 10)
>>
>> and a vector of column proportions
>>
>> : p = (0.048, 0.338, 0.614)
>>
>> so we can get a matrix of expected frequencies
>>
>> : fhat = rowsum(f) * p
>>
>> and Pearson chi-square statistic
>>
>> : sum((f - fhat):^2 :/ fhat)
>>   1903.354724
>>
>> I like to look at so-called Pearson residuals (to the best of my
>> knowledge, first used by Tukey)
>>
>> : (f - fhat) :/ sqrt(fhat)
>>                   1              2              3
>>     +----------------------------------------------+
>>   1 |    19.2543253    .7081385267   -5.908903061  |
>>   2 |   37.35989483   -.5219130093   -10.05857601  |
>>     +----------------------------------------------+
>>
>> The massive chi-square statistic goes with col 1 much more and col 2
>> much less than expected (unless Mike flipped columns) and the P-value
>> on 2 df is negligible:
>>
>> : chi2tail(2, sum((f - fhat):^2 :/ fhat))
>>   0
>>
>> : strofreal(chi2tail(2, sum((f - fhat):^2 :/ fhat)), "%21x")
>>   +0.0000000000000X-3ff
>>
>> : end
>>
>> Mike could do that with Stata's matrix language, although installing
>> Jeroen Weesie's -matsum- from STB would also be a good idea. But
>> friendlier is the ancient but still serviceable -chitesti- from
>> -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell
>> -chitesti- the correct df. If presented with a vector of 6 observed
>> and another vector of 6 expected, -chitesti- will think 5 df, so we
>> must override that by subtracting 3.
>>
>> chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048
>> 196*0.338 196*0.614, nfit(3) sep(0)
>>
>> observed frequencies from keyboard; expected frequencies from keyboard
>>
>>          Pearson chi2(2) =  1.9e+03   Pr = 0.000
>> likelihood-ratio chi2(2) = 758.6395   Pr = 0.000
>>
>>   +---------------------------------------------------+
>>   | observed   expected   notes   obs - exp   Pearson |
>>   |---------------------------------------------------|
>>   |       41      3.744       *      37.256    19.254 |
>>   |       30     26.364               3.636     0.708 |
>>   |        7     47.892             -40.892    -5.909 |
>>   |      124      9.408             114.592    37.360 |
>>   |       62     66.248              -4.248    -0.522 |
>>   |       10    120.344            -110.344   -10.059 |
>>   +---------------------------------------------------+
>>
>>   * 1 <= expected < 5
>>
>> . ret li
>>
>> scalars:
>>                   r(k) =  6
>>                  r(df) =  2
>>                r(chi2) =  1903.354724254806
>>                   r(p) =  0
>>             r(chi2_lr) =  758.6394519065682
>>                r(p_lr) =  1.8345778320e-165
>>               r(emean) =  45.66666666666666
>>
>> Confirmation that the P-value is negligible. Massive rejection, as
>> inspection of the original frequencies would suggest.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 7 December 2013 08:17, <mcross@exemail.com.au> wrote:
>>> Hi Folks,
>>>
>>> A version 8 user, here.
>>>
>>> Consider the following...
>>>
>>> tabi 41 30 7 \ 124 62 10 , chi2 expected
>>> list
>>>
>>> Here Stata calculates expected values for each cell, based on the
>>> frequency of my observed values (i.e. row_total x col_total /
>>> grand_total).
>>>
>>> However, I have alternative expected values that I'd like to use (I know
>>> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
>>> respectively).
>>>
>>> Can I get Stata to use alternative expected values for the chi2
>>> calculation?
>>>
>>> Cheers,
>>>
>>> Mike.
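
For reference, the calculation discussed in this thread can be sketched
as one self-contained do-file. This is only a sketch under stated
assumptions: it uses the corrected column proportions from Mike's
follow-up (.614, .338, .048), it types the table in with -input-
rather than relying on -tabi- leaving data in memory, and it takes the
2 df from the thread rather than deriving it. Variable names follow
Mike's code.

```stata
* Observed frequencies, one observation per cell of the 2 x 3 table
clear
input row col pop
1 1 41
1 2 30
1 3 7
2 1 124
2 2 62
2 3 10
end

* Assumed column proportions (the corrected order from Mike's follow-up)
gen prop = cond(col == 1, .614, cond(col == 2, .338, .048))

* Expected frequency = row total * column proportion
bysort row : egen rowtot = sum(pop)
gen MyExp = prop * rowtot

* Contribution of each cell to the Pearson chi-square statistic
gen X2 = (pop - MyExp)^2 / MyExp

* -summarize- leaves the sum in r(sum), so no extra variable is needed
su X2, meanonly
di r(sum)

* P-value on 2 df (df as used in the thread, not derived here)
di chi2tail(2, r(sum))
```

Swapping the three proportions back to Mike's original order (0.048,
0.338, 0.614) should reproduce the very large statistic Nick reports
from Mata and -chitesti-.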
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

