Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: Re: st: chi2 - use alternative expected values

Date: Sat, 7 Dec 2013 10:08:26 +0000
For stuff like this, the best advice is normally to use Mata as a calculator. But Mata was introduced in Stata 9. Let's go with Mata, anyway, for folks on 9 up and then give Mike an alternative.

Firing up Mata we have a matrix of frequencies

    : f = (41, 30, 7 \ 124, 62, 10)

and a vector of column proportions

    : p = (0.048, 0.338, 0.614)

so we can get a matrix of expected frequencies

    : fhat = rowsum(f) * p

and Pearson chi-square statistic

    : sum((f - fhat):^2 :/ fhat)
      1903.354724

I like to look at so-called Pearson residuals (to the best of my knowledge, first used by Tukey)

    : (f - fhat) :/ sqrt(fhat)
                      1              2              3
        +----------------------------------------------+
      1 |    19.2543253    .7081385267   -5.908903061  |
      2 |   37.35989483   -.5219130093   -10.05857601  |
        +----------------------------------------------+

The massive chi-square statistic goes with col 1 much more and col 3 much less than expected (unless Mike flipped columns) and the P-value on 2 df is negligible:

    : chi2tail(2, sum((f - fhat):^2 :/ fhat))
      0

    : strofreal(chi2tail(2, sum((f - fhat):^2 :/ fhat)), "%21x")
      +0.0000000000000X-3ff

    : end

Mike could do that with Stata's matrix language, although installing Jeroen Weesie's -matsum- from STB would also be a good idea.

But friendlier is the ancient but still serviceable -chitesti- from -tab_chi- (SSC). We ravel the matrix to a vector, but we must tell -chitesti- the correct df. If presented with a vector of 6 observed and another vector of 6 expected, -chitesti- will think 5 df, so we must override that by subtracting 3.
    . chitesti 41 30 7 124 62 10 \ 78*0.048 78*0.338 78*0.614 196*0.048 196*0.338 196*0.614, nfit(3) sep(0)

    observed frequencies from keyboard; expected frequencies from keyboard

             Pearson chi2(2) =  1.9e+03   Pr = 0.000
    likelihood-ratio chi2(2) = 758.6395   Pr = 0.000

      +---------------------------------------------------+
      | observed   expected   notes   obs - exp   Pearson |
      |---------------------------------------------------|
      |       41      3.744       *      37.256    19.254 |
      |       30     26.364               3.636     0.708 |
      |        7     47.892             -40.892    -5.909 |
      |      124      9.408             114.592    37.360 |
      |       62     66.248              -4.248    -0.522 |
      |       10    120.344            -110.344   -10.059 |
      +---------------------------------------------------+

    * 1 <= expected < 5

    . ret li

    scalars:
                      r(k) =  6
                     r(df) =  2
                   r(chi2) =  1903.354724254806
                      r(p) =  0
                r(chi2_lr) =  758.6394519065682
                   r(p_lr) =  1.8345778320e-165
                  r(emean) =  45.66666666666666

Confirmation that the P-value is negligible. Massive rejection, as inspection of the original frequencies would suggest.

Nick
njcoxstata@gmail.com

On 7 December 2013 08:17, <mcross@exemail.com.au> wrote:
> Hi Folks,
>
> A version 8 user, here.
>
> Consider the following...
>
> tabi 41 30 7 \ 124 62 10 , chi2 expected
> list
>
> Here Stata calculates expected values for each cell, based on the
> frequency of my observed values (i.e. row_total x col_total /
> grand_total).
>
> However, I have alternative expected values that I'd like to use (I know
> that frequencies of col 1, 2 and 3 should be 0.048, 0.338 and 0.614,
> respectively).
>
> Can I get Stata to use alternative expected values for the chi2 calculation?
>
> Cheers,
>
> Mike.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
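Nick's reply notes that the Mata calculation could also be done with Stata's older matrix language, which is the route open to a version 8 user without installing anything. What follows is a minimal sketch of that route, not code from the thread. The names `rowtot`, `X2` and `res` are illustrative choices, the 2 df are taken from the reply rather than derived here, and the sketch assumes no variables in memory share those names. It has not been tested on version 8.

```stata
* Sketch only: Stata matrix-language analogue of the Mata steps in the reply.
* f, p and fhat follow the reply; rowtot, X2 and res are illustrative names.
matrix f = (41, 30, 7 \ 124, 62, 10)
matrix p = (0.048, 0.338, 0.614)

* Row totals: postmultiply by a column of ones (stands in for Mata's rowsum()).
matrix rowtot = f * J(3, 1, 1)

* Expected frequencies: (2 x 1 row totals) times (1 x 3 column proportions).
matrix fhat = rowtot * p

* Accumulate the Pearson statistic cell by cell from the Pearson residuals.
scalar X2 = 0
forvalues i = 1/2 {
    forvalues j = 1/3 {
        scalar res = (f[`i',`j'] - fhat[`i',`j']) / sqrt(fhat[`i',`j'])
        scalar X2 = X2 + res^2
    }
}

display "Pearson chi2    = " X2
display "P-value on 2 df = " chi2tail(2, X2)
```

If it runs as intended, the statistic should agree with the 1903.354724 that Mata reports in the reply. `matrix list fhat` would show the expected frequencies for comparison with the -chitesti- table.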

**Follow-Ups**: **Re: st: chi2 - use alternative expected values** *From:* mcross@exemail.com.au

**References**: **st: chi2 - use alternative expected values** *From:* mcross@exemail.com.au
