
Re: st: RE: Pearson chi square and Rao and Scott correction validity

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: RE: Pearson chi square and Rao and Scott correction validity
Date	Fri, 7 Nov 2008 11:33:23 -0500

Ángel-

Chambers and Skinner also suggest the model-based approach in Chapter 7. "Many hypotheses of interests may be expressed as a hypothesis in a log-linear model." It turns out that the Pearson chi-square or likelihood ratio tests in an I x J table are equivalent to simultaneous tests that (I-1)(J-1) odds ratios are all equal to 1. In fact, a major competitor to the adjusted chi-square tests, with good properties in simulations, is doing Bonferroni-adjusted tests of individual ORs.
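As a small illustration of the odds-ratio statement above, here is a sketch in Python (not Stata) that computes the (I-1)(J-1) odds ratios of an I x J table against its last row and last column. The function name and both tables are hypothetical, the counts are unweighted, and all cells are assumed to be positive; it only shows the algebra, not a survey-adjusted test.

```python
from itertools import product


def reference_odds_ratios(table):
    """Odds ratios of each cell against the last row and last column.

    For an I x J table this returns (I-1)*(J-1) values. All cells are
    assumed to be strictly positive.
    """
    last_row = len(table) - 1
    last_col = len(table[0]) - 1
    ratios = []
    for i, j in product(range(last_row), range(last_col)):
        numerator = table[i][j] * table[last_row][last_col]
        denominator = table[i][last_col] * table[last_row][j]
        ratios.append(numerator / denominator)
    return ratios


# Hypothetical 2 x 3 table whose rows are proportional (exact independence).
independent = [[10, 20, 30],
               [20, 40, 60]]
# Hypothetical 2 x 3 table with association confined to the first column.
associated = [[30, 20, 30],
              [20, 40, 60]]

print(reference_odds_ratios(independent))  # [1.0, 1.0]
print(reference_odds_ratios(associated))   # [3.0, 1.0]
```

Under exact independence every one of these ratios equals 1, so a departure in any single ratio points to where the association sits.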

I don't think that the logistic and multinomial logistic models are "cumbersome". The command set-up is quite simple (e.g. -xi: mlogit y i.x-). The test of independence is just the test that the "x" coefficients in all equations are zero.

With the chi-square tests you are left only with a single "reject" or "cannot reject" answer. If the test rejects, then why? Modeling can give the answer. In the I x J case, if there is any structure in either factor, you can assess models that are intermediate between complete saturation (I x J parameters) and independence (I + J - 1 parameters). And, you can ask whether other factors modify the degree of association. Stata can fit loglinear models not only with -mlogit- (a special case), but also with -glm- and -poisson-. In addition, the contributed commands -ipf- and -mclest- can be downloaded from SSC.
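The two parameter counts quoted above can be tied back to the number of odds ratios being tested with a one-line check (standard arithmetic, written here in LaTeX):

```latex
\underbrace{IJ}_{\text{saturated}} \;-\; \underbrace{(I + J - 1)}_{\text{independence}}
  \;=\; IJ - I - J + 1 \;=\; (I-1)(J-1)
```

So the gap between the saturated model and the independence model is exactly (I-1)(J-1) parameters, which leaves that many degrees of freedom for intermediate models.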

The original rules of thumb for chi-square tests were criteria for ensuring that the test statistic indeed had a chi-square distribution. Asymptotics depended on sample size. For the recommended tests in Chambers and Skinner's section 7.3.2, a major determinant of the accuracy of tests (level under the null hypothesis, power) is the number of clusters.


-Steve


On Nov 7, 2008, at 8:35 AM, Ángel Rodríguez Laso wrote:

Rereading Korn and Graubard's (1999) Analysis of Health Surveys,
Wiley, I see that on page 78 they recommend, for testing lack of
association in a 2 x J contingency table, a logistic regression with
the presence/absence of the condition as the dependent variable, and
for I x J contingency tables, a multinomial logit regression. I find
this cumbersome. They consider chi-square statistics inappropriate in
complex surveys, but they do not discuss the Rao and Scott
correction.

I will try to contact Korn or Graubard to see if they can shed more light.


Many thanks,

Angel Rodriguez-Laso


2008/11/7 Steven Samuels <[email protected]>:

I see that in my previous post I confused two issues: 1) the sample
size requirements for validity of the survey-adjusted chi-square
tests in Stata; 2) sample size requirements for estimates of cell
totals or proportions, with small counts. Ángel asked about the
first issue. Bottom line: I really don't have an answer.

-Steve

On Nov 6, 2008, at 5:04 PM, Steven Samuels wrote:


I've looked through Chapters 6-7 of Chambers and Skinner's book
Analysis of Survey Data, Wiley, 2003, but I have no definitive
answer. I do have some thoughts:

* "Expected" count is not a guide in the survey setting--it is a
sum of weights of sample observations in the table cell.

* The accuracy of the second-order Rao-Scott chi-square statistic,
probably the best test in -svy: tab-, is apt to depend on the
number of clusters, on the crude counts, and on the distribution of
the observations across clusters. The rule of thumb of 5
observations (or 1) in a cell is based on the theory of i.i.d.
observations, which does not hold in the complex survey setting.

* With a small number of events, I ordinarily display only
unweighted numbers and do not report weighted estimates or
confidence intervals. When I have wanted to infer something about a
proportion based on a small outcome count, I've resorted to the
methods on pp. 64-68 of Korn and Graubard (1999) Analysis of Health
Surveys, Wiley.

A quick Google search turned up one survey which would not report a
cell with fewer than 25 observations
(http://www.nsf.gov/statistics/showsrvy.cfm?srvy_CatID=5&srvy_Seri=16)
and another in which the minimum cell size was 4,000!
(http://www.phac-aspc.gc.ca/publicat/cdic-mcc/17-3/a_e.html).

So a guess for Ángel is that not even five observations in a table
cell is enough.

-Steve

On Nov 6, 2008, at 7:33 AM, Nick Cox wrote:

There is no need to invoke belief! My -tabchi- and -tabchii-
(programs) from the -tab_chi- package on SSC do indeed give
warnings. (There is no Stata program called tab-chi.)

But these old warnings are very conservative. Many writers now
advise that chi-square works fine so long as all expected
frequencies are above about 1. In any case, the point can be
explored by simulations or bootstrapping. Often it is better to
use Fisher's exact test.
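The suggestion to explore the point by simulation could be sketched as follows. This is a minimal Python illustration (not Stata) under loudly stated assumptions: simple random sampling with no survey design, a 2 x 2 table, independence holding by construction, and replicates with an empty margin simply skipped. The function names, sample size, and probabilities are all hypothetical choices.

```python
import random

CRITICAL_VALUE = 3.841  # 0.95 quantile of the chi-square distribution with 1 df


def pearson_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; None if a margin is zero."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return None
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def rejection_rate(n, p_row, p_col, reps, seed=12345):
    """Share of usable replicates in which the nominal 5% test rejects."""
    rng = random.Random(seed)
    rejected = usable = 0
    for _ in range(reps):
        cells = [0, 0, 0, 0]
        for _ in range(n):
            row = rng.random() < p_row   # row and column drawn independently,
            col = rng.random() < p_col   # so the null hypothesis is true
            cells[2 * row + col] += 1
        stat = pearson_2x2(*cells)
        if stat is None:
            continue  # statistic undefined when a margin is empty
        usable += 1
        rejected += stat > CRITICAL_VALUE
    return rejected / usable if usable else float("nan")


# Smallest expected count here is 30 * 0.5 * 0.1 = 1.5.
rate = rejection_rate(n=30, p_row=0.5, p_col=0.1, reps=2000)
print(f"Estimated level of the nominal 5% test: {rate:.3f}")
```

Comparing the printed rate with 0.05 for different values of n and p_col gives a rough picture of how far the nominal level drifts as expected frequencies shrink.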

I can't advise on the main issue, which is for svy-savvy people,
but in general very low expected frequencies could be problematic
for any method.

Nick
[email protected]

Ángel Rodríguez Laso


I've been reviewing the manuals and the Statalist archives and I've
confirmed that Stata does not give any automatic warning message when
the requirements for a valid chi-square test are not met (i.e. no more
than 20% of the expected values in a table are less than 5 and none
are less than 1), which I think is a nuisance. I suppose this can
only be worked out by adding the option 'expected' after tabulate and
checking for oneself whether the requirements are met. I believe Cox's
tab-chi package does give a warning when the requirements are not met.

I wonder also whether the Rao and Scott correction of the Pearson
chi-square that is recommended for survey designs needs the same
requirements. The problem then would be that -svy: tab- doesn't
support the 'expected' option, nor is tab-chi suitable for survey
analysis.
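For the non-survey case, the rule of thumb quoted above (no more than 20% of expected values below 5, none below 1) can be checked by hand from the expected frequencies. The sketch below is in Python (not Stata) with hypothetical function names and made-up unweighted counts; it assumes simple random sampling and says nothing about the survey-adjusted tests.

```python
def expected_counts(table):
    """Expected frequencies under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_rule_of_thumb(table, max_share_below_5=0.20, floor=1.0):
    """True if at most 20% of expected counts are below 5 and none is below 1."""
    flat = [e for row in expected_counts(table) for e in row]
    share_below_5 = sum(e < 5 for e in flat) / len(flat)
    return share_below_5 <= max_share_below_5 and min(flat) >= floor


# Hypothetical sparse 2 x 2 table: expected counts are 12.6, 2.4, 8.4 and 1.6.
sparse = [[12, 3],
          [9, 1]]
# Hypothetical well-filled table: expected counts are 35, 35, 25 and 25.
well_filled = [[40, 30],
               [20, 30]]

print(meets_rule_of_thumb(sparse))       # False: two of four expected counts are below 5
print(meets_rule_of_thumb(well_filled))  # True
```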


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441







