# Re: st: Chi-square test for Categorical Data Analysis

From Maarten buis <[email protected]>
To [email protected]
Subject Re: st: Chi-square test for Categorical Data Analysis
Date Thu, 20 Sep 2007 13:08:19 +0100 (BST)

```
--- Austin Nichols wrote:
> > I think it's fairly easy to prove via counterexample or simulation
> > that this can easily give the wrong answer.  Can you give a
> > reference that supports it?

--- David Radwin <[email protected]> wrote:
> It is true, of course, as with many statistical techniques, that this
> technique may lead you astray. I have not done any simulations
> myself, but I will refer you again to the reference in my original
> posting:
>
> Parker, R. N., & Fenwick, R. (1983). The Pareto curve and its utility
> for open-ended income distributions in survey research. Social
> Forces, Vol. 61, No. 3, 872-885.
> http://www.jstor.org/view/00377732/di010900/01p0014t/0

In the example below I simulate the results we would get if we had the
continuous data, if we used -intreg- with log-transformed endpoints, and
if we used log-transformed midpoint scaling. All three methods seem to
perform OK. This doesn't mean that midpoint scaling will always be OK,
because a) I created the data to be well behaved and the model to be
appropriate for that data, and b) as Austin already remarked, this
method can be very sensitive to wrongly chosen values for the lowest
and highest midpoints, and these midpoints are the hardest to choose
since the lowest and highest categories tend to be open-ended
intervals.

Hope this helps,
Maarten

*---------------- begin example ------------------
set seed 12345
capture program drop sim
program define sim, rclass
    drop _all
    set obs 500
    * binary covariate: 1 for the first 250 observations, 0 otherwise
    gen x = _n < 251

    * log-normal outcome; the true effect of x on ln(y) is .25
    gen y = exp(.7*invnorm(uniform())+ 11 + .25*x)
    bys x: sum y
    * cat holds the lower bound of the category each y falls in
    egen cat = cut(y), ///
        at(0, 25000, 50000, 100000, 150000, 200000, 500000, 1e7)
    * cat2 holds the upper bound used for each category
    gen cat2 = cat
    recode cat2 (     0 =   25000)  ///
                ( 25000 =   50000)  ///
                ( 50000 =  100000)  ///
                (100000 =  150000)  ///
                (150000 =  200000)  ///
                (200000 =  500000)  ///
                (500000 = 1000000)
    * mid holds the value assigned to each category
    gen mid = cat
    recode mid  (     0 =   20000)  ///
                ( 25000 =   37500)  ///
                ( 50000 =   75000)  ///
                (100000 =  125000)  ///
                (150000 =  175000)  ///
                (200000 =  350000)  ///
                (500000 =  750000)

    gen lny = ln(y)
    gen lncat = ln(cat+1)
    gen lncat2 = ln(cat2)
    gen lnmid = ln(mid)

    * regression on the continuous data
    reg lny x
    return scalar xcont = _b[x]

    * interval regression on the log lower and upper bounds
    intreg lncat lncat2 x
    return scalar xcat = _b[x]

    * regression on the log of the assigned category values
    reg lnmid x
    return scalar xmid = _b[x]
end
simulate cont=r(xcont) cat=r(xcat) mid=r(xmid), reps(10000): sim
twoway kdensity cont ||            ///
       kdensity cat  ||            ///
       kdensity mid,               ///
       xline(.25)                  ///
       xtitle("effect of x")       ///
       ytitle("density")           ///
       legend(order(1 "continuous" ///
                      "data"       ///
                    2 "intreg"     ///
                    3 "mid point"  ///
                      "scoring"))

sum
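
* A possible follow-up, not part of the original example: a minimal
* sketch that compares each estimator with the true effect of .25 used
* in -sim-. It assumes the variables cont, cat and mid created by
* -simulate- above are still in memory; the local macro bias is
* introduced here for illustration only.
foreach v of varlist cont cat mid {
    quietly sum `v'
    local bias = r(mean) - .25
    display "`v': mean bias = " %7.4f `bias' "  sd = " %7.4f r(sd)
}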
*------------------ end example -------------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
