RE: st: gologit2
For anyone who is still following this thread - Maarten and I have 
been conversing offline, and I think we both agree that the original 
simulations of the brant test and its counterparts were flawed; the 
test works much better than we initially thought.  Here are new 
simulations.  For each simulation, the expected rejection rate under 
H0 is .05, i.e. 50 false rejections in 1000 tries at the .05 level 
of significance.
Brant Simulations, 1000 reps
# of Xs   | reject H0
----------+-----------
        1 |      .051
        2 |      .042
        3 |      .054
        4 |      .051
        5 |      .057
        6 |      .054
        7 |      .048
        8 |      .056
        9 |      .050
       10 |      .056
       12 |      .067
       14 |      .038
       16 |      .048
       18 |      .041
       20 |      .036
Altogether, the observed number of false rejections of the null 
totaled 749; the expected number was 750 (15 simulations x 50 
expected rejections each), so I'd say the simulations worked pretty 
well.  Most critically, you don't see an increased tendency to 
reject the null as the number of x variables increases.
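As a quick sanity check (my addition), you can pool all 15 
simulations and test the total of 749 rejections in 15,000 reps 
directly against the nominal .05 rate.  Since every rep is an 
independently generated dataset, Stata's immediate binomial test 
-bitesti- does this in one line:

* 749 rejections observed in 15 x 1000 = 15,000 reps;
* expected 750 under a true .05 rejection rate
bitesti 15000 749 .05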
For anyone who is interested in what changed - in Maarten's 
original code, as the number of Xs increased, the distribution of Y 
also changed.  Cases were less likely to be in the middle categories 
of 2 and 3 and more likely to be in the extreme categories of 1 and 
4.  That wouldn't necessarily be bad, except that it produces some 
very thin cell counts in some simulations, which probably leads to 
poor estimation.
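To see why (my gloss, not Maarten's): ystar is the sum of nx 
standard normal Xs plus a logistic error, so its variance is 
nx + pi^2/3 (about nx + 3.29) and grows with nx, while the old 
cutpoints stayed fixed at -2, 0, and 2.  Here is a quick 
stand-alone illustration, separate from the simulation code, of 
how the middle categories get squeezed at nx = 20:

clear
set seed 54321
set obs 100000
* ystar for nx = 20: sum of 20 standard normals plus a logistic error
gen ystar = 0
forvalues i = 1/20 {
        replace ystar = ystar + invnorm(uniform())
}
gen u = uniform()
replace ystar = ystar + ln(u/(1-u))
* apply the old fixed cutpoints
gen y = cond(ystar < -2, 1, cond(ystar < 0, 2, cond(ystar < 2, 3, 4)))
* categories 2 and 3 each end up with roughly 16% of cases,
* versus about 33% each when nx = 1
tab y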
I tweaked Maarten's code so that the distribution of Y was always the 
same no matter how many Xs were in the model.  This seems more 
realistic anyway, i.e. the distribution of the observed y shouldn't 
change just because you have more explanatory variables.  That 
produced the results above.  Also, Maarten reports that when he 
tweaked his original code by increasing the sample size, the brant 
test performed much better as well.
Here is the modified code.
set more off
set seed 12345
capture program drop sim
program define sim, rclass
        syntax, [nx(integer 1)]
        drop _all
        set obs 500
        * generate nx standard normal predictors and build
        * the linear predictor string "x1 + x2 + ..."
        forvalues i = 1/`nx' {
                gen x`i' = invnorm(uniform())
                local x `x' x`i'
        }
        local x : list retokenize x
        local xsum : subinstr local x " " " + ", all
        * latent variable: linear predictor plus logistic error
        gen u = uniform()
        gen ystar = `xsum' + ln(u/(1-u))
        * original code for y.  This code causes
        * the distribution of y to change as the
        * number of x vars increases, probably
        * resulting in very small cell counts in
        * some simulations.
        *gen y = cond(ystar < -2, 1,     ///
        *  cond(ystar <  0, 2,     ///
        *        cond(ystar <  2, 3, 4)))
        * New code for y.  Distribution for y is
        * the same no matter how many Xs there are.
        egen y = cut(ystar), group(4)
        replace y = y + 1
        * Uncomment the section you want
        * brant test code
        ologit y `x'
        brant
        return scalar p = r(p)
        * omodel test code
        *omodel logit y `x'
        *return scalar p = chi2tail($S_2, $S_1)
        * gologit2 test code
        *ologit y `x'
        *est store m1
        *gologit2 y `x', npl store(m2)
        *lrtest m1 m2, force
        *return scalar p = r(p)
end
* run nx = 1 first to initialize the results matrix
simulate p=r(p), reps(1000): sim, nx(1)
count if p < .05
matrix res = 1, r(N)/1000
* remaining numbers of predictors
foreach i of numlist 2/10 12(2)20 {
        simulate p=r(p), reps(1000): sim, nx(`i')
        count if p < .05
        matrix res = res \ `i', r(N)/1000
}
* proportion of false rejections for each # of Xs
matlist res
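And if you want to convince yourself that the -egen, cut()- coding 
really does hold the distribution of y fixed: group(4) cuts at the 
sample quartiles of its argument, so each category gets about 25% of 
the cases no matter what scale ystar has.  A minimal stand-alone 
check:

clear
set obs 500
* any continuous variable will do; the grouping is quantile-based
gen ystar = 5*invnorm(uniform())
egen y = cut(ystar), group(4)
replace y = y + 1
* each of the four categories should hold roughly 125 cases
tab y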