RE: st: gologit2
For anyone who is still following this thread - Maarten and I have 
been conversing offline, and I think we both agree that the original 
simulations of the brant test and its counterparts were flawed; the 
test works much better than we initially thought.  Here are new 
simulations.  For each simulation, the expected rejection rate under 
H0 is .05, i.e. 50 false rejections in 1000 tries at the .05 level 
of significance.
Brant Simulations, 1000 reps
# of Xs   | reject H0
----------+-----------
        1 |      .051
        2 |      .042
        3 |      .054
        4 |      .051
        5 |      .057
        6 |      .054
        7 |      .048
        8 |      .056
        9 |      .050
       10 |      .056
       12 |      .067
       14 |      .038
       16 |      .048
       18 |      .041
       20 |      .036
Altogether, the observed number of false rejections of the null 
totaled 749; the expected number was 750 (15 simulations x 50 
expected rejections each), so I'd say the simulations worked pretty 
well.  Most critically, you don't see an increased tendency to 
reject the null as the number of x variables increases.
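As a quick sanity check (my addition), you can pool all 15 
simulations and test the total of 749 rejections in 15,000 reps 
directly against the nominal .05 rate.  Since every rep is an 
independently generated dataset, Stata's immediate binomial test 
-bitesti- does this in one line:

* 749 rejections observed in 15 x 1000 = 15,000 reps;
* expected 750 under a true .05 rejection rate
bitesti 15000 749 .05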
For anyone who is interested in what changed - in Maarten's 
original code, as the number of Xs increased, the distribution of Y 
also changed.  Cases were less likely to be in the middle categories 
of 2 and 3 and more likely to be in the extreme categories of 1 and 
4.  That wouldn't necessarily be bad, except that it produces some 
very thin cell counts in some simulations, which probably leads to 
poor estimation.
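To see why (my gloss, not Maarten's): ystar is the sum of nx 
standard normal Xs plus a logistic error, so its variance is 
nx + pi^2/3 (about nx + 3.29) and grows with nx, while the old 
cutpoints stayed fixed at -2, 0, and 2.  Here is a quick 
stand-alone illustration, separate from the simulation code, of 
how the middle categories get squeezed at nx = 20:

clear
set seed 54321
set obs 100000
* ystar for nx = 20: sum of 20 standard normals plus a logistic error
gen ystar = 0
forvalues i = 1/20 {
        replace ystar = ystar + invnorm(uniform())
}
gen u = uniform()
replace ystar = ystar + ln(u/(1-u))
* apply the old fixed cutpoints
gen y = cond(ystar < -2, 1, cond(ystar < 0, 2, cond(ystar < 2, 3, 4)))
* categories 2 and 3 each end up with roughly 16% of cases,
* versus about 33% each when nx = 1
tab y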
I tweaked Maarten's code so that the distribution of Y was always the 
same no matter how many Xs were in the model.  This seems more 
realistic anyway, i.e. the distribution of the observed y shouldn't 
change just because you have more explanatory variables.  That 
produced the results above.  Also, Maarten reports that when he 
tweaked his original code by increasing the sample size, the brant 
test performed much better as well.
Here is the modified code.
set more off
set seed 12345
capture program drop sim
program define sim, rclass
        syntax, [nx(integer 1)]
        drop _all
        set obs 500
        * generate nx standard normal predictors and build
        * the linear predictor string "x1 + x2 + ..."
        forvalues i = 1/`nx' {
                gen x`i' = invnorm(uniform())
                local x `x' x`i'
        }
        local x : list retokenize x
        local xsum : subinstr local x " " " + ", all
        * latent variable: linear predictor plus logistic error
        gen u = uniform()
        gen ystar = `xsum' + ln(u/(1-u))
        * original code for y.  This code causes
        * the distribution of y to change as the
        * number of x vars increases, probably
        * resulting in very small cell counts in
        * some simulations.
        *gen y = cond(ystar < -2, 1,     ///
        *  cond(ystar <  0, 2,     ///
        *        cond(ystar <  2, 3, 4)))
        * New code for y.  Distribution for y is
        * the same no matter how many Xs there are.
        egen y = cut(ystar), group(4)
        replace y = y + 1
        * Uncomment the section you want
        * brant test code
        ologit y `x'
        brant
        return scalar p = r(p)
        * omodel test code
        *omodel logit y `x'
        *return scalar p = chi2tail($S_2, $S_1)
        * gologit2 test code
        *ologit y `x'
        *est store m1
        *gologit2 y `x', npl store(m2)
        *lrtest m1 m2, force
        *return scalar p = r(p)
end
* run nx = 1 first to initialize the results matrix
simulate p=r(p), reps(1000): sim, nx(1)
count if p < .05
matrix res = 1, r(N)/1000
* remaining numbers of predictors
foreach i of numlist 2/10 12(2)20 {
        simulate p=r(p), reps(1000): sim, nx(`i')
        count if p < .05
        matrix res = res \ `i', r(N)/1000
}
* proportion of false rejections for each # of Xs
matlist res
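And if you want to convince yourself that the -egen, cut()- coding 
really does hold the distribution of y fixed: group(4) cuts at the 
sample quartiles of its argument, so each category gets about 25% of 
the cases no matter what scale ystar has.  A minimal stand-alone 
check:

clear
set obs 500
* any continuous variable will do; the grouping is quantile-based
gen ystar = 5*invnorm(uniform())
egen y = cut(ystar), group(4)
replace y = y + 1
* each of the four categories should hold roughly 125 cases
tab y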