Statalist



Re: st: weights in xtlogit


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: weights in xtlogit
Date   Sun, 15 Nov 2009 20:18:42 -0500

If you have 1000 obs per cluster, you might prefer to include fixed
effects (50 or so dummies) in regular logit--the bias from using
dummies is quite small when the number of obs per cluster is large.
Also, if you have 50-60 clusters (20-30 per treatment/control group
rather than 20-30 total across both groups) you can use the
cluster-robust VCE.  The bias
(in SEs) is quite small with more than 50 balanced clusters.  This
should be fairly easy to simulate, since -logit- is fast.  A better
simulation would compare to -xtlogit- of course, but 10000 iterations
of something that takes a day to converge is not feasible... though if
you supply the true parameter vector with option from(b), it might
converge a bit faster.
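
A minimal sketch of both shortcuts (untested; the variable names
yesno, trt, and clusterid are taken from David's description below):

* (1) cluster fixed effects as dummies in ordinary -logit-;
*     the incidental-parameters bias is small with ~1000 obs per cluster
logit yesno trt i.clusterid

* (2) pooled -logit- with the cluster-robust VCE;
*     reasonable with 50 or more roughly balanced clusters
logit yesno trt, vce(cluster clusterid)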

On Sat, Nov 14, 2009 at 10:58 PM, Airey, David C
<[email protected]> wrote:
> .
>
> Thanks Austin.
>
> I was trying to find ways of speeding up power simulations of xtlogit models where cluster size is large.
>
> If I have 20-30 clusters in control and treatment groups, where each cluster has upwards of 1000 1/0 observations, then xtlogit can be slow.
>
> So variables might be clusterid (animal = cluster), yesno (1 or 0), and treatment (1 or 0).
>
> In 2008 after a similar query, Joseph Coveney posted a version of an xtlogit simulation in a situation where cluster number was small (12) and the cluster size was also not large (50). A slightly modified version of that code is below.
>
> Unfortunately or fortunately, we now have deep (aka next-gen) sequencing data, a technology that yields very large cluster sizes in our application.
>
> One of our colleagues is looking at modeling this data with an overdispersed count model instead of a mixed logit. That might be a better solution.
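>
> For example (just a sketch, untested), the per-animal counts in the simulation below could go into a quasi-binomial GLM, with scale(x2) soaking up the extra-binomial variation:
>
> // one count per animal; Pearson X2-based dispersion handles overdispersion
> glm bnl trt, family(binomial nseq) link(logit) scale(x2)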
>
> // notes follow "//" or "*"
> // modified from Coveney, 2008
> clear all
> set more off // allow results to scroll
> set seed `=date("2009-11-09", "YMD")' // for repeatable simulations
> *
> capture program drop edits // drop the program if it is already defined
> program define edits, rclass // program to be called in simulation
> version 11 // version 11 of Stata
> syntax [, delta(real 0) rho(real 0)] // input delta and rho
> drop _all
> tempname p_t // temporary name for a scalar to hold the t-test p-value
> clear
> set obs 12 // 12 mice...could be 50 now
> generate int pid = _n // integer pid numbered from 1 to 12
> generate byte trt = mod(_n, 2) // alternate 0 and 1
> // Generate a per-animal success probability on the logit scale.
> // With a normal random intercept of variance s2, the latent-scale
> // ICC is rho = s2/(s2 + _pi^2/3), so the intercept SD is
> // sqrt(rho/(1 - rho) * _pi^2/3); add the treatment effect delta.
> generate double prob = invlogit(trt * `delta' + ///
>        sqrt(`rho' / (1 - `rho') * _pi^2 / 3) * rnormal())
> assert !missing(prob) // fails if rho is misspecified (e.g., rho >= 1)
> generate int nseq = 100 // number of sequences per animal
> // now the number of sequences per animal could be 1000 or more
> gen bnl = rbinomial(nseq, prob) // binomial draw: n = nseq, p = prob
> *
> // ttest on percent summary statistic per animal
> gen percent = bnl/nseq*100
> ttest percent, by(trt)
> scalar define `p_t' = r(p)
> *
> // reshape the data from one score per animal
> // to one binary observation per row
> // so that a mixed logit model can be used...
> rename bnl count1
> generate int count0 = nseq - count1
> reshape long count, i(pid) j(positive) // can weights in xtlogit be used? (see sketch after -exit-)
> drop if !count
> expand count
> xtlogit positive trt, i(pid) intpoints(30)
> // return the estimated rho and delta and the
> // xtlogit and ttest p-values for summarization
> return scalar rho = e(rho)
> return scalar p_r = chi2tail(1, e(chi2))
> return scalar delta = _b[trt]
> return scalar p_t = `p_t'
> end // program end
> *
> // call program above for treatment effects 0, 0.5, and 1
> // rho values of 0, 0.05, 0.10, and 0.15
> forvalues delta_int = 0(5)10 {
>        local delta = `delta_int'/10 // delta = 0.0, 0.5, and 1.0
>        forvalues rho_int = 0(5)15 {
>                local rho = `rho_int'/100 // rho = 0.0, 0.05, 0.10, 0.15
>                display in smcl as text _newline(1) ///
>                        "delta = " %03.2f `delta' " rho = " %03.2f `rho'
>                simulate delta=r(delta) rho=r(rho) p_r=r(p_r) ///
>                        p_t=r(p_t), reps(300): ///
>                        edits , rho(`rho') delta(`delta')
>                summarize delta rho
>                foreach var of varlist p_* {
>                        generate byte pos_`var' = `var' < 0.05
>                }
>                summarize pos_*
>        }
> }
> exit
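>
> As a sketch of the weights question flagged at the -reshape- step above (untested): plain -logit- accepts fweights, so the -drop-/-expand- lines could be replaced by keeping one weighted row per covariate pattern:
>
> // one row per (pid, positive) pattern, weighted by its frequency
> logit positive trt i.pid [fweight = count] if count > 0
>
> Whether -xtlogit- itself allows such weights in a given release is something to check in -help xtlogit-.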
>
>
> -Dave
>
>
>> Does
>> "two groups of clustered observations, but lots of observations per cluster"
>> mean id takes on two distinct values?  And you have many obs in each
>> category?  Just use logit with a dummy variable for id in that case.
>>
>> On Sat, Nov 14, 2009 at 2:16 PM, Airey, David C
>> <[email protected]> wrote:
>> > .
>> >
>> > In logit models, can one get faster results using weights if there are few covariate patterns? I thought I remembered reading that somewhere. If so, is it possible to do the same with xtlogit? So if you have the simplest of models, where you have two groups of clustered observations but lots of observations per cluster, making estimation slow, can weights be used to speed up the estimation?
>> >
>> > xtlogit yn trt, i(id)
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


