


st: xtlogit, re for a 5% random sample

From   "Waxman, Daniel" <>
To   "''" <>
Subject   st: xtlogit, re for a 5% random sample
Date   Tue, 12 Jun 2012 01:16:55 +0000


I am interested in fitting a random-intercept model of a particular patient outcome, using physicians as the group identifier. There are ~20,000 groups, with an average of ~20 observations per group (range 4-200). My main parameter of interest is the intraclass correlation coefficient (ICC).

The data are a 5% random sample of Medicare claims drawn at the patient level, meaning that 1 in 20 beneficiaries is included, but all claims for the included beneficiaries are present. For most types of analysis with these data, using [pweight=20] would do the trick. xtlogit does not, however, allow pweights.
My questions are:

1.  If I ignore the weights,  will my results be biased, inconsistent, or otherwise suboptimal?

2.  If I can't ignore the weights, can anybody think of a workaround, such as expanding the sample and perhaps assigning outcomes based upon a binomial distribution?  (Interestingly, so far, if I try even ". expand 20", xtlogit tends to come back eventually with "initial values not feasible".)

3. gllamm does allow probability weights, but it's such a black box that I hate to use it without understanding what it's doing. It's also quite slow. If I were to use it, does anybody know whether the correct procedure would be to assign a probability weight of 20 to "level 1" and a weight of 1 to "level 2"?

4. The ICC is going to be very high, on the order of 80%. xtlogit, xtmixed, and gllamm all have a pretty hard time with even simplified versions of the model. I've tried all of the tricks that I've found on this list (e.g., providing starting values using from(), refineopts(iterate(0)), converting continuous variables into categorical variables, etc.). I think the problem is that the small amount of intra-group variation just doesn't give the optimizer much to work with. Given this, does anybody have suggestions for other ways to quantify the percentage of variation that is attributable to the group (physician) level?
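
To make the scale of the problem in point 4 concrete: for a random-intercept logit model, the ICC is commonly defined on the latent scale as sigma_u^2 / (sigma_u^2 + pi^2/3), where sigma_u^2 is the random-intercept variance and pi^2/3 is the variance of the standard logistic error. The following is a minimal Python sketch of that arithmetic only, assuming this latent-variable definition applies; the function names are hypothetical and are not part of Stata or gllamm.

```python
import math

# Level-1 (residual) variance of a logit model on the latent scale:
# the variance of the standard logistic distribution, pi^2 / 3.
LEVEL1_VARIANCE = math.pi ** 2 / 3


def icc_from_variance(sigma2_u):
    """Latent-scale ICC implied by a random-intercept variance (hypothetical helper)."""
    return sigma2_u / (sigma2_u + LEVEL1_VARIANCE)


def variance_from_icc(icc):
    """Random-intercept variance implied by a latent-scale ICC (hypothetical helper)."""
    if not 0 <= icc < 1:
        raise ValueError("icc must be in [0, 1)")
    return icc * LEVEL1_VARIANCE / (1 - icc)


if __name__ == "__main__":
    sigma2_u = variance_from_icc(0.80)
    print(f"implied random-intercept variance: {sigma2_u:.2f}")
    print(f"implied random-intercept SD:       {math.sqrt(sigma2_u):.2f}")
```

Under this definition, an ICC of about 0.80 corresponds to a random-intercept variance of roughly 13.2, that is, a standard deviation of about 3.6 on the logit scale.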

Thanks in advance!


