From: Rebecca Pope <rebecca.a.pope@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: "pooled" xtmepoisson with unconstrained error variance
Date: Sun, 25 Nov 2012 16:59:02 -0600
Hello,

I need to estimate a Poisson model for two groups with unequal variance, where the data come from observations on patients over time nested within clinics (i.e., level 1 is time (measurement occasion), level 2 is the patient, and level 3 is the clinic). I am using Stata 12.1. I think -xtmepoisson- is a natural choice for the analysis, except that for the time being I'm stuck estimating separate equations for each group.

In the interest of fixing terms, by "pooled" I mean that I've taken a separate equation for each group and written them as one "master" equation. Bill Gould discusses something similar to my problem in the linear regression context at:

http://www.stata.com/support/faqs/statistics/pooling-data-and-chow-tests/

aweights and -xtglm- are discussed there, but neither is applicable in this context, so I'm turning to Statalist for assistance.

Put as concisely as I can, my questions are:

1. Is unequal variance between groups as much of a problem in Poisson models as in linear regression? (Clearly I think "yes" or I wouldn't be posting, but I'd like to verify with folks more expert than I am.)

2a. Can I control for this in a "pooled" multilevel Poisson model (in Stata)?

2b. If so, how do I control for unequal variance in a pooled multilevel Poisson model in Stata?

Here is an example that resembles my problem. Assume for the sake of argument that a group*age interaction is somehow meaningful and interesting in this context.
*** begin example ***
use http://www.stata-press.com/data/r12/epilepsy

/* create artificial groups, 1 for odd ID number, 0 for even */
gen foo = ceil((subject/2)-int(subject/2))

/* demonstrate baseline differences in variances by group */
by subject, sort: gen first=_n==1
sdtest seizures if first, by(foo)
/* significant at alpha=0.10, in actual data, p < 0.001 */

/* -xtmepoisson- model from manual for each group (1) */
by foo, sort : xtmepoisson seizures treat lage lbas lbas_trt v4, || subject:

/* -xtmepoisson- with interactions for covariate of interest (2) */
/* c. marks lage as continuous, as in (3) */
xtmepoisson seizures treat c.lage##i.foo lbas lbas_trt v4, || subject:

/* -xtmepoisson- fully interacted (3) (will switch to Laplace here by default) */
gen cons0=foo==0
xtmepoisson seizures cons0 i.foo##i.treat c.lage##i.foo c.lbas##i.foo c.lbas_trt##i.foo c.v4##i.foo, nocons || subject: R.foo
*** end example ***

(3) seems to me to be clearly preferred to (2) because it recovers all FEs from (1), though the estimates are not exactly the same. I tried Laplace in both and it didn't make a difference, which, from the manual, should have been expected.

Am I on the right track with this progression? How do I accommodate the fact that the variance in the number of seizures differs by "foo"?

In case the following is relevant to anyone's recommendations:

- The example above only has 59 patients; I have several thousand.
- I do not have an equal number of patients in each group; there is about a 3:1 ratio of 0s to 1s for my comorbidity indicator.
- The data are observational. They come from medical records review.
- There are about 30 coefficients to be estimated before any interactions/REs.
- There is no randomly assigned treatment, just a set of 3 covariates; I am interested in testing whether they are jointly different between the two groups.
- The example data don't have a natural level 3 variable, but I also have a random intercept for the clinic.
Related econometric references are welcomed just as much as Stata tips because I'd really like to learn more about this. I've tried searching with the terms "pooling Poisson multilevel mixed effects" and various combinations thereof and haven't found anything that addresses the use of pooled data in a Poisson regression, let alone the issue of unequal variances.

* I'm not sure if the use of R.foo is correct for the RE in model (3). It is my best guess for now & I intend to do more reading on that later.

Thanks,
Rebecca