Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joerg Luedicke <joerg.luedicke@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: stset for grouped data |
Date | Fri, 15 Apr 2011 16:15:13 -0400 |
On Fri, Apr 15, 2011 at 10:06 AM, Dherani, Mukesh <M.K.Dherani@liverpool.ac.uk> wrote:
> Thanks J.
> I am not a statistician but why do we need to log transform population?

In the Poisson model above, the log of the expected rate, log(mu/p), is modeled, where mu is the expected event count and p is the population size. This model is also known as a log-linear model and can be written as log(mu/p)=a+b*x. Since log(mu/p) is equivalent to log(mu) - log(p), adding log(p) to both sides of the equation lets the model be expressed as log(mu)=a+b*x+log(p). The log(p) term is the offset: it enters the model like a covariate, but with its coefficient constrained to 1.

> We want to compare the incidence rate (x/person-years) rather than mere incidence.

In the example data you showed, 1) there are no individually varying exposure times, since the data are already aggregated (or grouped), and 2) this constant exposure amounts to exactly 1 year. So if person-years are defined as the product of the number of persons in a group and the number of years those persons were exposed to risk, you are still left with just the number of persons, since the number of years is 1. If you had, let's say, 3 years of data grouped together, you could derive the rate by multiplying the denominator by 3 (as is done on this webpage: http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node75.html).

In the data you provided in your OP, there are 2 cases in the youngest age group, which had a population size of 5000. So the rate is 2/5000=0.0004 per person per year, i.e. 0.04 per 100 individuals per year. (Btw, the rate for the oldest age group is 75/7896=0.0095, and 0.0095/0.0004=23.75, which matches the result from the regression.)

> Is there any good reference on how to carry out longitudinal analysis on aggregated data? I did not find one on google.

Generally speaking, methods for longitudinal data apply in cases of aggregated data, too. Off the top of my head, I don't know of a special reference here.
Maybe others who know better will chime in. For a general treatment of longitudinal analysis using Stata, the book by Rabe-Hesketh and Skrondal could be helpful: http://www.stata.com/bookstore/mlmus2.html

J.
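
P.S. In case it helps to see the offset arithmetic spelled out, here is a minimal sketch in Python (not Stata, and not the regression output itself). It uses only the two age groups quoted in this thread. The variable names, the 0/1 coding of x, and the closed-form shortcut are assumptions for illustration: with two groups and two parameters the model is saturated, so a and b can be read straight off the observed rates instead of being estimated iteratively.

```python
import math

# Counts and population sizes quoted in this thread.
cases = {"youngest": 2, "oldest": 75}
population = {"youngest": 5000, "oldest": 7896}
years = 1  # assumption: one year of exposure per group, as in the example data

# Person-years = persons * years; the rate is events per person-year.
person_years = {g: population[g] * years for g in cases}
rate = {g: cases[g] / person_years[g] for g in cases}

# Hypothetical coding: x = 0 for the youngest group, x = 1 for the oldest.
# The model log(mu) = a + b*x + log(p) is saturated for two groups, so its
# fit reproduces the observed rates: a is the log baseline rate and
# b is the log rate ratio.
a = math.log(rate["youngest"])
b = math.log(rate["oldest"] / rate["youngest"])


def expected_count(x, p):
    """Expected count mu from log(mu) = a + b*x + 1*log(p)."""
    return math.exp(a + b * x + math.log(p))


rate_ratio = math.exp(b)

print(f"rate, youngest: {rate['youngest']:.4f} per person-year")
print(f"rate, oldest:   {rate['oldest']:.4f} per person-year")
print(f"rate ratio:     {rate_ratio:.2f}")
print(f"expected count, youngest: {expected_count(0, person_years['youngest']):.1f}")
print(f"expected count, oldest:   {expected_count(1, person_years['oldest']):.1f}")
```

The rate ratio computed from unrounded rates comes out marginally below the 23.75 obtained from the rounded figures. Setting years to 3 would scale every denominator by 3, which is the person-years adjustment described above.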