
# Re: st: stset for grouped data

From: Joerg Luedicke
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stset for grouped data
Date: Fri, 15 Apr 2011 16:15:13 -0400

```On Fri, Apr 15, 2011 at 10:06 AM, Dherani, Mukesh
<M.K.Dherani@liverpool.ac.uk> wrote:
> Thanks J.
> I am not an statistician but why do we need to log transform population?

In the Poisson model above, what is modeled is the log of the expected
rate, log(mu/p), where mu is the expected event count and p is the
population size. This is a log-linear model and can be written as
log(mu/p)=a+b*x. Since log(mu/p) is equal to log(mu) - log(p),
adding log(p) to both sides of the equation gives
log(mu)=a+b*x+log(p). The log(p) term is the offset: it enters the
model like a covariate, but with its coefficient constrained to 1.

> We want to compare the incidence rate (x/person-years) rather than mere incidence.

In the example data you showed there are 1) no individually varying
exposure times since the data is aggregated (or grouped) already and
2) this constant exposure amounts to exactly 1 year. So if person
years were defined as the product of the number of persons in a group
and the number of years those persons were exposed to risk, then you
would still be left only with the number of persons since number of
years is 1. If you had, say, 3 years of data grouped together, you
could derive the rate by multiplying the denominator by 3 (as is
done on this webpage:
http://www.stat.ubc.ca/~rollin/teach/643w04/lec/node75.html).

So in the data you provided in your OP, there are 2 cases in the
youngest age group, which had a population size of 5000. So the rate
is 2/5000=0.0004 per person per year, i.e. 0.04 per 100 individuals
per year. (Btw, the rate for the oldest age group is 75/7896=0.0095,
and 0.0095/0.0004=23.75, which matches the result from the
regression.)

> Is there any good reference on how to carry out longitudinal analysis on aggregated data? I did not find one on google.

Generally speaking, methods for longitudinal data apply to aggregated
data, too. Off the top of my head, I don't know of a special reference
here. Perhaps others who know better will chime in. For a general
treatment of longitudinal analysis using Stata, the book by
Rabe-Hesketh and Skrondal could be helpful:

http://www.stata.com/bookstore/mlmus2.html

J.
```
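The rate arithmetic in the reply can be checked with a short script. This is only a sketch in Python, not the Stata code from the thread: the helper name `rate` and its `years` argument are made up for illustration, and the only data used are the two age groups quoted in the reply (2 cases out of 5000, and 75 cases out of 7896, over one year). The 3-year line is a hypothetical, reusing the same counts to show how the denominator changes.

```python
import math


def rate(events, persons, years=1.0):
    """Crude incidence rate: events per person-year (hypothetical helper)."""
    return events / (persons * years)


# Figures quoted in the reply: youngest and oldest age groups, one year of data.
young = rate(2, 5000)
old = rate(75, 7896)

# Rate ratio, oldest versus youngest group.
ratio = old / young

print(f"youngest: {young:.4f} per person-year")
print(f"oldest:   {old:.4f} per person-year")
print(f"rate ratio: {ratio:.2f}")

# Offset identity: log(mu) = log(rate) + log(p), so exponentiating the
# right-hand side recovers the event count for the oldest group.
mu_old = math.exp(math.log(old) + math.log(7896))

# Hypothetical: if the same counts had accumulated over 3 years,
# the person-years denominator would be three times larger.
young_3y = rate(2, 5000, years=3)
```

The ratio comes out close to the 23.75 mentioned in the reply, and `mu_old` gets back to the 75 observed events, which is all the offset does: it moves log(p) to the right-hand side with a coefficient fixed at 1.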