Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: stset for grouped data

 From "Dherani, Mukesh" To "statalist@hsphsun2.harvard.edu" Subject RE: st: stset for grouped data Date Fri, 15 Apr 2011 15:06:15 +0100

```
Thanks J.
I am not a statistician, but why do we need to log-transform population? Can you please help me here?
We want to compare the incidence rate (x/person-years) rather than mere incidence.
Is there a good reference on how to carry out longitudinal analysis of aggregated data? I did not find one on Google.

thanks.
m

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joerg Luedicke [joerg.luedicke@gmail.com]
Sent: 14 April 2011 16:04
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stset for grouped data

On Thu, Apr 14, 2011 at 6:39 AM, Dherani, Mukesh
<M.K.Dherani@liverpool.ac.uk> wrote:
> Thanks. Yes, it is aggregated data. What I actually want to do is calculate the cumulative incidence rate by country.
> Assuming that there is no censoring and that, on average, each person in a particular age group has the same age of onset, can't we calculate the incidence rate? We have population by age group, so we may be able to calculate total person-years of follow-up. I may be naive here, but I thought that if I expanded the data with
> expand cases
> I would get individual-level data and, using the above assumptions, might be able to calculate the incidence rate.

Maybe I am missing something here, but I would think that your
incidents have constant exposure ("no censoring", "same age of
onset") and that this does not change if you merely inflate your data.
You should be able to get the incidence rate from the data you have,
for example, by simply dividing "cases" by "population" (for each age
group). If you want to "test" something and need a model, you could
run, for example, a Poisson regression with (logged) population as
offset. If we take your example data and run the Poisson model:

. input region year agegp cases population

region       year      agegp      cases  populat~n
1. 1 1994 4 2 5000
2. 1 1994 9 5 2548
3. 1 1994 14 6 2547
4. 1 1994 19 15 7521
5. 1 1994 24 75 7896
6. end

. gen logpop=log( population)

. list

     +-----------------------------------------------------+
     | region   year   agegp   cases   popula~n     logpop |
     |-----------------------------------------------------|
  1. |      1   1994       4       2       5000   8.517193 |
  2. |      1   1994       9       5       2548   7.843064 |
  3. |      1   1994      14       6       2547   7.842671 |
  4. |      1   1994      19      15       7521   8.925454 |
  5. |      1   1994      24      75       7896   8.974112 |
     +-----------------------------------------------------+

. poisson  cases i.agegp, offset( logpop) irr

Iteration 0:   log likelihood =  -19.89126
Iteration 1:   log likelihood = -10.835984
Iteration 2:   log likelihood = -10.235966
Iteration 3:   log likelihood = -10.233162
Iteration 4:   log likelihood = -10.233161

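Poisson regression                                Number of obs   =          5
                                                  LR chi2(4)      =      84.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -10.233161                       Pseudo R2       =     0.8046

------------------------------------------------------------------------------
       cases |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       agegp |
          9  |   4.905808   4.104493     1.90   0.057     .9517967    25.28581
         14  |    5.88928   4.808577     2.17   0.030     1.188664    29.17866
         19  |   4.986039   3.753353     2.13   0.033     1.140235    21.80303
         24  |   23.74619    17.0135     4.42   0.000     5.830841    96.70675
      logpop |   (offset)
------------------------------------------------------------------------------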

We see, for instance, that the incidence rate in the oldest age group
is roughly 24 times as high as in the youngest age group.

However, there still may be better solutions to your problem.

J.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
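
On the question of why population is logged: a short derivation may help. This is only a sketch, under the assumption that `population` stands in for person-years (one year of follow-up per person, as in the single-year example data). The symbols $\beta_0$ and $\beta_i$ are notation introduced here for the model's intercept and age-group coefficients (with $\beta_{\text{ref}} = 0$ for the youngest group); they do not appear in the original posts.

```latex
% i indexes age groups; pop_i is used as person-years (assumption: one year of follow-up each)
\begin{align*}
\log E[\text{cases}_i] &= \log(\text{pop}_i) + \beta_0 + \beta_i
  && \text{(offset: coefficient fixed at 1)} \\
\frac{E[\text{cases}_i]}{\text{pop}_i} &= \exp(\beta_0 + \beta_i)
  && \text{(so the model describes the rate, not the count)} \\
\mathrm{IRR}_i = \exp(\beta_i)
  &= \frac{\text{cases}_i / \text{pop}_i}{\text{cases}_{\text{ref}} / \text{pop}_{\text{ref}}}
  && \text{(one parameter per group, so fitted = observed)} \\
\mathrm{IRR}_{24} &= \frac{75/7896}{2/5000} \approx 23.75
\end{align*}
```

Poisson regression models the log of the expected count, so the exposure has to enter on the log scale for the remaining terms to describe a rate. The last line agrees with the IRR of 23.74619 reported for age group 24 in the output above. Stata's `poisson` also has an `exposure(population)` option, which adds ln(population) with its coefficient constrained to 1, so it should give the same fit as `offset(logpop)` without creating the logged variable by hand.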