Re: st: stset for grouped data


From   Joerg Luedicke <joerg.luedicke@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: stset for grouped data
Date   Thu, 14 Apr 2011 11:04:36 -0400

On Thu, Apr 14, 2011 at 6:39 AM, Dherani, Mukesh
<M.K.Dherani@liverpool.ac.uk> wrote:
> Thanks. Yes it is aggregated data. What actually I want to do is to calculate the cumulative incidence rate by country.
> Assuming that there is no censoring, and on average each person in a particular age group has same age of onset, can't we calculate incidence rate? We have population by age group hence we may be able to calculate total person years of follow-up.   I may be naive here, I thought if I expanded data based on
> expand cases
> I will get data for an individual and using above assumptions I may be able to calculate the incidence rate.

Maybe I am missing something here, but I would think that your
cases have constant exposure ("no censoring", "same age of
onset"), and that does not change if you merely expand your data. You
should be able to get the incidence rate from the data you have, for
example, by simply dividing "cases" by "population" (for each age
group). If you want to "test" something and need a model, you could
run, for example, a Poisson regression with (logged) population as the
offset. If we take your example data and run the Poisson model:


. input region year agegp cases population

region       year      agegp      cases  populat~n
1. 1 1994 4 2 5000
2. 1 1994 9 5 2548
3. 1 1994 14 6 2547
4. 1 1994 19 15 7521
5. 1 1994 24 75 7896
6. end

. gen logpop=log( population)

. list

     +-----------------------------------------------------+
     | region   year   agegp   cases   popula~n     logpop |
     |-----------------------------------------------------|
  1. |      1   1994       4       2       5000   8.517193 |
  2. |      1   1994       9       5       2548   7.843064 |
  3. |      1   1994      14       6       2547   7.842671 |
  4. |      1   1994      19      15       7521   8.925454 |
  5. |      1   1994      24      75       7896   8.974112 |
     +-----------------------------------------------------+

. poisson  cases i.agegp, offset( logpop) irr

Iteration 0:   log likelihood =  -19.89126
Iteration 1:   log likelihood = -10.835984
Iteration 2:   log likelihood = -10.235966
Iteration 3:   log likelihood = -10.233162
Iteration 4:   log likelihood = -10.233161

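Poisson regression                                Number of obs   =          5
                                                  LR chi2(4)      =      84.25
                                                  Prob > chi2     =     0.0000
Log likelihood = -10.233161                       Pseudo R2       =     0.8046

------------------------------------------------------------------------------
       cases |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       agegp |
          9  |   4.905808   4.104493     1.90   0.057     .9517967    25.28581
         14  |    5.88928   4.808577     2.17   0.030     1.188664    29.17866
         19  |   4.986039   3.753353     2.13   0.033     1.140235    21.80303
         24  |   23.74619    17.0135     4.42   0.000     5.830841    96.70675
      logpop |   (offset)
------------------------------------------------------------------------------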

We see, for instance, that the incidence rate in the oldest age group
is roughly 24 times as high as in the youngest age group.
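
As a rough check on the "cases divided by population" idea, the crude
rates and their ratios to the youngest age group can also be computed by
hand. This is only a sketch using the same five example rows; the
variable names rate, rate_per1000, and ratio are made up for
illustration. Because the model above has one parameter per age group
within a single region and year, the ratio column should agree with the
IRR column.

```stata
* Sketch only: crude incidence rates and rate ratios computed by hand.
* The variable names rate, rate_per1000, and ratio are illustrative.
clear
input region year agegp cases population
1 1994 4 2 5000
1 1994 9 5 2548
1 1994 14 6 2547
1 1994 19 15 7521
1 1994 24 75 7896
end

* Crude incidence rate per person, assuming constant exposure.
gen rate = cases / population

* The same rate expressed per 1,000 population.
gen rate_per1000 = 1000 * rate

* Ratio of each rate to the rate in the first row (agegp 4).
gen ratio = rate / rate[1]

list agegp cases population rate_per1000 ratio
```

Note that rate[1] picks up the youngest age group only because it is the
first observation, so the data must stay in this sort order for the
ratios to be read that way.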

However, there still may be better solutions to your problem.

J.
