[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: problems replicating DSTDIZE

From	"Daniel Exeter" <[email protected]>
To	<[email protected]>
Subject	st: problems replicating DSTDIZE
Date	Tue, 30 May 2006 12:37:36 +1200

Hi folks, 

I'm using stata v8 at the moment, and am having a problem when I
manually replicate the output of the 'dstdize' command. 

When I run the code (at the bottom of the email), my results are correct
for all ages, but I run into problems in the calculation of the adj_rate
variable when I repeat the analysis for the population aged 0-64. 

Example 1: All Ages 


Summary of Study Populations: (output for dstdize, for all ages)

  state             N      Crude     Adj_Rate       Confidence Interval
 
------------------------------------------------------------------------
--
        1       1010740   0.028408     0.018579    [  0.018378,
0.018780]
        2       1014144   0.030460     0.021090    [  0.020867,
0.021314]
        3       1012676   0.035128     0.023172    [  0.022940,
0.023405]
        4       1012054   0.039290     0.026232    [  0.025980,
0.026484]
        5       1012397   0.038515     0.031786    [  0.031487,
0.032084]

Summary of output using manual calculations

     +---------------------------------------------------------------+
     | state         N      crude   adj_rate         lb         ub |
     |---------------------------------------------------------------|
  1. |       1   1010740   .0284079   .0185789   .0183779     .01878 |
  2. |       2   1014144   .0304602   .0210904   .0208671   .0213137 |
  3. |       3   1012676   .0351277   .0231724   .0229398    .023405 |
  4. |       4   1012054   .0392904   .0262319     .02598   .0264838 |
  5. |       5   1012397   .0385145   .0317857   .0314869   .0320844 |
     +---------------------------------------------------------------+


Example 2: Premature mortality 0-64 years inclusive

DSTDIZE OUTPUT 
Summary of Study Populations:
  state             N      Crude     Adj_Rate       Confidence Interval
 
------------------------------------------------------------------------
--
        1        852842   0.004972     0.004496    [  0.004360,
0.004633]
        2        859488   0.006272     0.006011    [  0.005851,
0.006171]
        3        843940   0.007608     0.007257    [  0.007080,
0.007433]
        4        835061   0.009727     0.009414    [  0.009211,
0.009618]
        5        865780   0.012857     0.013308    [  0.013064,
0.013552]

MANUAL CALCULATION OUTPUT 
     +--------------------------------------------------------------+
     | state        N      crude   adj_rate         lb         ub |
     |--------------------------------------------------------------|
  1. |       1   852842   .0049716   .0040019   .0038807   .0041231 |
  2. |       2   859488   .0062723   .0053499   .0052074   .0054924 |
  3. |       3   843940   .0076084   .0064583   .0063011   .0066156 |
  4. |       4   835061   .0097274   .0083787   .0081976   .0085597 |
  5. |       5   865780   .0128566   .0118441   .0116268   .0120614 |
     +--------------------------------------------------------------+

Could anybody suggest why this might be the case? The only difference in
the calculation for the premature mortality version is that I have an if
statement which drops records if age_group >=14 ( representing 65years
+)

I have included the code below
Cheers
Dan


Statamodel.do

capture log close
capture log using "test.log", text replace
foreach a in all prem {
** prepare the death data
use mortality, clear
 if "`a'" == "prem" {
 drop if(age_group>=14)
}
 save tmp_mort, replace

** prepare the standard population 

use std_pop, clear
sort age_group
 if "`a'" == "prem" {
 drop if(age_group>=14)
}
list
save tmp_pop, replace
** run the stata command for direct standardisation
use tmp_mort, clear
dstdize deaths pop age_group, by(state) using (tmp_pop)
* now do the standardisation programmatically
use tmp_mort, clear
qui {
egen age = group(age_group)
egen N = sum(population), by(state)
egen cases = sum(deaths), by(state)
gen popdist = population/N
gen loc_crude = deaths/population
rename population loc_pop
sort age_group
merge age_group using tmp_pop
gen sum_n = sum(population)

* generate crude rate 

by state, sort: gen crude = (cases/N)

* and the adjusted rate

gen product = loc_crude* population

by state, sort: egen adj_rate = sum(product)

generate prod2 = (population^2*loc_crude*(1-loc_crude)/loc_pop)
by state: egen se = sum(prod2)
replace se = sqrt(se)

* and the Upper and lower CI's
gen ub = adj_rate + invnorm(0.975)*se
gen lb = adj_rate - invnorm(0.975)*se

drop _merge
keep if age ==1 
}
list state N crude1 adj_rate lb ub
}


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: st: RE: xtline only for datasets with more than x values
Next by Date: st: Recurrent Event and Terminating Event
Previous by thread: st: xtline only for datasets with more than x values
Next by thread: st: Negative Hausman Test and FE vs. Pooled
Index(es):
- Date
- Thread