st: problems replicating DSTDIZE

 From "Daniel Exeter" To Subject st: problems replicating DSTDIZE Date Tue, 30 May 2006 12:37:36 +1200

```Hi folks,

I'm using stata v8 at the moment, and am having a problem when I
manually replicate the output of the 'dstdize' command.

When I run the code (at the bottom of the email), my results are correct
for all ages, but I run into problems in the calculation of the adj_rate
variable when I repeat the analysis for the population aged 0-64.

Example 1: All Ages

Summary of Study Populations: (output for dstdize, for all ages)

state             N      Crude     Adj_Rate       Confidence Interval

------------------------------------------------------------------------
--
1       1010740   0.028408     0.018579    [  0.018378,
0.018780]
2       1014144   0.030460     0.021090    [  0.020867,
0.021314]
3       1012676   0.035128     0.023172    [  0.022940,
0.023405]
4       1012054   0.039290     0.026232    [  0.025980,
0.026484]
5       1012397   0.038515     0.031786    [  0.031487,
0.032084]

Summary of output using manual calculations

+---------------------------------------------------------------+
| state         N      crude   adj_rate         lb         ub |
|---------------------------------------------------------------|
1. |       1   1010740   .0284079   .0185789   .0183779     .01878 |
2. |       2   1014144   .0304602   .0210904   .0208671   .0213137 |
3. |       3   1012676   .0351277   .0231724   .0229398    .023405 |
4. |       4   1012054   .0392904   .0262319     .02598   .0264838 |
5. |       5   1012397   .0385145   .0317857   .0314869   .0320844 |
+---------------------------------------------------------------+

Example 2: Premature mortality 0-64 years inclusive

DSTDIZE OUTPUT
Summary of Study Populations:
state             N      Crude     Adj_Rate       Confidence Interval

------------------------------------------------------------------------
--
1        852842   0.004972     0.004496    [  0.004360,
0.004633]
2        859488   0.006272     0.006011    [  0.005851,
0.006171]
3        843940   0.007608     0.007257    [  0.007080,
0.007433]
4        835061   0.009727     0.009414    [  0.009211,
0.009618]
5        865780   0.012857     0.013308    [  0.013064,
0.013552]

MANUAL CALCULATION OUTPUT
+--------------------------------------------------------------+
| state        N      crude   adj_rate         lb         ub |
|--------------------------------------------------------------|
1. |       1   852842   .0049716   .0040019   .0038807   .0041231 |
2. |       2   859488   .0062723   .0053499   .0052074   .0054924 |
3. |       3   843940   .0076084   .0064583   .0063011   .0066156 |
4. |       4   835061   .0097274   .0083787   .0081976   .0085597 |
5. |       5   865780   .0128566   .0118441   .0116268   .0120614 |
+--------------------------------------------------------------+

Could anybody suggest why this might be the case? The only difference in
the calculation for the premature mortality version is that I have an if
statement which drops records if age_group >=14 ( representing 65years
+)

I have included the code below
Cheers
Dan

Statamodel.do

capture log close
capture log using "test.log", text replace
foreach a in all prem {
** prepare the death data
use mortality, clear
if "`a'" == "prem" {
drop if(age_group>=14)
}
save tmp_mort, replace

** prepare the standard population

use std_pop, clear
sort age_group
if "`a'" == "prem" {
drop if(age_group>=14)
}
list
save tmp_pop, replace
** run the stata command for direct standardisation
use tmp_mort, clear
dstdize deaths pop age_group, by(state) using (tmp_pop)
* now do the standardisation programmatically
use tmp_mort, clear
qui {
egen age = group(age_group)
egen N = sum(population), by(state)
egen cases = sum(deaths), by(state)
gen popdist = population/N
gen loc_crude = deaths/population
rename population loc_pop
sort age_group
merge age_group using tmp_pop
gen sum_n = sum(population)

* generate crude rate

by state, sort: gen crude = (cases/N)

gen product = loc_crude* population

by state, sort: egen adj_rate = sum(product)

generate prod2 = (population^2*loc_crude*(1-loc_crude)/loc_pop)
by state: egen se = sum(prod2)
replace se = sqrt(se)

* and the Upper and lower CI's
gen ub = adj_rate + invnorm(0.975)*se
gen lb = adj_rate - invnorm(0.975)*se

drop _merge
keep if age ==1
}
list state N crude1 adj_rate lb ub
}

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```