-----Original message-----
From: [email protected]
[mailto:[email protected]] On behalf of Kristian Wraae
Sent: Tuesday, December 09, 2008 1:23 PM
To: [email protected]
Subject: Re: Re: st: Survey - raking - calibration - post stratification - calculating weights
I have now moved on to step 2 with this do-file:
*Step 2
xi: logistic sample i.age_grp i.geo_grp i.health_medication ///
    i.health_diseases
predict p_r
gen weight3x = weight2x * (1/p_r)
keep if sample == 1
				*(reducing dataset to 600 men)
survwgt rake  weight3x,   ///
        by(age_grp  geo_grp) ///
        totvars(tot_age_grp tot_geo_grp) ///
        gen(weight4x)
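For readers following along outside Stata, here is a minimal Python sketch of what the weight3x line does. It assumes p_r is the predicted probability of ending up in the final sample (sample == 1). The helper name nonresponse_adjust and the numbers are hypothetical.

The sketch also shows why a missing p_r matters. As with Stata's generate, arithmetic on a missing value gives a missing result, so any man with a missing p_r also gets a missing weight3x.

```python
import math


def nonresponse_adjust(base_weights, p_r):
    """Return base_weight / p_r for each man (cf. gen weight3x = weight2x * (1/p_r)).

    A missing probability (None or NaN) yields a missing (NaN) weight,
    mirroring how Stata propagates missing values through generate.
    """
    adjusted = []
    for w, p in zip(base_weights, p_r):
        if p is None or math.isnan(p):
            adjusted.append(float("nan"))  # missing in, missing out
        else:
            adjusted.append(w / p)  # inverse of the predicted response probability
    return adjusted


weight2x = [2.0, 3.0, 1.5]        # hypothetical step-1 weights
p_r = [0.5, 0.25, float("nan")]   # hypothetical predictions; the last one is missing
weight3x = nonresponse_adjust(weight2x, p_r)
print(weight3x)  # [4.0, 12.0, nan]
```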
The problem now is that Stata says that "totals across dimensions 1 and 2 are not equal".
Why is that? Should I generate new totals for tot_age_grp and tot_geo_grp?
Should they be based on the 3743? Why?
How do I deal with missing values in p_r? (Depending on which predictors I include in the logistic regression, I might get missing values for p_r.)
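Here is one situation that would produce this message. It is an assumption, not a diagnosis confirmed in the thread.

With 67 zip-code groups and only 600 men left after keep if sample == 1, some geo_grp categories may contain no sampled men at all. Their population totals then disappear from the geographic dimension while every age group is still present, so the two sets of totals no longer sum to the same number.

The Python sketch reproduces that on toy data. The helper names add_totals and dimension_sum are hypothetical. I am also assuming that survwgt compares these per-dimension sums.

```python
from collections import Counter


def add_totals(rows, dim, name):
    """Like egen name = count(pop), by(dim): store each category's size on every row."""
    counts = Counter(r[dim] for r in rows)
    for r in rows:
        r[name] = counts[r[dim]]


def dimension_sum(rows, dim, total_var):
    """Sum the control totals over the categories of dim that still occur in rows."""
    per_category = {}
    for r in rows:
        per_category.setdefault(r[dim], r[total_var])  # one total per category
    return sum(per_category.values())


# Toy population of 5 men: 2 age groups, 3 zip-code groups.
pop = [{"age_grp": a, "geo_grp": g}
       for a, g in [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y")]]
add_totals(pop, "age_grp", "tot_age_grp")
add_totals(pop, "geo_grp", "tot_geo_grp")
print(dimension_sum(pop, "age_grp", "tot_age_grp"),
      dimension_sum(pop, "geo_grp", "tot_geo_grp"))  # 5 5

# A final sample that happens to contain nobody from zip-code group Z.
sample = [pop[0], pop[4]]
print(dimension_sum(sample, "age_grp", "tot_age_grp"),
      dimension_sum(sample, "geo_grp", "tot_geo_grp"))  # 5 4
```

Comparing the number of distinct geo_grp values before and after keep if sample == 1 would show whether any zip-code groups have dropped out.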
-----Original message-----
From: [email protected]
[mailto:[email protected]] On behalf of Kristian Wraae
Sent: Tuesday, December 09, 2008 12:35 PM
To: [email protected]
Subject: Re: Re: st: Survey - raking - calibration - post stratification - calculating weights
I have now tried to do the first step of the raking.
I have 15 age groups and 67 geographic groups (simply based on the zip codes).
I first tried the raking with a smaller number of geographic groups (10), but the results were more accurate with all groups.
The variables I have are:
age = continuous variable containing the age of the subject at the time of sampling
dist_study = continuous variable containing the distance from the individual to me
age_grp = categorical variable, 15 age strata
geo_grp = zip code
quest = 1 if the individual returned a completed questionnaire
pop = 1 if the individual was among the 4975 in the original sample (so of course all had pop = 1)
sample = 1 for each finally included subject
The do file looks like this:
*************
*To get data from the original population
tabstat age
tabstat dist_study
*Raking starts by generating totals in each age group and geographical group
egen tot_age_grp =  count(pop),by(age_grp)
egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)
egen tot_geo_grp =  count(pop),by(geo_grp)
egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)
*Initial weight is generated
gen weight1x = (tot_age_grp / tot_age_grp_q)
keep if quest==1
			*(reducing the dataset to 3743 men)
survwgt rake  weight1x,   ///
        by(age_grp  geo_grp) ///
        totvars(tot_age_grp tot_geo_grp) ///
        gen(weight2x)
svyset  [pweight=weight2x], strata(age_grp)
*Description
svydes
*Now we estimate the average age in the 4975 men from the 3743 men
svymean age
*Now we estimate the average distance to travel to get to me for the 4975 men based on the 3743 men
svymean dist_study
*These are the actual numbers for the 3743 men.
tabstat age
tabstat dist_study
******************
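To make step 1 concrete, here is a minimal Python sketch of the same idea. It has two parts:

1. An initial weight equal to the population count divided by the respondent count in each age group (weight1x).
2. Raking, i.e. iterative proportional fitting, to the age and zip-code totals.

This is not survwgt's code. The function names, the convergence tolerance and the toy totals are assumptions for illustration.

```python
from collections import Counter, defaultdict


def initial_weights(rows, pop_age_totals):
    """weight1x: population count / respondent count within each age group."""
    resp_counts = Counter(r["age_grp"] for r in rows)
    return [pop_age_totals[r["age_grp"]] / resp_counts[r["age_grp"]] for r in rows]


def rake(weights, rows, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of weights to marginal population totals.

    targets maps each raking dimension to {category: population total}.
    Each pass rescales the weights so they reproduce one dimension's totals,
    then the next; it stops once no category needs rescaling by more than tol.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for dim, totals in targets.items():
            current = defaultdict(float)  # weighted count per category right now
            for wi, r in zip(w, rows):
                current[r[dim]] += wi
            for i, r in enumerate(rows):
                factor = totals[r[dim]] / current[r[dim]]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


# Hypothetical respondents and population totals (10 men in the population).
respondents = [
    {"age_grp": "A", "geo_grp": "X"},
    {"age_grp": "A", "geo_grp": "Y"},
    {"age_grp": "B", "geo_grp": "X"},
    {"age_grp": "B", "geo_grp": "Y"},
]
targets = {
    "age_grp": {"A": 6, "B": 4},
    "geo_grp": {"X": 7, "Y": 3},
}
w1 = initial_weights(respondents, targets["age_grp"])
w2 = rake(w1, respondents, targets)
print(w1)                         # [3.0, 3.0, 2.0, 2.0]
print([round(x, 6) for x in w2])  # [4.2, 1.8, 2.8, 1.2]
```

After raking, the weights add up to the population totals in both dimensions at once. This is why both tot_age_grp and tot_geo_grp have to describe the same population of 4975.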
The output from Stata 8 is:
. *************
. tabstat age
    variable |      mean
-------------+----------
         age |   66.6695
------------------------
. tabstat dist_study
    variable |      mean
-------------+----------
  dist_study |  25.90153
------------------------
.
.
. egen tot_age_grp =  count(pop),by(age_grp)
. egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)
(1232 missing values generated)
.
. egen tot_geo_grp =  count(pop),by(geo_grp)
. egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)
(1232 missing values generated)
.
. gen weight1x = (tot_age_grp / tot_age_grp_q)
(1232 missing values generated)
.
. keep if quest==1
(1232 observations deleted)
.                         *(reducing the dataset to 3743 men)
. survwgt rake  weight1x,   ///
        by(age_grp  geo_grp) ///
        totvars(tot_age_grp tot_geo_grp) ///
        gen(weight2x)
.
. svyset  [pweight=weight2x], strata(age_grp)
pweight is weight2x
strata is age_grp
.
. svydes
pweight:  weight2x
Strata:   age_grp
PSU:      <observations>
                                      #Obs per PSU
 Strata                       ----------------------------
 age_grp    #PSUs     #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1       346       346         1       1.0         1
       2       333       333         1       1.0         1
       3       304       304         1       1.0         1
       4       297       297         1       1.0         1
       5       284       284         1       1.0         1
       6       275       275         1       1.0         1
       7       249       249         1       1.0         1
       8       246       246         1       1.0         1
       9       231       231         1       1.0         1
      10       209       209         1       1.0         1
      11       212       212         1       1.0         1
      12       210       210         1       1.0         1
      13       184       184         1       1.0         1
      14       174       174         1       1.0         1
      15       189       189         1       1.0         1
--------  --------  --------  --------  --------  --------
      15      3743      3743         1       1.0         1
.
. svymean  age
Survey mean estimation
pweight:  weight2x                                Number of obs    =      3743
Strata:   age_grp                                 Number of strata =        15
PSU:      <observations>                          Number of PSUs   =      3743
                                                  Population size  =      4975
------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
     age |   66.66605    .0067455    66.65283    66.67928    .0092211
------------------------------------------------------------------------------
. svymean  dist_study
Survey mean estimation
pweight:  weight2x                                Number of obs    =      3742
Strata:   age_grp                                 Number of strata =        15
PSU:      <observations>                          Number of PSUs   =      3742
                                                  Population size  = 4973.7235
------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
dist_s~y |   25.90772    .3139459     25.2922    26.52325     1.01731
------------------------------------------------------------------------------
.
. tabstat age
    variable |      mean
-------------+----------
         age |   66.5895
------------------------
. tabstat dist_study
    variable |      mean
-------------+----------
  dist_study |  25.93867
------------------------
.
end of do-file
As one can see, the average age among the 4975 men is 66.6695.
Using raking and svymean, Stata estimates the average age among the 4975 men, based on the information from the 3743 men, to be 66.66605.
Those are quite similar.
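The point estimate that svymean reports is the weighted mean, sum(w*y)/sum(w). The "Population size" line is the sum of the weights over the observations actually used.

This is presumably also why the dist_study run reports 3742 observations and a population size of 4973.7235 rather than 4975. One man seems to have a missing dist_study, so his weight drops out of the sum.

Here is a small Python sketch of that calculation. The function name and the numbers are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Return (sum(w*y)/sum(w), sum(w)), skipping observations with a missing y."""
    used = [(y, w) for y, w in zip(values, weights) if not math.isnan(y)]
    pop_size = sum(w for _, w in used)  # what the "Population size" line shows
    mean = sum(w * y for y, w in used) / pop_size
    return mean, pop_size


ages = [60.0, 70.0, float("nan")]  # hypothetical; the third value is missing
weights = [1.5, 0.5, 2.0]          # hypothetical raked weights
print(weighted_mean(ages, weights))  # (62.5, 2.0)
```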
Now let us look at the distance to travel. We raked on zip codes, which are not equivalent to distances, but despite that the results are quite amazing:
We know the average distance to travel is 25.90153 km.
After raking, and basing the results on the 3743 men, Stata estimates the distance to be 25.90772 km.
Strikingly similar. The unweighted means among the 3743 men are not as close (66.5895 years and 25.93867 km), but really not that far off.
The differences will be far greater when raking the 600.
I will now go on.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*