Statalist



SV: SV: st: Survey - raking - calibration - post stratification - calculating weights


From   "Kristian Wraae" <Kristian_Wraae@vip.cybercity.dk>
To   <statalist@hsphsun2.harvard.edu>
Subject   SV: SV: st: Survey - raking - calibration - post stratification - calculating weights
Date   Tue, 9 Dec 2008 12:34:59 +0100

I have now tried to do the first step of the raking.

I have 15 age groups and 67 geographic groups (simply based on the zip
codes).

I tried to do the raking first with a smaller number of geographic groups
(10) but the results were more accurate with all groups.

The variables I have are:
age = continuous variable containing the age of the subject at the time of
sampling
dist_study = continuous variable containing the distance from the individual
to me.
age_grp = categorical variable - 15 age strata.
geo_grp = zip code
quest = 1 if individual returned a filled out questionnaire
pop = 1 if individual was amongst the 4975 in the original sample (all had
of course pop=1)
sample = 1 for each finally included subject.
 
The do file looks like this:

*************
*To get data from the original population
tabstat age
tabstat dist_study

*Raking starts by generating totals in each age group and geographical group
egen tot_age_grp =  count(pop),by(age_grp)
egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)

egen tot_geo_grp =  count(pop),by(geo_grp)
egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)
*Initial weight is generated
gen weight1x = (tot_age_grp / tot_age_grp_q)

keep if quest==1 
			*(reducing the dataset to 3743 men)
survwgt rake  weight1x,   ///
        by(age_grp  geo_grp) ///
        totvars(tot_age_grp tot_geo_grp) ///
        gen(weight2x)

svyset  [pweight=weight2x], strata(age_grp)

*Description
svydes 
*Now we estimate the average age in the 4975 men from the 3743 men
svymean  age
*Now we estimate the average distance to travel to get to me for the 4975
*men based on the 3743 men
svymean  dist_study

*These are the actual numbers for the 3743 men.
tabstat age
tabstat dist_study
******************
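
For anyone following along on Stata 9 or later, where svymean was replaced by the svy: prefix, the estimation step could be written roughly as below. This is only a sketch and has not been rerun on these data. It assumes weight2x has already been generated by survwgt rake exactly as in the do file above.

```stata
* Sketch only: same design as above, written for Stata 9 or later
svyset [pweight=weight2x], strata(age_grp)
svydescribe

* svy: mean replaces svymean; estat effects reports the design effect (Deff)
svy: mean age
estat effects

svy: mean dist_study
estat effects
```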

The output from Stata 8 is:

. *************
. tabstat age

    variable |      mean
-------------+----------
         age |   66.6695
------------------------

. tabstat dist_study

    variable |      mean
-------------+----------
  dist_study |  25.90153
------------------------

. 
. 
. egen tot_age_grp =  count(pop),by(age_grp)

. egen tot_age_grp_q = count(pop) if quest==1, by(age_grp)
(1232 missing values generated)

. 
. egen tot_geo_grp =  count(pop),by(geo_grp)

. egen tot_geo_grp_q = count(pop) if quest==1, by(geo_grp)
(1232 missing values generated)

. 
. gen weight1x = (tot_age_grp / tot_age_grp_q)
(1232 missing values generated)

. 
. keep if quest==1 
(1232 observations deleted)

.                         *(reducing the dataset to 3743 men)
. survwgt rake  weight1x,   ///
>         by(age_grp  geo_grp) ///
>         totvars(tot_age_grp tot_geo_grp) ///
>         gen(weight2x)

. 
. svyset  [pweight=weight2x], strata(age_grp)
pweight is weight2x
strata is age_grp

. 
. svydes 

pweight:  weight2x
Strata:   age_grp
PSU:      <observations>
                                      #Obs per PSU
 Strata                       ----------------------------
 age_grp    #PSUs     #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1       346       346         1       1.0         1
       2       333       333         1       1.0         1
       3       304       304         1       1.0         1
       4       297       297         1       1.0         1
       5       284       284         1       1.0         1
       6       275       275         1       1.0         1
       7       249       249         1       1.0         1
       8       246       246         1       1.0         1
       9       231       231         1       1.0         1
      10       209       209         1       1.0         1
      11       212       212         1       1.0         1
      12       210       210         1       1.0         1
      13       184       184         1       1.0         1
      14       174       174         1       1.0         1
      15       189       189         1       1.0         1
--------  --------  --------  --------  --------  --------
      15      3743      3743         1       1.0         1

. 
. svymean  age

Survey mean estimation

pweight:  weight2x                                Number of obs    =      3743
Strata:   age_grp                                 Number of strata =        15
PSU:      <observations>                          Number of PSUs   =      3743
                                                  Population size  =      4975

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
     age |   66.66605    .0067455    66.65283    66.67928    .0092211
------------------------------------------------------------------------------

. svymean  dist_study

Survey mean estimation

pweight:  weight2x                                Number of obs    =      3742
Strata:   age_grp                                 Number of strata =        15
PSU:      <observations>                          Number of PSUs   =      3742
                                                  Population size  = 4973.7235

------------------------------------------------------------------------------
    Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
---------+--------------------------------------------------------------------
dist_s~y |   25.90772    .3139459     25.2922    26.52325     1.01731
------------------------------------------------------------------------------

. 
. tabstat age

    variable |      mean
-------------+----------
         age |   66.5895
------------------------

. tabstat dist_study

    variable |      mean
-------------+----------
  dist_study |  25.93867
------------------------

. 
end of do-file

As one can see the average age amongst the 4975 men is: 66.6695

Using raking and svymean Stata estimates the average age amongst the 4975
men based on the information from the 3743 men to be: 66.66605

As one can see those are quite similar.

Now let us look at the distance to travel. We raked on zip codes, which are
not equivalent to distances, but despite that the results are quite amazing:

We know the average distance to travel is: 25.90153 km

After raking and basing the results on the 3743 men Stata estimates the
distance to be: 25.90772 km

Strikingly similar. The unweighted means amongst the 3743 respondents are
not as close: 66.5895 years and 25.93867 km, but really not that far off.
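
One way to check that the raking itself converged is to sum the raked weights within each age group and each zip code and compare those sums with the control totals. The following is a sketch only, assuming the dataset has already been reduced to quest==1 and still holds tot_age_grp and tot_geo_grp. The names wsum_age, wsum_geo, diff_age and diff_geo are made up for illustration. On Stata 8 the egen function is sum() rather than total().

```stata
* Sketch: summed raked weights per margin (use sum() instead of total() on Stata 8)
egen double wsum_age = total(weight2x), by(age_grp)
egen double wsum_geo = total(weight2x), by(geo_grp)

* Absolute gap between the summed weights and the control totals
gen double diff_age = abs(wsum_age - tot_age_grp)
gen double diff_geo = abs(wsum_geo - tot_geo_grp)

* Both maxima should be close to zero if the raking converged
summarize diff_age diff_geo
```

If the gaps are not near zero in both margins, the raking probably stopped before converging, and the weighted means should be read with that in mind.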

The differences will be far greater when raking the 600.

I will now go on.

