
Re: st: weight

From   Steven Samuels
Subject   Re: st: weight
Date   Tue, 3 Jul 2007 17:16:57 -0400

On Jul 3, 2007, at 1:27 PM, Sebastian Kruk wrote:
I have a Stata database and an Excel spreadsheet.
In the first one, I have survey microdata of households (e.g.
income, quality of life, poverty, education, employment, housing and
In the second one, I have an Excel file with population
projections. I would like to use it to compute a weight for a data in
How do I form groups by age and then use a weight by age group?

What you want to do is called "poststratification."

1. Create age group categories for your data to match categories in your spreadsheet.

Suppose the spreadsheet categories and corresponding population projection counts are:

20-29 140,000
30-39 120,000
40-49 300,000
50-59 200,000
60-69 100,000
70-100 50,000

In your Stata data set, create a grouped age variable with the same categories:

egen agegp=cut(age), at(20,30,40,50,60,70,101)
// Creates a grouped variable & names the categories for the LH endpoints.
// The last at() value is only an upper bound: without 101, ages 70+ would be set to missing.

label define agegp 20 "20-29" 30 "30-39" 40 "40-49" 50 "50-59" 60 "60-69" 70 "70+"
label values agegp agegp
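
It is worth checking that every respondent landed in a group. The following is a minimal sketch on made-up ages, not your data; it assumes the top category is 70-100 as in the spreadsheet, so it adds 101 as a final break, because -egen, cut()- treats the last at() value as an upper bound and sets anything at or above it to missing.

```stata
* Toy ages (made up) to show how -egen, cut()- codes each observation
clear
input age
20
29
30
69
70
100
end

* 101 is only an upper bound, so ages 70-100 are coded 70
egen agegp = cut(age), at(20,30,40,50,60,70,101)

* Any missing value here would signal an age outside 20-100
tabulate agegp, missing
count if missing(agegp)
```

If the count is not zero on your own data, look at the ages of the unclassified observations before going further.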

2. Create a variable in Stata to hold the projection counts. You can do this by hand, or you can add an "agegp" column to the spreadsheet, save it as a comma- or tab-delimited text file, use -insheet- to read it into a Stata data set, and then -merge- it with your data by agegp.

By hand:
gen age_ct=140000 if agegp==20
replace age_ct=120000 if agegp==30
replace age_ct=300000 if agegp==40
replace age_ct=200000 if agegp==50
replace age_ct=100000 if agegp==60
replace age_ct=50000 if agegp==70
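
The -merge- route from step 2 might look like the sketch below. To keep it self-contained, the projection counts are typed in with -input- as a stand-in for the imported spreadsheet, and the survey records are made up. The -merge m:1- syntax assumes Stata 11 or later; older versions use a different -merge- syntax.

```stata
* Stand-in for the spreadsheet: one row per age group (counts from step 1)
clear
input agegp age_ct
20 140000
30 120000
40 300000
50 200000
60 100000
70  50000
end
tempfile proj
save `proj'

* Toy survey records (made up); in practice, -use- your own data here
clear
input id agegp
1 20
2 20
3 50
4 70
end

* Many survey rows match one projection row per age group
merge m:1 agegp using `proj', keep(master match)
assert _merge == 3   // stops if any survey row found no projection count
drop _merge
list
```

The -assert- line is there so that a mismatch in category codes between the two files is caught at once rather than showing up later as missing counts.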

3. Finally, -svyset- your data. As Friedrich stated, there may already be a "weight" variable in your survey microdata. There may also be a stratum variable and a primary sampling unit variable. These should be documented in the survey codebook.
I assume you have all three, named "final_wt", "stratum", and "psu".

The -svyset- command is:

svyset psu [pweight=final_wt], strata(stratum) poststrata(agegp) postweight(age_ct)

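To see what the poststratification adjustment does to the weights, here is a minimal sketch on made-up data. It rescales each design weight so that the weights in an age group add up to that group's projected count. The names wt_sum, ps_wt and ps_sum are mine, for illustration only. This shows the weight arithmetic alone; it is not a substitute for -svyset-, which also takes care of the standard errors.

```stata
* Toy data (made up): two age groups, design weights in final_wt
clear
input agegp final_wt age_ct
20 100 140000
20 300 140000
30 200 120000
30 200 120000
end

* Sum of the design weights within each age group
egen wt_sum = total(final_wt), by(agegp)

* Scale each weight so the group total equals the projected count
gen ps_wt = final_wt * age_ct / wt_sum

* Check: the adjusted weights now add up to age_ct in every group
egen ps_sum = total(ps_wt), by(agegp)
list agegp final_wt ps_wt ps_sum
```
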
4. If you have separate population counts by age and gender, then you might create a combined age-gender variable and post-stratify on this 12-category variable. Be aware that if you have few people in some categories, you may need to combine them.

For example:
gen age_gender = 21 if agegp==20 & gender==1
replace age_gender = 22 if agegp==20 & gender==2
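
As an alternative to typing one line per combination, -egen, group()- can number the age-gender cells for you. This is a sketch on made-up data, and it assumes gender is coded 1 and 2 as in the lines above. The codes it produces run 1, 2, 3, ... in sort order rather than 21, 22, ..., so the population counts must be matched to those codes.

```stata
* Toy data (made up): some age-gender combinations
clear
input agegp gender
20 1
20 2
30 1
30 2
70 2
end

* One code per observed combination, labeled with the original values
egen age_gender = group(agegp gender), label

* Cell sizes: small cells are candidates for combining
tabulate age_gender
```
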

