
Re: st: creating sample weights: Corrected

From   Steven Joel Hirsch Samuels <>
Subject   Re: st: creating sample weights: Corrected
Date   Fri, 10 Aug 2007 14:35:30 -0400

I mistakenly left in unnecessary text.
On Aug 10, 2007, at 10:53 AM, Janelle Knox wrote:

I am trying to create a sample weight for a dataset, which will
correct for variations in gender, age, etc from population means.
Does anyone know how to do this, or where I can find information for
setting up a sample weight.... pweight=?

Jane, you cannot do it to match population means. However, you can do it to match population percentages in different categories. The technique for this is known as "raking". In Stata it is available in Nick Winter's program -survwgt rake-. Type "ssc install survwgt". By the way, "pweight" is a Stata reserved word.

A good reference for practice is: presentations/raking_survey_data_2_JOS.pdf

Warning: If you are not experienced with weighting, you can run into many problems. Raking will not fix, and might even worsen, certain kinds of sample deficiencies. If you have followed recent discussions on Statalist, you will be aware that not everyone recommends weighting before doing regressions.

You don't say if there is an existing "design weight". If so, I assume that its name is "old_wt". Otherwise, create it with "gen old_wt=1" before running the survwgt program.

1. Create grouped versions of the variables you wish to match in your original data set.

2. Now create a separate data set for each characteristic that you wish to match; each will contain the adjusted totals for that characteristic. Suppose your sample size is n=1,252. Below is an example that creates a data set "age_dat.dta" containing the age group totals.

3. Merge these into your original data.

4. The rake instructions are then (for example):

survwgt rake old_wt, by(race gender age_gp) totvars(race_tot gender_tot age_tot) generate(new_wt)

5. "new_wt" is your new weight variable. It will probably contain fractions, but these will not affect the regressions.
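
As a rough sketch of step 1, the lines below group a continuous age variable into the four categories used for "age_gp". The toy data, the variable name "age", and the cutpoints are assumptions made for illustration; they are not from the original question, so substitute the categories for which you have population percentages.

```stata
* Sketch only: toy data and hypothetical cutpoints
clear
set obs 8
gen age = 18 + 9*(_n - 1)          // 18, 27, 36, ..., 81

* Collapse integer ages into four groups coded 1-4
recode age (18/29 = 1) (30/44 = 2) (45/64 = 3) (65/max = 4), generate(age_gp)

* Inspect the grouping
tabulate age_gp
```

The group codes must match the codes used in the totals data set, because the two are merged on "age_gp" in step 3.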


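/* Example for step 2: age group totals */
clear
local ssize=1252

/* Age Group Data Set: 1 10% 2 20% 3 50% 4 20% */
input age_gp pop_pct
1 .1
2 .2
3 .5
4 .2
end
gen age_tot=`ssize'*pop_pct
/* Check: the totals should sum to the sample size */
table age_gp , c(sum age_tot) row
/* Sort on the merge key, not on the totals */
sort age_gp
save age_dat, replace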

/*****************CODE ENDS ***************************/
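
Steps 3 and 4 might then look like the sketch below. It is an illustration under several assumptions: the sample is toy data, the 50/50 gender split is a made-up percentage, only two raking variables are used to keep it short, and the "merge m:1" syntax requires Stata 11 or later (older versions need sorted data and the older "merge varlist using" form). The -survwgt rake- call follows the form given in step 4 and requires the SSC package to be installed.

```stata
* Sketch only: toy sample, hypothetical gender percentages
clear
local ssize = 1252

* Age-group control totals (percentages from the example above)
input byte age_gp double pop_pct
1 .1
2 .2
3 .5
4 .2
end
gen double age_tot = `ssize' * pop_pct
drop pop_pct
tempfile age_dat
save `age_dat'

* Gender control totals (assumed 50/50 split)
clear
input byte gender double pop_pct
1 .5
2 .5
end
gen double gender_tot = `ssize' * pop_pct
drop pop_pct
tempfile gender_dat
save `gender_dat'

* Toy sample standing in for the original data set
clear
set obs `ssize'
gen byte age_gp = 1 + mod(_n, 4)
gen byte gender = 1 + (mod(_n, 3) == 0)
gen old_wt = 1                       // no design weight, so start from 1

* Step 3: attach the control totals to every observation
merge m:1 age_gp using `age_dat', nogenerate
merge m:1 gender using `gender_dat', nogenerate

* Step 4: rake; by() and totvars() are listed in the same order
survwgt rake old_wt, by(gender age_gp) totvars(gender_tot age_tot) generate(new_wt)

* Check: weighted counts should be close to the control totals
tabulate age_gp [iweight = new_wt]
tabulate gender [iweight = new_wt]
```

If a merge leaves missing values in any of the total variables, a category in the sample has no matching row in the totals data set, and that should be fixed before raking.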

Steven Joel Hirsch Samuels
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441
