# Re: st: creating sample weights: Corrected

 From Steven Joel Hirsch Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: creating sample weights: Corrected Date Fri, 10 Aug 2007 14:35:30 -0400

I mistakenly left in unnecessary text.
-Steve

On Aug 10, 2007, at 10:53 AM, Janelle Knox wrote:

```
I am trying to create a sample weight for a dataset, which will
correct for variations in gender, age, etc. from population means.
Does anyone know how to do this, or where I can find information for
setting up a sample weight.... pweight=?

Thanks,
Jane
```
Jane, you cannot do it to match population means. However, you can do it to match population percentages in different categories. The technique for this is known as "raking". In Stata it is available in Nick Winter's program -survwgt rake-. Type "ssc install survwgt". By the way, "pweight" is a Stata keyword that names a type of weight (as in [pweight=new_wt]); it is not itself a weight variable.

A good reference for practice is: http://www.abtassociates.com/presentations/raking_survey_data_2_JOS.pdf

Warning: If you are not experienced with weighting, you can run into many problems. Raking will not fix, and might even worsen, certain kinds of sample deficiencies. If you have followed recent discussions on Statalist, you will be aware that not everyone recommends weighting before doing regressions.

You don't say whether there is an existing "design weight". If there is, I assume that its name is "old_wt". Otherwise, define "old_wt=1" before running the survwgt program.
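
If there is no design weight, the placeholder can be set up as in this short sketch, which creates old_wt only when no variable of that name already exists:

```stata
* Create a constant weight of 1 only if no design weight named old_wt exists.
capture confirm variable old_wt
if _rc != 0 {
    generate old_wt = 1
}
```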

1. Create grouped versions of the variables you wish to match in your original data set.

2. Now create a separate data set for each characteristic that you wish to match; each will contain the adjusted totals for that characteristic. Suppose your sample size is n=1,252. Below is an example that creates a data set "age_dat.dta" containing the age group totals.

3. Merge these into your original data.

4. The rake instructions are then (for example):

survwgt rake old_wt, by(race gender age_gp) totvars(race_tot gender_tot age_tot) generate(new_wt)

5. "new_wt" is your new weight variable. It will probably contain fractions, but these will not affect the regressions.
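
Steps 1 and 3 might look something like the following sketch for the age variable. It assumes a continuous variable named "age" and illustrative cutpoints (neither comes from the original question), uses the age_dat totals file saved by the example code in this message, and relies on the "merge m:1" syntax of Stata 11 and later.

```stata
* Step 1: collapse a continuous age variable into four groups coded 1-4.
* "age" and the cutpoints 29/44/64 are assumptions for illustration only.
generate age_gp = 1 + irecode(age, 29, 44, 64)

* Step 3: attach the target total for each age group to every observation.
* age_dat.dta is the file saved by the example code in this message.
merge m:1 age_gp using age_dat, keepusing(age_tot)

* Every observation should find a matching total; stop if any did not.
assert _merge == 3
drop _merge
```

The same merge would be repeated for each characteristic's totals file before running -survwgt rake-.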

-Steve

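/* CREATE AGE DATA SET WITH ADJUSTED TOTALS SO THAT SAMPLE & POP PERCENTS MATCH */
local ssize=1252
clear

/* Age-group data set: group 1 10%, group 2 20%, group 3 50%, group 4 20% */
input age_gp pop_pct
1 .1
2 .2
3 .5
4 .2
end

/* Target total for each group = sample size * population proportion */
gen age_tot=`ssize'*pop_pct
list

/* Check that the totals add up to the sample size.
   Older -table- syntax; in Stata 17 or later, prefix with "version 16:" */
table age_gp , c(sum age_tot) row

/* Sort on the merge key, age_gp, so the file is ready to merge */
sort age_gp
save age_dat, replace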

/*****************CODE ENDS ***************************/
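
Once the totals have been merged and the rake has run, a quick check along these lines can confirm that the weights behave as intended. It is only a sketch and assumes that new_wt and age_gp exist as described in the steps above.

```stata
* Weighted percentages by age group; if the rake converged, these should
* be close to the targets in age_dat (10, 20, 50 and 20 percent).
tabulate age_gp [aweight = new_wt]

* Look for extreme weights, which can inflate standard errors.
summarize new_wt, detail

* Declare the weight so that svy: estimation commands use it.
svyset [pweight = new_wt]
```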

Steven Joel Hirsch Samuels

18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441
