# st: population weights question

From: Kyleigh Schraeder
To: statalist@hsphsun2.harvard.edu
Subject: st: population weights question
Date: Tue, 21 Feb 2012 16:17:36 -0500

```
Hi statalisters,

I would like to compare a national health survey dataset to an
independently collected clinic dataset.  The clinic dataset was not
weighted.  I am having some difficulty understanding how to employ the
complex survey variables that were employed for the national dataset.
The design variables for the national dataset are: xwgtreg (sampling
weight variable), strata (strata variable, 19 strata), schlid (primary
sampling unit/cluster - school id number).

Although national health survey data was collected from a
representative sample of grade 7-12 students (N exceeds 7,000
students), I only want to look at students between the ages 16-19 from
one stratum (stratum 19).  I am also interested in gender differences
between the national dataset and clinic dataset.  I am wondering how
to (and if I need to) -svyset- the national health survey data for
analysis.  I have used the following commands:

use nationaldataset
keep if strata==19
keep if Age==16|Age==17|Age==18|Age==19
svyset schlid [pweight=xwgtreg], strata(strata)

**Then I ran tabulate and other descriptive commands. In order to
calculate appropriate variance estimates, confidence intervals, and
p-values, I have created subpopulations for gender analyses.  I read
that this step is necessary when analyzing survey data using complex
survey procedures. Is this necessary?

gen Subpopmale = .
label variable Subpopmale "Subpopulation1: males 16 to 19 years old"
replace Subpopmale = 0 if (Gender == 2)
replace Subpopmale = 1 if (Gender == 1)
label define Subpopmale 1 "male" 0 "female"
label values Subpopmale Subpopmale

*For example
svy, subpop(Subpopmale): tabulate Mother_education
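
*A minimal sketch of the same table, assuming the full national file is
*kept in memory and the stratum, age, and gender restrictions go into
*-subpop()- rather than -keep if-.  The indicator name insub is
*hypothetical; all other variable names are the ones used above.
use nationaldataset, clear
svyset schlid [pweight=xwgtreg], strata(strata)
*insub is 1 for males aged 16 to 19 in stratum 19, and 0 otherwise
gen byte insub = (strata==19) & inrange(Age,16,19) & (Gender==1)
svy, subpop(insub): tabulate Mother_education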

**In order to compare the national health survey dataset to the clinic
dataset, I have used the following commands:
use clinicdataset
append using nationaldataset
replace samptype = 2 if sample==.
label define samptype 1 "clinic" 2 "national"
label values samptype samptype
svyset schlid [pweight=xwgtreg], strata(strata)

I'm having some trouble running relative risk ratios on the combined
dataset. I don't quite understand whether the population weights are
being accounted for in the national dataset.