
# st: population weights question

From: Kyleigh Schraeder
To: statalist@hsphsun2.harvard.edu
Subject: st: population weights question
Date: Tue, 21 Feb 2012 12:28:14 -0500

```Hi statalisters,

Steven helped me out with a similar question the other day but I would
like to rephrase.

I would like to compare a national health survey dataset to an
independently collected clinic dataset.  The clinic dataset was not
weighted.  I am having some difficulty understanding how to use the
complex survey design variables defined for the national dataset.
The design variables for the national dataset are:
xwgtreg (sampling weight variable)
strata (strata variable, 19 strata)
schlid (primary sampling unit/cluster - school id number)

Although national health survey data was collected from a
representative sample of grade 7-12 students (N exceeds 7,000
students), I only want to look at students between the ages 15-19 from
one stratum (stratum 19).  I am also interested in gender differences
between the national dataset and clinic dataset.  I am wondering how
to (and if I need to) -svyset- the national health survey data for
analysis.  I have used the following commands:

use nationaldataset
keep if strata==19
keep if Age==16|Age==17|Age==18|Age==19
svyset schlid [pweight=xwgtreg], strata(strata)

**Then I ran tabulate and other descriptive commands. In order to
calculate appropriate variance estimates, confidence intervals, and
p-values, I have created subpopulations for gender analyses.  I read
that this step is necessary when analyzing survey data using complex
survey procedures. Is that correct?

gen Subpopmale = .
label variable Subpopmale "Subpopulation1: males 16 to 19 years old"
replace Subpopmale = 0 if (Gender == 2)
replace Subpopmale = 1 if (Gender == 1)
label define Subpopmale 1 "male" 0 "female"
label values Subpopmale Subpopmale

svy, subpop(Subpopmale): tabulate Mother_education
svy, subpop(Subpopmale): tabulate Father_education
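
* A minimal sketch, assuming the same variable names as above; age1619 is a
* hypothetical indicator that is not in the original data. Instead of
* dropping rows with -keep-, it leaves them in the data and restricts
* estimation with subpop(), so standard errors reflect the full design.
use nationaldataset, clear
svyset schlid [pweight=xwgtreg], strata(strata)
* 1 for respondents aged 16-19 in stratum 19, 0 otherwise
gen byte age1619 = inrange(Age, 16, 19) & strata == 19
svy, subpop(age1619): tabulate Gender Mother_education, row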

**In order to compare the national health survey dataset to the clinic
dataset, I have used the following commands:
use clinicdataset
append using nationaldataset
replace samptype = 2 if sample==.
label define samptype 1 "clinic" 2 "national"
label values samptype samptype
svyset schlid [pweight=xwgtreg], strata(strata)

Can I just go ahead and run comparative analyses (relative risk
ratios) on this combined dataset? I don't quite understand whether the
population weights from the national dataset are being accounted for.