st: NHSDA data, accounting for the sampling design

From	"Copeland, Laurel" <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: NHSDA data, accounting for the sampling design
Date	Tue, 26 Aug 2003 10:01:20 -0700

I am working with the 1997 NHSDA dataset (funded by SAMHDA) from UM's ICPSR
archives (choose the 1997 link from
http://www.icpsr.umich.edu:8080/ICPSR-SERIES/00064.xml ; codebook available
as PDF from
http://www.icpsr.umich.edu/cgi/archive.prl?study=2755&path=SAMHDA ). The
survey dataset is produced by RTI. RTI uses their SUDAAN software to analyze
it, taking into account the complex sampling design. Briefly, this consisted
of first-stage determination of 43 certainty PSUs stratified into 5
race-related strata, plus a large noncertainty stratum for the remainder of
the US from which noncertainty PSUs were selected; second-stage segment
sampling within PSUs; and third-stage unit-listing.  Each respondent ends up
with a sampling weight (ANALWT) and two variables representing the nested
strata (VESTR and VEREP).

An example in the codebook shows this SUDAAN code where the first 3 lines
seem to specify the design:
PROC DESCRIPT DATA = "D:\NHSDA97" FILETYPE=SAS DESIGN=WR;
 NEST VESTR VEREP;
 WEIGHT ANALWT;
   VAR MRJFLAG;
    SUBGROUP CATAGE SEX RACE;
    LEVELS 4 2 4;
    TABLES CATAGE*(SEX RACE);
    SETENV DECWIDTH=6 COLWIDTH=17;
    PRINT NSUM WSUM MEAN SEMEAN SETOTAL/
      NSUMFMT=F8.0 WSUMFMT=F12.0 MEANFMT=F15.10
      SEMEANFMT=F15.10 SETOTALFMT=F12.0;
    OUTPUT NSUM WSUM MEAN SEMEAN SETOTAL/
      NSUMFMT=F8.0 WSUMFMT=F12.0 MEANFMT=F15.10
      SEMEANFMT=F15.10 SETOTALFMT=F12.0;
There is an accompanying description as follows: "For use with software such
as SUDAAN, two variables were created: VESTR and VEREP. The sampling design
used to select the NHSDA results in a deeply stratified sample. Therefore,
adjacent strata are collapsed into pairs to create pseudo-strata (VESTR)
with two replicates each (VEREP). For all noncertainty strata, the PSU's
(each of which represents an implicit stratum) are grouped into pairs based
on their sequential order of selection. Each pair of PSUs defines a
pseudo-stratum (VESTR) with two replicates (VEREP). For the certainty
portion of the sample, segments represent the first stage of sampling. Each
explicit design stratum is partitioned into groups of approximately 24
segments based on order of selection (e.g., about the size of a
non-certainty pseudo-stratum). These sets of approximately 24 segments
define pseudo-strata (VESTR) for analysis purposes. The segments are then
paired in selection order within each certainty pseudo-stratum. One segment
from each pair is randomly assigned to replicate 1 and the other segment to
replicate 2 (VEREP)."


I have never used SUDAAN and do not have access to it, I have not needed the
few SURVEY procedures available in regular SAS, and I am generally
unfamiliar with setting up Stata (-svyset-) to handle this design.

I found some publications on the web that analyzed the NHSDA data with
Stata, so I was encouraged by that. I got some information from a SAMHDA
research associate at ICPSR, but that person was uncertain of how to set up
Stata for this dataset.

I have found that with -svyset- and -svymean- in Stata I can use:
 svyset pweight ANALWT
 svyset strata VESTR
 svyset psu VEREP
 svymean IRAGE

to replicate the output I get from running SAS SURVEYMEANS:
 PROC SURVEYMEANS DATA = NHS2.NHS97;
  CLUSTER VEREP;
  STRATA VESTR;
  WEIGHT ANALWT;
   VAR irage;
     WHERE MDESFS3>. AND snufever>.;*to match my subsetted ds in Stata;
 RUN;

Based on this, the SAMHDA research associate wrote:
"Laurel these specifications appear to be correct; I think it's safe to 
 assume your stata settings are solid. According to my documents, Sas 
 commands map to the following stata commands:
	Stratum == svyset strata
	Cluster == svyset psu
	weight == svyset pweight  "
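
For concreteness, here is a minimal sketch of that mapping in the
single-command -svyset- form used by later Stata releases (this assumes Stata 9
or newer; the three separate -svyset- lines above are the older form). It only
restates the settings already shown, with VESTR as the strata, VEREP as the PSU,
and ANALWT as the probability weight. It does not by itself confirm that these
are the right translation of the NHSDA design.

```stata
* Declare the design once: PSU variable first, weight in brackets,
* strata as an option
svyset VEREP [pweight = ANALWT], strata(VESTR)

* List strata with their PSU counts, to check that each VESTR
* holds two VEREP replicates
svydescribe

* Design-based mean and standard error of IRAGE
svy: mean IRAGE
```

If the codebook description holds, -svydescribe- should report two PSUs in
every pseudo-stratum.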

Can anyone confirm this or offer me more certain translation of the NHSDA
design into Stata parameters?

Thank you,
Laurel

Laurel A Copeland, PhD
VA Ann Arbor Health System
(734) 769-7100 x6206