Stata The Stata listserver

st: RE: How to svyset for the SIPP

From   Laurel Copeland
Subject   st: RE: How to svyset for the SIPP
Date   Fri, 11 Feb 2005 07:22:13 -0600

I do not know anything about SIPP, and I hope someone on this list can
provide you with a direct answer to your question. However, when I was in a
similar situation with NHSDA data, I put these terms into the Google search
engine:
 NHSDA 1997 Stata
and scoured the internet until I found a journal article whose authors
had used Stata to analyse 1997 NHSDA data (the dataset I wanted to use).
Then I emailed the authors, and one of them wrote back with the information
I needed.

This approach might work with SIPP.  For example, backing off the path below
to the US Census Bureau home page (top level of path), I used the Google
engine there to look for 
 Stata SIPP
and one hit is a slide show with 2 email contacts on the last slide.

I am not saying those 2 people will give you what you need, but the approach
may be useful.
Good luck,
Laurel Copeland
San Antonio VA  

-----Original Message-----
From: Stephen Mennemeyer
Sent: Thursday, February 10, 2005 5:17 PM
Subject: st: How to svyset for the SIPP

Dear Statalisters:

Can anyone give me some guidance on how to analyze the Survey of Income and
Program Participation (SIPP), especially the 1996 version, to take account
of the complex sample design?  I have access to both SAS/SUDAAN and Stata.

Assuming for the moment that I use the svy commands in Stata and that I want
to do longitudinal analysis, I think I want to use the svyset command  as

svyset wpfinwgt, strata(gvarstr)

where wpfinwgt is the longitudinal weight for individuals and gvarstr is the
"variance stratum code".
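
For readers on Stata 9 or later, the declaration described above might be
sketched as follows. This is an illustration under stated assumptions, not
SIPP-specific guidance: it uses only the variable names quoted in this post,
treats each observation as its own sampling unit (_n) because no PSU
variable has been identified, and supplies the weight in brackets as a
pweight.

```stata
* Sketch only: Stata 9+ syntax, using the variable names quoted in this post.
* _n declares each observation as its own sampling unit (no clustering),
* which is an assumption made here because no PSU variable is identified.
svyset _n [pweight = wpfinwgt], strata(gvarstr)

* Summarize the declared design (strata and units per stratum) as a check.
svydescribe
```

Running svydescribe after svyset is a quick way to confirm that the strata
and unit counts look plausible before fitting any model.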

I  am confused about whether I can or should do anything with the options
for PSU and FPC.

According to the SIPP Manual  page 8-1:

"The 1996 Panel of the SIPP sample is located in 322 Primary Sampling Units
(PSUs), each consisting of a county or a group of contiguous counties.
Within these PSUs, living quarters (LQs) were systematically selected from
lists of addresses...."

As far as I can tell, there is no SIPP variable for the PSU. The  PSU code
is scrambled inside the ssuid variable (the household ID number) but I do
not think there is any way to tell which ssuids came from  the same PSU.

However, from reading the Stata 8 manual  at U [30] p. 346-347 I wonder if I
should use the command:

svyset wpfinwgt, strata(gvarstr) psu(ssuid) fpc(epppnum)

where epppnum is the individual person identifier within the household.

I think this is wrong, but my logic here is that the SIPP is sampling
individuals who are "clustered" in households where every member of the
household is interviewed. I am particularly concerned about the remark in
the Stata Manual U 30.2.2 p. 347: "For example if our PSUs were
households and we included every member of the household in our study, then
a finite population correction term would be appropriate where the
households are sampled using simple random sampling without replacement in
each stratum."
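
One way to express the household-clustering idea without the fpc() option
is sketched below, again in Stata 9+ syntax. It rests on assumptions: it
treats the household as the cluster, which captures within-household
correlation only and not the county-level PSUs described in the SIPP
manual, and hhid is a hypothetical variable name introduced here for
illustration. The fpc() option is left out because it expects a variable
holding stratum population counts or sampling rates, and a person
identifier such as epppnum is neither.

```stata
* Sketch only: households as clusters; not Census Bureau guidance.
* hhid is a hypothetical name. egen group() yields a numeric household
* identifier whether ssuid is stored as a string or as a number.
egen hhid = group(ssuid)

* fpc() is omitted on purpose: it expects stratum population counts or
* sampling rates, and epppnum is a person identifier, not either of those.
svyset hhid [pweight = wpfinwgt], strata(gvarstr)
```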

Guidance would be much appreciated.

Stephen T. Mennemeyer Ph.D.
University of Alabama at Birmingham
School of Public Health
Dept. of Health Care Organization and Policy

U.S. Mail:
1530 3rd Ave. South 330 RPHB
Birmingham, Al 35294-0022

Express Delivery:
330 Ryals Public Health Building
1665 University Blvd.
Birmingham, Al  35294-0022

Phone: (205) 975-8965
FAX (205) 934-3347
