Statalist The Stata Listserver

st: different approaches to use only observations that have nonmissing values, in survey analysis

From   "Christopher W. Ryan" <>
To   Statalist <>
Subject   st: different approaches to use only observations that have nonmissing values, in survey analysis
Date   Mon, 09 Oct 2006 13:37:21 -0400

Using Stata 8 on Win98.

I'm trying to carry out an analysis of the Health Survey for England 2002
data.  I'm primarily interested in the hyperactivity variable among
children, from the Strengths and Difficulties Questionnaire.  That
variable is called sdqhyper.

My subpopulation of interest is kids ages 3-10.  Adults of course all
have missing values on sdqhyper, coded in HSE2002 as negative
integers (different ones for different types of missing).

Is it better to recode the missings as Stata's missing value (.) and use
age between 3 and 10 as my subpopulation; or is it better to create a
subpopulation of kids between 3 and 10 who also have no missing values
on sdqhyper?  These approaches seem to give different results.  Here's a
short do-file; running it with -nostop-, I think, illustrates my dilemma:

use "C:\data\SCHOLAR\ADHD constipation\UKEpidemiologicalStudies\HSE2002\hse2002thinnedC.dta", clear

* data are already -svyset- with psu(area) strata(stratum)


* meaningful values of sdqhyper are 1 "not true" 2 "somewhat true"
* 3 "certainly true"
* negative integers indicate various types of missing data:  didn't
* answer, not applicable (an adult, for example), etc.  There are no zero
* values of sdqhyper.

replace sdqhyper=. if sdqhyper<1

* my subpopulation of interest is kids ages 3-10, inclusive.  myage has
* already been defined to be 1 if age is between 3 and 10, inclusive, 0
* otherwise
* notice the singleton PSU error message from the following command
svyprop sdqhyper, subpop(myage)


* now, in contrast to the above approach, I'll try to make my
* subpopulation of interest those kids ages 3-10 **who also have
* non-missing values on sdqhyper**

gen sdqhypermiss=0
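* after the recode above, sdqhyper<1 is never true (Stata treats . as
* larger than any number), so test for missing directly
replace sdqhypermiss=1 if missing(sdqhyper)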
gen mysub=0
replace mysub=1 if myage==1 & sdqhypermiss==0

* now this command gives output, not an error message.
svyprop sdqhyper, subpop(mysub)
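
One way to see how the two approaches differ is to count what each subpopulation indicator actually selects. The following is only a sketch, assuming the variables used above (myage, mysub, sdqhyper, stratum, area); psutag, npsu, and stratag are hypothetical helper names introduced here for illustration.

```stata
* children aged 3-10 whose sdqhyper is missing after the recode
count if myage==1 & missing(sdqhyper)

* cross-tabulate the two subpopulation indicators to see where they differ
tabulate myage mysub, missing

* tag one observation per stratum-PSU pair among children with
* nonmissing sdqhyper (tag() returns 0 for all other observations)
egen psutag = tag(stratum area) if myage==1 & !missing(sdqhyper)

* count the tagged PSUs within each stratum:
* running sum first, then copy the stratum total to every observation
bysort stratum: gen npsu = sum(psutag)
bysort stratum: replace npsu = npsu[_N]

* list strata left with fewer than two such PSUs
egen stratag = tag(stratum)
list stratum npsu if stratag==1 & npsu<2
```

Any stratum in that final list has fewer than two PSUs containing children with usable sdqhyper values, which may be related to the singleton PSU message.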

I could put up a link to a stripped-down version of the datafile, if it
would help.

I guess my underlying question is, how does Stata handle missing values,
versus subpopulations, in a -svy- command?


Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
and Wilson Family Practice Residency, Johnson City, NY
GnuPG and PGP public keys available at

"If you want to build a ship, don't drum up the men to gather wood,
divide the work and give orders. Instead, teach them to yearn for the
vast and endless sea."  [Antoine de St. Exupery]