The Stata listserver

st: RE: Missings in Survey Design (was: Bootstrap and percentages)

From   "David Moore" <[email protected]>
To   [email protected]
Subject   st: RE: Missings in Survey Design (was: Bootstrap and percentages)
Date   Tue, 27 Aug 2002 09:52:45 -0700

Although it may not be a general solution, since complex survey designs
can make it impractical, shouldn't a sample selection model handle this
situation, at least in theory?

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Joseph McCrary
Sent: Tuesday, August 27, 2002 4:53 AM
To: [email protected]
Subject: st: Missings in Survey Design (was: Bootstrap and percentages)

The answer depends largely on your research question. Are you
interested in explaining the variance within the group of smokers (the
question you posed, "Has a doctor ever told you to quit smoking?", is a
good example), or in comparing the group of smokers to a larger
population (e.g., "How effective do you think the latest Ad Council's
commercials have been in preventing teen smoking?")? In the latter case,
you might be interested in comparing smokers to non-smokers.

I'd love to see any references you collect as I will probably be
undertaking quite a bit in the way of surveying shortly. Thank you.

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Sayer, Bryan
Sent: Monday, August 26, 2002 1:02 PM
To: '[email protected]'
Subject: st: RE: Bootstrap and percentages

We are in the middle of investigating this issue of "missings" (meaning
Not In Universe) with regard to sample design, but my view right now is
that Stata does not address this properly.

Anyway, one fix is to assign values to the missing cases (use something
that will cause obvious problems if those cases get included, such as
-999) and then use a subpop statement to restrict the analysis to the
proper cases.
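To make the subpop idea concrete, here is a minimal Python sketch (not Stata's own code) of a weighted proportion estimated over a subpopulation, with a Taylor-linearized standard error under the usual stratified, with-replacement PSU approximation. The placeholder NOT_IN_UNIVERSE = -999 and the function name subpop_proportion are hypothetical. The point it illustrates: cases outside the subpopulation get a zero score, but their PSUs and strata stay in the variance calculation, which is what separates restricting by a subpopulation indicator from simply dropping those cases.

```python
import numpy as np

NOT_IN_UNIVERSE = -999  # hypothetical placeholder code for "not asked"


def subpop_proportion(y, w, strata, psu, in_subpop):
    """Weighted mean of a 0/1 variable within a subpopulation, with a
    Taylor-linearized standard error that keeps every PSU and stratum.

    Out-of-subpopulation cases contribute zero to the linearized scores
    but still sit in their PSUs, so the full design is used.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    d = np.asarray(in_subpop, dtype=bool)
    # The placeholder value never enters the estimate: it is zeroed
    # out wherever the subpopulation indicator is False.
    y = np.where(d, y, 0.0)
    n_hat = np.sum(w[d])                      # estimated subpopulation size
    p_hat = np.sum(w[d] * y[d]) / n_hat       # ratio estimator
    z = np.where(d, w * (y - p_hat) / n_hat, 0.0)  # linearized scores
    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        psus = np.unique(psu[in_h])
        n_h = psus.size
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than two PSUs")
        # Sum scores within each PSU, then take the between-PSU variance.
        totals = np.array([z[in_h & (psu == j)].sum() for j in psus])
        var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return p_hat, np.sqrt(var)
```

On a toy design with two strata of two PSUs each, equal weights, and two of eight cases coded -999 as not asked, this returns p = 2/3 and a standard error of sqrt(2)/9 (about 0.157); changing the placeholder to any other number leaves both results unchanged.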

For those of you interested in this issue, the philosophical question
is this. Consider a survey designed to sample from a specific target
population, and assume that there are correct sample design variables
for that population and that these design variables meet the criterion
of at least two PSUs per stratum. Now assume that certain questions are
asked only of a subgroup of the population. For example, we have a
question "Has a doctor ever told you to quit smoking?" Clearly the
question is asked only of smokers, and in our case, only of smokers who
have seen a physician in the past 12 months. Is the group of people who
are asked this question a sub-population of the target population, or
is it a population unto itself? If it is a sub-population, then the
sample design variables appropriate for the target population are
sufficient to describe it, and Stata ought to estimate properly without
considering the missing values for those not in the sub-group. But if it
is a separate population, then yes, a new set of sample design variables
is necessary.
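One concrete consequence of the two views can be sketched in a few lines of Python. Under the sub-population view the full design is kept; under the separate-population view only the PSUs that contain respondents remain. The hypothetical helper design_df below counts design degrees of freedom as the number of PSUs minus the number of strata (a common convention for linearization variance estimates), either for the full design or after keeping only the respondents.

```python
import numpy as np


def design_df(strata, psu, keep=None):
    """Design degrees of freedom: number of PSUs minus number of strata.

    keep=None uses the full design (sub-population view); a boolean mask
    keeps only the selected cases (separate-population view).
    PSU ids may repeat across strata, so a PSU is identified by its
    (stratum, psu) pair.
    """
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    if keep is not None:
        keep = np.asarray(keep, dtype=bool)
        strata, psu = strata[keep], psu[keep]
    n_psus = len(set(zip(strata.tolist(), psu.tolist())))
    n_strata = len(set(strata.tolist()))
    return n_psus - n_strata
```

In a toy design with two strata of two PSUs each, dropping the cases who were not asked the question can remove a whole PSU, lowering the degrees of freedom from 2 to 1 and leaving one stratum with a single PSU.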

I'm collecting references. If enough people are interested, I'll post
them.

Bryan Sayer
Statistician, SSS Inc.
[email protected]

-----Original Message-----
From: Michael R. Smith [mailto:[email protected]]
Sent: Monday, August 26, 2002 11:47 AM
To: [email protected]
Subject: st: Bootstrap and percentages

I'm processing some data that was generated with a complex sample design
but to which, for reasons of confidentiality, I don't have direct
access. This means that the svy commands are not a practical option.
Using them requires correcting for strata left with only one PSU, but
which strata are reduced to a single PSU by missing values will vary
depending on the variables in the analysis. Submitting my code to the
person who runs it in order to find out when and where to merge PSUs
would therefore become an extremely cumbersome process.
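Because the problem strata depend on which variables enter the analysis, one workaround is to have the person holding the data run a check before each analysis. Here is a minimal Python sketch, assuming missing values are stored as NaN and the analysis uses listwise deletion; the function name singleton_strata and the dictionary-of-columns layout are hypothetical. It lists the strata left with exactly one PSU, the case that violates the usual requirement of at least two PSUs per stratum. Strata with no complete cases simply drop out of the analysis and are not flagged.

```python
import numpy as np


def singleton_strata(strata, psu, data, analysis_vars):
    """Return strata left with exactly one PSU after listwise deletion
    of cases missing (NaN) on any of analysis_vars.

    data maps variable names to equal-length sequences of values.
    """
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    complete = np.ones(strata.shape[0], dtype=bool)
    for name in analysis_vars:
        complete &= ~np.isnan(np.asarray(data[name], dtype=float))
    flagged = []
    # Only strata that still contain complete cases are examined.
    for h in np.unique(strata[complete]):
        n_psu = np.unique(psu[complete & (strata == h)]).size
        if n_psu == 1:
            flagged.append(h.item())
    return flagged
```

Running it once per planned variable list tells the data holder in advance which strata would need to be merged, instead of discovering them one submission at a time.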

So bootstrapping looks like the most practical method for inferential
purposes. It's clear how to do that with regression and related
procedures, but it's not obvious to me how to use the bs command to
generate standard errors for a percentage table. The bs command requires
specifying each coefficient to be bootstrapped, so how does one specify
the cells of a percentage table? Part of the analysis requires
percentage tables with quite large numbers of cells, so I need to
generate a large number of standard errors.
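For the percentage-table question, one option is to compute the whole table inside each bootstrap replicate and take the standard deviation of each cell across replicates, so no cell has to be named individually. Here is a minimal Python sketch, not the bs command. PSUs are resampled with replacement within strata, drawing n_h - 1 PSUs per stratum and rescaling weights by n_h / (n_h - 1) (the Rao-Wu rescaling bootstrap). Row percentages, integer-coded row and column variables, and the function names are assumptions for illustration.

```python
import numpy as np


def weighted_row_percents(row, col, w, n_rows, n_cols):
    """Weighted row percentages for integer-coded row/column variables."""
    table = np.zeros((n_rows, n_cols))
    np.add.at(table, (row, col), w)          # sum weights into cells
    totals = table.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return 100.0 * table / totals        # NaN for rows with zero weight


def bootstrap_cell_se(row, col, w, strata, psu, n_rows, n_cols,
                      reps=500, seed=0):
    """Bootstrap SEs for every cell of a weighted row-percentage table.

    Rescaling bootstrap: in each replicate, n_h - 1 PSUs are drawn with
    replacement within each stratum, and a PSU drawn r times gets its
    weights multiplied by r * n_h / (n_h - 1).
    """
    row, col = np.asarray(row), np.asarray(col)
    w = np.asarray(w, dtype=float)
    strata, psu = np.asarray(strata), np.asarray(psu)
    rng = np.random.default_rng(seed)
    est = weighted_row_percents(row, col, w, n_rows, n_cols)
    replicates = np.empty((reps, n_rows, n_cols))
    for b in range(reps):
        w_b = np.zeros_like(w)
        for h in np.unique(strata):
            psus = np.unique(psu[strata == h])
            n_h = psus.size
            if n_h < 2:
                raise ValueError(f"stratum {h} has fewer than two PSUs")
            draws = rng.choice(psus, size=n_h - 1, replace=True)
            for j in psus:
                mask = (strata == h) & (psu == j)
                r = np.count_nonzero(draws == j)
                w_b[mask] = w[mask] * r * n_h / (n_h - 1)
        replicates[b] = weighted_row_percents(row, col, w_b, n_rows, n_cols)
    # nanstd skips replicates in which a row ended up with zero weight.
    se = np.nanstd(replicates, axis=0, ddof=1)
    return est, se
```

Cell or column percentages only require changing weighted_row_percents. Resampling whole PSUs rather than individual cases is what carries the clustering into the standard errors; a case-level bootstrap would typically understate them when responses are positively correlated within PSUs. Like svy, this still needs at least two PSUs per stratum.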

I've read what seem to be the relevant sections of the manual and rooted
around in the FAQs and other documentation for an answer, so far with no
success.

Michael Smith

*   For searches and help try:

© Copyright 1996–2024 StataCorp LLC