Fwd: st: -svyset- methods to account for singleton PSUs

From   Steve Samuels
Subject   Fwd: st: -svyset- methods to account for singleton PSUs
Date   Tue, 6 Jul 2010 12:57:18 -0400

The last few posts between James and me  about this topic were
off-list.  There are unanswered questions, so I'm showing the entire
correspondence here.


Steven Samuels
18 Cantine's Island
Saugerties NY 12477
Voice: 845-246-0774
Fax:    206-202-4783

From: James Shaw
Date: Tue, Jul 6, 2010 at 12:42 PM
Subject: Re: st: -svyset- methods to account for singleton PSUs
To: Steve Samuels


Thanks for confirming my suspicion.  I wrote my own code to perform
the multistage stratified variant of the jackknife procedure (Wolter
pp. 182-185).  Unfortunately, I have never had any luck using
svy:jackknife or the svr equivalent with user-written programs.  I
will contact UIC's survey research lab regarding the appropriateness
of using the jackknife for estimating CIs for F-stat ratios.


James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045

On Tue, Jul 6, 2010 at 11:26 AM, Steve Samuels wrote:

I don't know much about MEPS. I know even less about jackknifing test
statistics and the properties of CIs based on the jackknifed standard
errors. You do have an excellent local resource in the Survey Research
Laboratory at UIC, and perhaps someone more knowledgeable than I am
will chime in.

Stata's jackknife estimator of variance assumes simple random sampling
within strata, which will be an approximation if sampling is with
unequal probabilities. The -singleunit- option will lead to ignoring
just one stratum for each jackknife replicate. If there are many
strata, I would guess that the resulting downward bias in the
estimates of variance for parameters will be small (suggested by
Sections 4.5 and 4.6 of K Wolter, 2007, Introduction to Variance
Estimation, Springer, NY.)



:39 AM, James Shaw wrote:

Thanks for taking the time to reply to my query.

I am using the jackknife to estimate confidence intervals for ratios
of F statistics.  Data are being drawn from the Medical Expenditure
Panel Survey (MEPS), which uses a stratified clustered design in which
each stratum contains 2 or 3 PSUs.  There will be 1 singleton in each
replicate associated with a stratum containing 2 PSUs and no
singletons in replicates associated with strata containing 3 PSUs.  In
each replicate that contains a singleton, it will not be possible to
estimate variances without applying some sort of "fix."  I do not
think grouping strata will be feasible.
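
To make the replicate structure concrete, here is a minimal Python sketch of a stratified delete-one-PSU jackknife. It is an illustration under stated assumptions, not Stata's implementation and not the code described in this thread: the function name, the one-row-per-PSU data layout, and the toy numbers are all hypothetical. Each replicate drops one PSU and inflates the weights of the remaining PSUs in that stratum by n_h/(n_h - 1). The sketch also flags replicates in which the affected stratum is left with a single PSU, which is the situation that blocks within-replicate variance estimation.

```python
from collections import defaultdict


def stratified_jackknife(psus, estimator):
    """Delete-one-PSU jackknife for a stratified design (illustrative sketch).

    psus: list of (stratum, psu_id, weight, y) tuples, one row per PSU.
          This layout is a hypothetical simplification of real survey data.
    estimator: function mapping a list of (weight, y) pairs to a number.
    Returns (full-sample estimate, jackknife variance, replicate info), where
    replicate info is a list of (stratum, dropped_psu, leaves_singleton).
    """
    ids_by_stratum = defaultdict(list)
    for h, j, _, _ in psus:
        ids_by_stratum[h].append(j)

    theta_full = estimator([(w, y) for _, _, w, y in psus])
    variance = 0.0
    replicates = []
    for h, ids in ids_by_stratum.items():
        n_h = len(ids)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has one PSU; no replicate can be formed")
        for dropped in ids:
            rep = []
            for hh, j, w, y in psus:
                if hh == h and j == dropped:
                    continue  # delete this PSU from the replicate
                if hh == h:
                    w = w * n_h / (n_h - 1)  # reweight the rest of the stratum
                rep.append((w, y))
            theta_rep = estimator(rep)
            variance += (n_h - 1) / n_h * (theta_rep - theta_full) ** 2
            # A 2-PSU stratum is reduced to a singleton in this replicate.
            replicates.append((h, dropped, n_h - 1 == 1))
    return theta_full, variance, replicates


def weighted_total(rows):
    return sum(w * y for w, y in rows)


# Toy data: stratum "A" has 2 PSUs, stratum "B" has 3 PSUs.
toy = [
    ("A", 1, 1.0, 10.0), ("A", 2, 1.0, 14.0),
    ("B", 1, 1.0, 3.0), ("B", 2, 1.0, 5.0), ("B", 3, 1.0, 7.0),
]
theta, var, reps = stratified_jackknife(toy, weighted_total)
print(theta, var)
print([r for r in reps if r[2]])  # replicates that leave a singleton stratum
```

For a weighted total, this jackknife variance agrees (up to rounding) with the usual with-replacement formula, the sum over strata of n_h/(n_h - 1) times the sum of squared deviations of PSU totals from the stratum mean, which is 28 for the toy data. Only the two replicates formed from stratum "A" are flagged as leaving a singleton.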

I had thought to use the singleunit(centered), singleunit(scaled), or
singleunit(certainty) method to account for the singletons and
facilitate variance estimation.  The centered method yields confidence
intervals that are noticeably wider than those derived using the other
two methods.  The scaled and certainty methods yield identical
results.  This is because I am jackknifing ratios of F statistics, and
the scaled method affects each F statistic similarly.  F statistics
derived using the scaled method are generally smaller than those
derived using the certainty method; however, the F-statistic ratios
are the same.

My intuition is that the downward bias of the certainty and scaled
methods will be smaller than the upward bias of the centered method.
However, based on your response, I am not sure it would be appropriate
to treat the lone singleton in each replicate associated with a
stratum containing 2 PSUs as a certainty unit.  Given my stated
application, do you have any thoughts as to which of the methods
(i.e., centered vs. certainty/scaled) would be preferable?

BTW, you may question my decision to use the jackknife.  Given MEPS'
sampling design, the only other procedure that could be used would be
balanced repeated replication with orthogonal arrays.  I know that
this can be done in R; however, the necessary programming in Stata
would be fairly complicated.


On Mon, Jul 5, 2010 at 12:44 PM, Steve Samuels wrote:
1) The -singleunit(certainty)- option should be specified whenever a
PSU was selected with certainty. To use it, create a separate stratum
for each such unit. If later stages of sampling are ignored in
-svyset- then this option ignores the contribution from later stage
units to standard errors and so will understate the standard error
slightly. Theorems 10.3 and 10.4, p. 286 of WG Cochran (1977)
Sampling Techniques, Wiley, show that for SRS at all stages, the bias
from ignoring the later stages will be minimal: the denominator for
the later stage variance components is the total number of
observations at that stage, which can often be large.

In some designs, all strata contain only a single PSU and standard
errors are formed from squared differences between PSU means in
adjacent strata. See the new -svy sdr- command. When -svy sdr- is
used, the -singleunit()- option should have no effect.

The other two automatic options are intended for situations in which
one or more selected PSUs are missing from the stratum. This can occur
when the entire PSU is missing or, more commonly, when there are no
members of a subpopulation in a PSU or no observations in the PSU
with non-missing values of crucial variables.
In these cases, I prefer not to use either automatic option, but to
merge the singleton PSUs with PSUs in "adjacent" strata.

2) The singleunit(centered) option will usually yield an upwardly
biased estimate of standard error, as you surmise. If the absent PSUs
are missing at random, the contribution to (positive) bias will be
roughly proportional to the squared difference between stratum mean
and population mean, divided by sample size. However, if PSU
missingness is related to the magnitude of the study variables, the
bias could be negative. The amount of bias in any one study will
depend on the particulars.

3) The direction of the bias from the singleunit(scaled) option can be
positive or negative, not just negative as you expect. Again,
generalization is impossible; the bias will depend on the particular
population, stratum, and reason for missingness of absent PSUs.
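
The three singleton treatments discussed in this thread can be compared on a toy example. The Python sketch below is a simplified reading of the options for the with-replacement variance of an estimated total, not Stata's code. The function name, data layout, and numbers are hypothetical. It assumes that "certainty" gives singleton strata zero contribution, that "scaled" multiplies the certainty result by H/(H - H1), where H is the number of strata and H1 the number of singleton strata, and that "centered" measures a lone PSU against the grand mean of all PSU totals with a multiplier of 1. The multiplier in particular is an assumption of this sketch.

```python
def total_variance(strata, singleunit="certainty"):
    """With-replacement variance of an estimated total (illustrative sketch).

    strata: dict mapping stratum label -> list of weighted PSU totals
            (hypothetical layout).
    singleunit: "certainty", "scaled", or "centered".
    """
    if singleunit not in ("certainty", "scaled", "centered"):
        raise ValueError(f"unknown singleunit method: {singleunit!r}")

    all_totals = [y for ys in strata.values() for y in ys]
    grand_mean = sum(all_totals) / len(all_totals)
    n_strata = len(strata)
    n_single = sum(1 for ys in strata.values() if len(ys) == 1)

    variance = 0.0
    for ys in strata.values():
        n_h = len(ys)
        if n_h >= 2:
            mean_h = sum(ys) / n_h
            variance += n_h / (n_h - 1) * sum((y - mean_h) ** 2 for y in ys)
        elif singleunit == "centered":
            # ASSUMPTION: a lone PSU is centered at the grand mean of PSU
            # totals and enters with multiplier 1.
            variance += (ys[0] - grand_mean) ** 2
        # "certainty" and "scaled": a singleton stratum adds nothing here.

    if singleunit == "scaled":
        if n_single == n_strata:
            raise ValueError("every stratum is a singleton; nothing to scale from")
        # Same as giving each singleton stratum the average contribution
        # of the multi-PSU strata.
        variance *= n_strata / (n_strata - n_single)
    return variance


# Two toy variables measured on the same design; stratum "C" is a singleton.
y = {"A": [10.0, 14.0], "B": [3.0, 5.0, 7.0], "C": [20.0]}
z = {"A": [1.0, 5.0], "B": [2.0, 2.0, 8.0], "C": [4.0]}

for method in ("certainty", "scaled", "centered"):
    v_y = total_variance(y, method)
    v_z = total_variance(z, method)
    print(method, v_y, v_z, v_y / v_z)
```

Under these assumptions the scaled variance is the certainty variance times a factor that depends only on the design, so any ratio of two such variances is the same under both methods. That is consistent with the observation earlier in the thread that the F-statistic ratios did not change. The centered method instead adds a non-negative term that grows with the squared distance between the singleton's value and the overall mean.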


On Mon, Jul 5, 2010 at 10:14 AM, James Shaw wrote:
Can anyone cite references that discuss the singleunit(centered) and
singleunit(scaled) methods for accommodating singleton PSUs?  I would
expect the certainty and scaled methods to yield downward-biased
variance estimates and the centered method to yield upward-biased
estimates.  However, it is not clear to me if and how the magnitudes
of the biases differ among the methods.