

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Fwd: st: -svyset- methods to account for singleton PSUs
Date: Tue, 6 Jul 2010 12:57:18 -0400

The last few posts between James and me about this topic were off-list. There are unanswered questions, so I'm showing the entire correspondence here.

Steve

Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477 USA
Voice: 845-246-0774
Fax: 206-202-4783

From: James Shaw <shawjw@gmail.com>
Date: Tue, Jul 6, 2010 at 12:42 PM
Subject: Re: st: -svyset- methods to account for singleton PSUs
To: Steve Samuels <sjsamuels@gmail.com>

Steve:

Thanks for confirming my suspicion. I wrote my own code to perform the multistage stratified variant of the jackknife procedure (Wolter pp. 182-185). Unfortunately, I have never had any luck using svy:jackknife or the svr equivalent with user-written programs. I will contact UIC's survey research lab regarding the appropriateness of using the jackknife for estimating CIs for F-stat ratios.

-- Jim

James W. Shaw, Ph.D., Pharm.D., M.P.H.
Assistant Professor
Department of Pharmacy Administration
College of Pharmacy
University of Illinois at Chicago
833 South Wood Street, M/C 871, Room 252
Chicago, IL 60612
Tel.: 312-355-5666
Fax: 312-996-0868
Mobile Tel.: 215-852-3045

On Tue, Jul 6, 2010 at 11:26 AM, Steve Samuels <sjsamuels@gmail.com> wrote:

James,

I don't know much about MEPS. I know even less about jackknifing test statistics and the properties of CIs based on the jackknifed standard errors. You do have an excellent local resource in the Survey Research Laboratory at UIC, and perhaps someone more knowledgeable than I am will chime in.

Stata's jackknife estimator of variance assumes simple random sampling within strata, which will be an approximation if sampling is with unequal probabilities. The -singleunit- option will lead to ignoring just one stratum for each jackknife replicate. If there are many strata, I would guess that the resulting downward bias in the estimates of variance for parameters will be small (suggested by Sections 4.5 and 4.6 of K Wolter, 2007, Introduction to Variance Estimation, Springer, NY).

Steve

[...]:39 AM, James Shaw <shawjw@gmail.com> wrote:

Steve:

Thanks for taking the time to reply to my query. I am using the jackknife to estimate confidence intervals for ratios of F statistics. Data are being drawn from the Medical Expenditure Panel Survey (MEPS), which uses a stratified clustered design in which each stratum contains 2 or 3 PSUs. There will be 1 singleton in each replicate associated with a stratum containing 2 PSUs and no singletons in replicates associated with strata containing 3 PSUs. In each replicate that contains a singleton, it will not be possible to estimate variances without applying some sort of "fix." I do not think grouping strata will be feasible.

I had thought to use the singleunit(centered), singleunit(scaled), or singleunit(certainty) method to account for the singletons and facilitate variance estimation. The centered method yields confidence intervals that are noticeably wider than those derived using the other two methods. The scaled and certainty methods yield identical results. This is because I am jackknifing ratios of F statistics, and the scaled method affects each F statistic similarly. F statistics derived using the scaled method are generally smaller than those derived using the certainty method; however, the F-statistic ratios are the same.

My intuition is that the downward bias of the certainty and scaled methods will be smaller than the upward bias of the centered method. However, based on your response, I am not sure it would be appropriate to treat the lone singleton in each replicate associated with a stratum containing 2 PSUs as a certainty unit. Given my stated application, do you have any thoughts as to which of the methods (i.e., centered vs. certainty/scaled) would be preferable?

BTW, you may question my decision to use the jackknife.
Given MEPS' sampling design, the only other procedure that could be used would be balanced repeated replication with orthogonal arrays. I know that this can be done in R; however, the necessary programming in Stata would be fairly complicated.

-- Jim

On Mon, Jul 5, 2010 at 12:44 PM, Steve Samuels <sjsamuels@gmail.com> wrote:

1) The -singleunit(certainty)- option should be specified whenever a PSU was selected with certainty. To use it, create a separate stratum for each such unit. If later stages of sampling are ignored in -svyset-, then this option ignores the contribution from later-stage units to standard errors and so will understate the standard error slightly. Theorems 10.3 and 10.4, p. 286 of WG Cochran (1977) Sampling Techniques, Wiley, show that for SRS at all stages, the bias from ignoring the later stages will be minimal: the denominator for the later-stage variance components is the total number of observations at that stage, which can often be large.

In some designs, all strata contain only a single PSU and standard errors are formed from squared differences between PSU means from adjacent strata. See the new -svy sdr- command. When -svy sdr- is used, the -singleunit()- option should have no effect.

The other two automatic options are intended for situations in which one or more selected PSUs are missing from the stratum. This can occur when the entire PSU is missing or, more commonly, when there are no members of a subpopulation in a PSU or no observations in the PSU with non-missing values of crucial variables. In these cases, I prefer not to use either automatic option, but to merge the single-unit PSUs with PSUs in "adjacent" strata.

2) The -singleunit(centered)- option will usually yield an upwardly biased estimate of standard error, as you surmise.
If the absent PSUs are missing at random, the contribution to (positive) bias will be roughly proportional to the squared difference between stratum mean and population mean, divided by sample size. However, if PSU missingness is related to the magnitude of the study variables, the bias could be negative. The amount of bias in any one study will depend on the particulars.

3) The direction of the bias from the -singleunit(scaled)- option can be positive or negative, not just negative as you expect. Again, generalization is impossible: the bias will depend on the particular population, stratum, and reason for missingness of absent PSUs.

Steve

On Mon, Jul 5, 2010 at 10:14 AM, James Shaw <shawjw@gmail.com> wrote:

Can anyone cite references that discuss the singleunit(centered) and singleunit(scaled) methods for accommodating singleton PSUs? I would expect the certainty and scaled methods to yield downward-biased variance estimates and the centered method to yield upward-biased estimates. However, it is not clear to me if and how the magnitudes of the biases differ among the methods.
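
James's remark that F statistics are smaller under the scaled method than under the certainty method, yet their ratios agree, can be checked with a little arithmetic. The sketch below is in Python rather than Stata and rests on an assumption taken from the thread: that the scaled method differs from the certainty method only by a common multiplicative factor on the variance estimates. The estimates, variances, and the factor 1.25 are made-up illustrative numbers, and `wald_f` is a hypothetical helper for a single-degree-of-freedom test, not anything Stata provides.

```python
def wald_f(estimate, variance):
    """Single-degree-of-freedom Wald F statistic: estimate^2 / variance."""
    return estimate ** 2 / variance


# Made-up point estimates and "certainty" variances for two tests
# (illustrative numbers only, not MEPS results).
b1, v1 = 2.0, 0.5
b2, v2 = 1.5, 0.9

# Assumed common inflation factor applied to every variance by the
# "scaled" method (hypothetical value).
c = 1.25

f1_cert, f2_cert = wald_f(b1, v1), wald_f(b2, v2)
f1_scaled, f2_scaled = wald_f(b1, c * v1), wald_f(b2, c * v2)

ratio_cert = f1_cert / f2_cert
ratio_scaled = f1_scaled / f2_scaled

# Each F statistic shrinks by the factor c ...
print(f1_scaled < f1_cert, f2_scaled < f2_cert)
# ... but c cancels in the ratio, up to floating-point rounding.
print(abs(ratio_cert - ratio_scaled) < 1e-12)
```

Under this assumption the factor cancels within every jackknife replicate, even if it differs from one replicate to the next, so an interval built from the replicate ratios would be the same for the two methods. The centered method does not rescale the variances by a common factor, so the argument does not carry over to it.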
