
Re: st: median regressin and survey data!

From   "Stas Kolenikov" <>
Subject   Re: st: median regressin and survey data!
Date   Mon, 31 Mar 2008 15:32:15 -0500

See Rao & Wu (1988) and Rao, Wu and Yue (1992) for proper ways to
bootstrap survey data. A naive implementation of the bootstrap will
fail, and even resampling by cluster will fail, too. One needs to
perform some marginally tricky rescaling of the weights to do it right,
and then use the undocumented -bs4rw- supplied by Jeff Pitblado; try
-findit bs4rw-. I have some code that produces weights in simple
situations of stratified single-stage samples; it needs to be extended
to arbitrary designs, and I don't know when I will get to that.
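
For illustration only, here is a minimal Python sketch of the kind of
weight rescaling described in Rao, Wu and Yue (1992), for a stratified
single-stage design. It is not the code mentioned above and not what
-bs4rw- does internally. The function name, its arguments, and the
choice of resampling n_h - 1 PSUs per stratum are assumptions made for
this example, and finite population corrections are ignored.

```python
import numpy as np

def rao_wu_bootstrap_weights(weights, strata, psu, n_reps=200, seed=0):
    """Return an (n_obs, n_reps) array of rescaled bootstrap weights.

    Hypothetical helper: within each stratum with n_h PSUs, draw
    m_h = n_h - 1 PSUs with replacement and multiply each original
    weight by (n_h / m_h) * (number of times its PSU was drawn).
    """
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)
    rng = np.random.default_rng(seed)
    rep_weights = np.zeros((weights.size, n_reps))

    for h in np.unique(strata):
        in_h = strata == h
        psus_h = np.unique(psu[in_h])      # sorted PSU ids in stratum h
        n_h = psus_h.size
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has fewer than 2 PSUs")
        m_h = n_h - 1
        # Position of each observation's PSU within psus_h.
        idx = np.searchsorted(psus_h, psu[in_h])
        for r in range(n_reps):
            draws = rng.integers(0, n_h, size=m_h)       # PSUs drawn
            counts = np.bincount(draws, minlength=n_h)   # times each drawn
            rep_weights[in_h, r] = weights[in_h] * (n_h / m_h) * counts[idx]
    return rep_weights
```

Each column is one set of replicate weights: re-estimate the statistic
with every column and take the variability of those estimates across
columns as the variance estimate. Multistage designs and sampling
fractions that are not negligible would need further adjustments that
this sketch does not attempt.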

Non-smoothness of the L1 norm is a problem for resampling methods, as
well -- jackknife is inconsistent, for instance. To my shame, I don't
remember whether the bootstrap per se is consistent, or needs to be
modified, too.
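
One way to see the trouble with the jackknife is to look at the plain
sample median, the simplest L1 estimator. The short Python sketch below
assumes an unweighted i.i.d. sample of even size (a simplification, not
a survey design) and uses a made-up data vector; the helper name is
hypothetical. It shows that the delete-one replicates take only two
distinct values, so their spread carries very little information about
the sampling variability of the median.

```python
import numpy as np

def jackknife_medians(x):
    """Median of the sample with each observation deleted in turn."""
    x = np.asarray(x, dtype=float)
    return np.array([np.median(np.delete(x, i)) for i in range(x.size)])

# Made-up sample of even size n = 6.
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
reps = jackknife_medians(x)

# Deleting any of the three smallest values gives 7; deleting any of
# the three largest gives 4. No other replicate value is possible.
print(np.unique(reps))  # [4. 7.]
```

The same pattern holds for any even sample size with distinct values:
the replicates are always one of the two middle order statistics.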

On 3/30/08, Austin Nichols <> wrote:
> Another short answer:
>  and long answer:
>  Even the -svy- commands do not give you the test you often want, for
>  whether the means/medians/etc. are the same for two data generating
>  processes (say, men and women) or the means/medians/etc. of the
>  distributions in some superpopulation of possible populations are
>  equal. For an unstratified survey sample, you can usually just account
>  for clustering (on PSU) and weights and ignore the fpc and get the
>  right answer, but for stratified samples, a theoretically justifiable
>  test may not be programmed.  However, using weights (either as
>  aweights or pweights) and bootstrapping with the cluster option will
>  often give accurate inference when your options are limited.  There is
>  also a strata option for bootstrap, but weights+cluster may get you
>  the best performance--in the absence of a proof, you can always run
>  some simulations to convince yourself it will work for your particular
>  problem (by generating datasets that look more or less like yours).
>  On Sat, Mar 29, 2008 at 12:15 PM, Stas Kolenikov <> wrote:
>  > Short answer: neither one fits well enough into the paradigm of survey
>  >  sampling, so coming up with fully justifiable implementation is not
>  >  straightforward.
>  >
>  >  Long answer.
>  >
>  >  For the first one, all rank tests implicitly assume the data are
>  >  i.i.d., and I don't think very clear analogies are possible with
>  >  survey data. There are no estimating equations to work with; you
>  >  probably would be able to get the distribution of the test statistic
>  >  over repeated sampling, but it won't be nearly as nice as the textbook
>  >  distribution.
>  >
>  >  For the second one, -qreg- is a heavily model-based concept: that for
>  >  any combination of explanatory variables, there's a well defined
>  >  distribution of responses over which the median can be computed. The
>  >  straight design perspective, on the other hand, says that there are
>  >  only so many individuals in the finite population, so there is no talk
>  >  about conditional distributions. So one needs to invent some sort of a
>  >  hybrid framework to incorporate both model and design ideas, and they
>  >  don't always go hand in hand. A basic introduction to the subject is the
>  >  chapter by Binder and Roberts in the 2003 book Analysis of Survey Data.
>  >  I say introductory
>  >  because they consider the simplest possible situations, but they still
>  >  operate with big-O small-O in probability. Conceptually, it should
>  >  still be possible to formulate median regression for sample surveys,
>  >  as it is linked to a minimization problem, and thus can be cast in
>  >  terms of estimating equations. Then you need to say, "If I had the
>  >  full population, I would run this same median regression on it, and
>  >  get some numbers from this census estimation procedure. Now, what I
>  >  can hope for with the sample is that my estimates are going to be
>  >  consistent for those numbers that came out of the census problem". I
>  >  don't really know if that was done for quantile regression; for linear
>  >  regression, the comparable result goes back to mid 1970s due to Wayne
>  >  Fuller, and for generalized linear models, to David Binder's 1983
>  >  paper. Median regression is somewhat trickier though, as the function
>  >  being minimized, the sum of absolute deviations, is not differentiable,
>  >  so the standard tools like the delta method are not applicable.
>  >
>  >
>  >
>  >  On 3/29/08, Mohammed El Faramawi <> wrote:
>  >  > Hi,
>  >  >  I am trying to run non-parametric tests using survey
>  >  >  data (probability weighted). Unfortunately, I cannot
>  >  >  find commands that take the pweight into
>  >  >  consideration. I am interested in qreg (median regression)
>  >  >  and the Mann-Whitney test. Is there any way to do this in
>  >  >  Stata? Thank you
>  >  >  Mohammed Faramawi, MD,Phd,MPH,Msc

Stas Kolenikov, also found at

Small print: Please do not reply to my Gmail address as I don't check
it regularly.