
From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Non-parametric tests for survey data? (e.g., Kruskal-Wallace)
Date: Tue, 10 Feb 2009 22:48:49 -0600

Oh, one of those wonderful how-do-I-stick-weights-into questions...

Survey statistics is way more complicated than figuring out where the weights go. In that world, we need to work with population-based quantities and their estimates. It can be something that might look model-free (distribution function) or heavily model-based (regression coefficient), but there must be something in the population that the procedure should be consistent for (population distribution function, census regression). That is, when you can access the complete population, your procedure must give you an exact answer. If you are estimating the mean, and can use the census, your estimator should give you the true population mean, for instance.

Ranks per se are sample-based quantities, and if you get the full population as your sample, your ranks run from 1 to however many millions your population is -- not to the few dozens or hundreds or thousands your sample size may be. The nonparametric procedures, despite being distribution-free, are not at all model-free: you still assume that your data are i.i.d. from a distribution, with group differences described by a simple shift. That's not going to work in the survey statistics world, at least in the design-based statistics world.

Asking whether two or more distribution functions are the same in two or more domains in the population might be meaningful, or might not be: you have a fixed population, so there is no reason to expect that even two measurements from these distinct domains will be the same (if we talk about a continuous variable), let alone that two distribution functions could coincide completely. On the other hand, here we do talk about the distribution functions, which are population-based quantities that are estimable with survey data, and asking whether we can see the difference between the two or more distribution functions using the sample data is something that should be answerable.
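To make the point about distribution functions being population-based quantities a bit more concrete, here is a minimal Python sketch of a weighted (Hajek-type) estimate of a distribution function. It is only an illustration under stated assumptions: the function name `weighted_ecdf`, the toy population, and the sample weights are all hypothetical, and nothing here corresponds to an existing Stata command. The property to notice is that with the full census and equal weights the estimator reproduces the population distribution function exactly.

```python
from bisect import bisect_right


def weighted_ecdf(values, weights):
    """Return F(t) = sum_i w_i * 1[y_i <= t] / sum_i w_i as a callable.

    Hypothetical helper for illustration: `weights` are assumed to be
    sampling weights (inverse inclusion probabilities).
    """
    pairs = sorted(zip(values, weights))
    ys = [y for y, _ in pairs]
    cum = []          # running sum of weights in order of y
    total = 0.0
    for _, w in pairs:
        total += w
        cum.append(total)

    def F(t):
        k = bisect_right(ys, t)          # number of observations with y <= t
        return cum[k - 1] / total if k else 0.0

    return F


# Hypothetical finite population of 8 units.
population = [1, 2, 2, 3, 5, 8, 13, 21]

# Census case: every unit observed with weight 1, so the "estimate"
# is the population distribution function itself.
census_F = weighted_ecdf(population, [1.0] * len(population))

# A made-up sample of 3 units whose weights add up to the population size.
sample_F = weighted_ecdf([2, 5, 21], [4.0, 3.0, 1.0])
```

Here `census_F(2)` is 3/8, the share of the population at or below 2, while `sample_F(2)` is 4/8 because the sampled unit with value 2 carries a weight of 4. The sketch says nothing about variance estimation, which is where the sample design really matters.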
You might even be able to get something like a Kruskal-Wallis statistic and pretend that your sample value is an estimate of the population-based quantity. But then you need to figure out (i) what to do with that population Kruskal-Wallis -- if it is non-zero, how do you interpret it? and (ii) how to describe the distribution of the sample-based Kruskal-Wallis with respect to the sample design -- that is the relevant probability space out there. Obviously any distribution exists in the finite population sampling world -- for one thing, that probability space is discrete and finite, so you can enumerate all samples and get your distribution in closed form. At least that's the conceptual thinking. In large samples, though, you should be getting convergence to the population value, rather than an O_p(1) chi-square distribution as in the regular asymptotics. Besides, in this particular problem, I would guess you could only have any hope of describing that sample distribution if you have sample sizes fixed by design, and that is rarely guaranteed in most practical situations.

On 2/10/09, Michael I. Lichter <mlichter@buffalo.edu> wrote:
> I don't see any procedures for doing non-parametric tests (aside from chi-square in svy: tab) with complex survey data (stratified, unequal probabilities of selection). I am particularly looking for tests of difference in ordinal dependent variables across k groups (k > 2). Kruskal-Wallis is the most obvious test, but only available for non-survey data.
>
> I assume that these procedures are not available because (a) it's not clear what to do with weights in nonparametric analyses anyway (which I infer partly from the fact that none of Stata's nonparametric procedures take weights), (b) because there's no theory about whether/how they should work, and/or (c) because nobody has gotten around to it yet.
>
> I'm looking for suggestions.
>
> One possibility that comes to mind is to generate ranks using -egen- and analyze using -svy: mean- or -svy: reg- (I'd use one-way ANOVA if somebody could explain how to do it with -svy- commands). I could also do -svy: intreg- for the variables that represent ranges underlying continuous variables (since most of my ordinal variables do represent well-defined but unequal-sized ranges of underlying continuous variables, e.g., 1 = "> 1", 2 = "2-4" 3 = "5 or more"), but that would require -intreg- to be robust to floor effects, and I doubt that it is (since the method assumes an underlying Normal distribution). (I guess -mlogit-, -ologit- and -gologit2- are also possibilities.)
>
> Thanks.
>
> --
> Michael I. Lichter, Ph.D.
> Research Assistant Professor & NRSA Fellow
> UB Department of Family Medicine / Primary Care Research Institute
> UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
> Office: CC 125 / Phone: 716-898-4751 / E-Mail: mlichter@buffalo.edu
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

**References**: **st: Non-parametric tests for survey data? (e.g., Kruskal-Wallace)** *From:* "Michael I. Lichter" <mlichter@buffalo.edu>

