
From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Svy subsamples
Date: Sat, 24 Nov 2007 20:43:16 -0500

Steven--
This is a topic of debate more broadly, not just among survey
statisticians. I think the issue of how to weight regressions is a
thorny one in many disciplines...

On 11/21/07, Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net> wrote:
> > To Steven Samuels
> > Forgive me for interfering in your conversation with Mr. Richard
> > Williams.
> > However I'm dealing with a dataset consisting of 10 subsamples with
> > information collected over a period of 7 years.
> > I was just wondering why you suggest ignoring the study
> > weights, especially if they were post-stratified...?
> > Regards,
> > John Singhammer, Dr.phil, Mphil
>
> You are not interfering. This is a conversation open to all. This is
> a slightly expanded version of what I sent to you privately.
>
> How to treat the subpopulation and weights depends on the purpose of
> the study. There is a Statalist thread which you can look up. First,
> note that the 'subpopulation' Richard's student wants to study is not
> a 'subsample'. I have sometimes taken 10 random subsamples of a
> single population to study variability between samples. This is the
> method of 'interpenetrating replicated subsamples' of Mahalanobis
> which was popularized by WE Deming in the 1950's (Sample Design in
> Business Research, Wiley, 1960).
>
> To expand on the reason for ignoring the subpopulation criterion: if
> Richard's student were to analyze the data as a subpopulation, then
> every sample mean would have to be considered a ratio estimate,
> effectively analyzed with a 'ratio' procedure, which is what the
> 'subpop' option in the survey commands does. This is because the
> denominator in
> mean = (sum of X variable)/(no. of people in the subpopulation)
> would be considered a random variable. At an extreme, the very
> appearance of a subpopulation is a random event and the appropriate
> SE takes this into account.
> However, it is likely that Richard's student is
> interested in the subpopulation as a way of studying a question
> unrelated to the original target population--see below. In
> theoretical terms, she may want to study associations, conditional on
> membership in the subpopulation.
>
> To answer your question about weights.
>
> 1. If the purpose of a study is analytic (hypothesis testing,
> studying relations between variables) then Richard's student may not
> be really interested in the original target population. As an
> example, she might never report the weighted counts; she would report
> the sample counts for crucial variables. The only weights that I
> would suggest, if any, are those which correct for non-response and
> unequal probability of selection.
>
> 2. It may be better to consider the study as an 'experimental
> design', where population numbers of the experimental groups are not
> relevant. In Survey Errors and Survey Costs by R. Groves (Wiley
> Books), Groves poses the example of a study of noise in the vicinity
> of an airport. A study is to be done dividing the area around the
> airport into 'strata', which are zones at equal distance from the
> flight path or airport. An equal sample size is taken from each zone
> and the goal is to study the relation of noise to distance. Of course
> most people in the study area will not live in the closest zones. A
> weighted analysis would give the closest people their population
> weight. This would be okay if the main goal was descriptive--to
> estimate the 'average' noise experienced by residents around the
> airport. However, if you consider this an experimental design, then
> you want equal numbers at each dose, or, in fact, more at the
> extremes. Thus you would not apply the population weights.
>
> You may think this is an extreme case, but I have seen just this
> analysis in a published study of the association of gestational age
> to birth weight.
> Low birth weight infants were oversampled--they are
> only 5-10% of the population. Yet the analysts did the weighted
> analysis, which meant that the association in the vicinity of low
> birth weights was badly determined unless the model was correct.
>
> This is an ongoing debate among survey statisticians, so you will get
> different points of view.
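
The airport example in the quoted message can be sketched numerically. This is only an illustration in Python rather than Stata, and every number and name in it (`distance`, `noise`, `pop_weight`, the two helper functions) is made up for the sketch and comes from neither Groves nor the thread. It assumes three zones with equal sample sizes and a noise level that is exactly linear in distance, so that the model is correct by construction.

```python
def weighted_mean(y, w):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def weighted_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x."""
    xbar = weighted_mean(x, w)
    ybar = weighted_mean(y, w)
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# Hypothetical design: three zones, two sampled households per zone.
distance = [1, 1, 2, 2, 3, 3]             # zone index, 1 = closest to the flight path
noise = [80, 80, 70, 70, 60, 60]          # made-up readings, exactly 90 - 10 * distance
pop_weight = [10, 10, 50, 50, 200, 200]   # made-up weights: few people live close in
equal = [1] * len(noise)                  # "experimental design" view: no weights

# Descriptive goal: average noise experienced by residents, so weights matter.
descriptive = weighted_mean(noise, pop_weight)
sample_avg = weighted_mean(noise, equal)

# Analytic goal: relation of noise to distance.
slope_w = weighted_slope(distance, noise, pop_weight)
slope_u = weighted_slope(distance, noise, equal)

print(f"weighted mean noise:   {descriptive:.2f}")
print(f"unweighted mean noise: {sample_avg:.2f}")
print(f"weighted slope:        {slope_w:.2f}")
print(f"unweighted slope:      {slope_u:.2f}")
```

Under these assumptions the two means differ because the weights shift the average toward the populous far zone, while the two slopes agree because the data are exactly linear. With a misspecified model or noisy data the slopes would generally differ, and the weighted fit would lean on the heavily weighted zone, which is the concern raised about the birth weight study.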


