Re: st: Svy subsamples
"Austin Nichols" <email@example.com>
Sat, 24 Nov 2007 20:43:16 -0500
This is a topic of debate more broadly, not just among survey
statisticians. I think the issue of how to weight regressions is a
thorny one in many disciplines...
On 11/21/07, Steven Joel Hirsch Samuels <firstname.lastname@example.org> wrote:
> > To Steven Samuels
> > Forgive me for interfering in your conversation with Mr. Richard
> > Williams.
> > However, I'm dealing with a dataset consisting of 10 subsamples with
> > information collected over a period of 7 years.
> > I was just wondering why you suggest ignoring the study
> > weights, especially if they were post-stratified...?
> > Regards,
> > John Singhammer, Dr.phil, Mphil
> You are not interfering. This is a conversation open to all. This is
> a slightly expanded version of what I sent to you privately.
> How to treat the subpopulation and weights depends on the purpose of
> the study. There is a Statalist thread which you can look up. First,
> note that the 'subpopulation' Richard's student wants to study is not
> a 'subsample'. I have sometimes taken 10 random subsamples of a
> single population to study variability between samples. This is the
> method of 'interpenetrating replicated subsamples' of Mahalanobis
> which was popularized by W. E. Deming in the 1950s (Sample Design in
> Business Research, Wiley, 1960).
> To expand on the reason for ignoring the subpopulation criterion: if
> Richard's student were to analyze the data as a subpopulation, then
> every sample mean would have to be considered a ratio estimate, effectively
> analyzed with a 'ratio' procedure, which is what the 'subpop' option
> in the survey commands does. This is because the denominator in mean
> = (sum of X variable)/(no. of people in the subpopulation) would be
> considered a random variable. At an extreme, the very appearance of a
> subpopulation is a random event and the appropriate SE takes this
> into account. However it is likely that Richard's student is
> interested in the subpopulation as a way of studying a question
> unrelated to the original target population--see below. In
> theoretical terms, she may want to study associations, conditional on
> membership in the subpopulation.
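
The ratio-estimate point above can be made concrete with a small sketch. This is not what the survey commands actually run; it is a minimal illustration, assuming a single stratum, PSUs sampled with replacement, and the usual linearization variance for a ratio of two estimated totals. The function name and the toy data are hypothetical.

```python
from math import sqrt

def subpop_mean_and_se(psus):
    """Subpopulation mean as a ratio of two estimated totals.

    psus: list of PSUs; each PSU is a list of (weight, in_subpop, y).
    Assumes one stratum and with-replacement sampling of PSUs.
    """
    # Numerator: weighted total of y over subpopulation members.
    total_y = sum(w * y for psu in psus for (w, d, y) in psu if d)
    # Denominator: estimated subpopulation size (random, not fixed).
    total_n = sum(w for psu in psus for (w, d, y) in psu if d)
    ratio = total_y / total_n

    # Linearized residual total per PSU. A PSU with no subpopulation
    # members contributes 0 but still counts as a sampled PSU.
    z = [sum(w * (y - ratio) for (w, d, y) in psu if d) / total_n
         for psu in psus]
    n = len(psus)
    zbar = sum(z) / n
    variance = n / (n - 1) * sum((zc - zbar) ** 2 for zc in z)
    return ratio, sqrt(variance)

# Toy data: 4 PSUs, unit weights; the third PSU has no subpopulation members.
psus = [
    [(1.0, True, 2.0), (1.0, True, 4.0), (1.0, False, 9.0)],
    [(1.0, True, 6.0), (1.0, False, 1.0)],
    [(1.0, False, 5.0), (1.0, False, 7.0)],
    [(1.0, True, 8.0)],
]
mean, se = subpop_mean_and_se(psus)
print(mean, se)
```

The denominator `total_n` varies from sample to sample, which is why the standard error is built from the residuals `y - ratio` summed within each PSU rather than from the subpopulation members alone.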
> To answer your question about weights.
> 1. If the purpose of a study is analytic (hypothesis testing,
> studying relations between variables) then Richard's student may not
> be really interested in the original target population. As an
> example, she might never report the weighted counts; she would report
> the sample counts for crucial variables. The only weights that I
> would suggest, if any, are those which correct for non-response and
> unequal probability of selection.
> 2. It may be better to consider the study as an 'experimental
> design', where population numbers of the experimental groups are not
> relevant. In Survey Errors and Survey Costs by R. Groves (Wiley
> Books), Groves poses the example of a study of noise in the vicinity
> of an airport. A study is to be done dividing the area around the
> airport into 'strata', which are zones at equal distance from the
> flight path or airport. An equal sample size is taken from each zone
> and the goal is to study the relation of noise to distance. Of course,
> most people in the study area will not live in the closest zones. A
> weighted analysis would give the closest people their population
> weight. This would be okay if the main goal was descriptive--to
> estimate the 'average' noise experienced by residents around the
> airport. However if you consider this an experimental design, then
> you want equal numbers at each dose, or, in fact, more at the
> extremes. Thus you would not apply the population weights.
> You may think this is an extreme case, but I have seen just this
> analysis in a published study of the association of gestational age
> to birth weight. Low birth weight infants were oversampled--they are
> only 5-10% of the population. Yet the analysts did the weighted
> analysis, which meant that the association in the vicinity of low
> birthweights was badly determined unless the model was correct.
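
The airport example in point 2 can be sketched numerically. The following is only an illustration under stated assumptions: three zones at hypothetical distances 1, 2 and 3, ten sampled households per zone, made-up population counts of 100, 400 and 1500, a correctly specified straight-line model, and equal error variance everywhere. Under those assumptions the variance of the fitted slope has a closed form for any set of weights, so the unweighted and population-weighted analyses can be compared directly.

```python
def slope_variance(x, w, sigma2=1.0):
    """Variance of the weighted least-squares slope of y on x,
    assuming independent errors with common variance sigma2.
    With all weights equal this reduces to sigma2 / Sxx (ordinary OLS).
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    num = sum(wi ** 2 * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    den = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)) ** 2
    return sigma2 * num / den

# Hypothetical design: equal sample size in each distance zone.
n_per_zone = 10
distances = [1.0, 2.0, 3.0]
pop_counts = [100.0, 400.0, 1500.0]  # few people live closest to the airport

x = [d for d in distances for _ in range(n_per_zone)]
unit_weights = [1.0] * len(x)
pop_weights = [N / n_per_zone for N in pop_counts for _ in range(n_per_zone)]

var_unweighted = slope_variance(x, unit_weights)
var_weighted = slope_variance(x, pop_weights)
print(var_unweighted, var_weighted)
```

With these made-up numbers the population weights shift most of the influence to the farthest zone, so the slope is estimated less precisely than in the unweighted fit. If the straight-line model were wrong, the two analyses would also estimate different quantities, which this sketch does not address.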
> This is an ongoing debate among survey statisticians, so you will get
> different points of view.