# Re: st: Svy subsamples

| From | To | Subject | Date |
|---|---|---|---|
| "Austin Nichols" | statalist@hsphsun2.harvard.edu | Re: st: Svy subsamples | Sat, 24 Nov 2007 20:43:16 -0500 |

```
Steven--
This is a topic of debate more broadly, not just among survey
statisticians. I think the issue of how to weight regressions is a
thorny one in many disciplines...

On 11/21/07, Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net> wrote:
> > To Steven Samuels
> > Forgive me for interfering in your conversation with Mr. Richard
> > Williams.
> > However, I'm dealing with a dataset consisting of 10 subsamples with
> > information collected over a period of 7 years.
> > I was just wondering why you suggest ignoring the study
> > weights, especially if they were post-stratified...?
> > Regards,
> > John Singhammer, Dr.phil, Mphil
>
> You are not interfering. This is a conversation open to all. This is
> a slightly expanded version of what I sent to you privately.
>
> How to treat the subpopulation and weights depends on the purpose of
> the study.  There is a Statalist thread which you can look up. First,
> note that the 'subpopulation' Richard's student wants to study is not
> a 'subsample'. I have sometimes taken 10 random subsamples of a
> single population to study variability between samples.  This is the
> method of 'interpenetrating replicated subsamples' of Mahalanobis
> which was popularized by W. E. Deming in the 1950s (Sample Design in
>
> To expand on the reason for ignoring the subpopulation criterion: if
> Richard's student were to analyze the data as a subpopulation, then
> every sample mean would have to be considered a ratio estimate, effectively
> analyzed with a 'ratio' procedure, which is what the 'subpop' option
> in the survey commands does. This is because the denominator in mean
> = (sum of X variable)/(no. of people in the subpopulation) would be
> considered a random variable. At an extreme, the very appearance of a
> subpopulation is a random event and the appropriate SE takes this
> into account.  However, it is likely that Richard's student is
> interested in the subpopulation as a way of studying a question
> unrelated to the original target population--see below.  In
> theoretical terms, she may want to study associations, conditional on
> membership in the subpopulation.
>
>
> 1. If the purpose of a study is analytic (hypothesis testing,
> studying relations between variables) then Richard's student may not
> be really interested in the original target population.  As an
> example, she might never report the weighted counts; she would report
> the sample counts for crucial variables. The only weights that I
> would suggest, if any, are those which correct for non-response and
> unequal probability of selection.
>
> 2. It may be better to consider the study as an 'experimental
> design', where population numbers of the experimental groups are not
> relevant.  In Survey Errors and Survey Costs by R. Groves (Wiley
> Books), Groves gives the example of a study of noise in the vicinity
> of an airport.  A study is to be done dividing the area around the
> airport into 'strata', which are zones at equal distance from the
> flight path or airport.  An equal sample size is taken from each zone
> and the goal is to study the relation of noise to distance. Of course,
> most people in the study area will not live in the closest zones.  A
> weighted analysis would give the closest people their population
> weight.  This would be okay if the main goal was descriptive--to
> estimate the 'average' noise experienced by residents around the
> airport.  However, if you consider this an experimental design, then
> you want equal numbers at each dose, or, in fact, more at the
> extremes.  Thus you would not apply the population weights.
>
> You may think this is an extreme case, but I have seen just this
> analysis in a published study of the association of gestational age
> to birth weight.  Low birth weight infants were oversampled--they are
> only 5-10% of the population. Yet the analysts did the weighted
> analysis, which meant that the association in the vicinity of low
> birth weights was badly determined unless the model was correct.
>
> This is an ongoing debate among survey statisticians, so you will get
> different points of view.
```
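
The airport-noise example in the quoted message can be illustrated with a small numerical sketch. Everything below is an assumption made up for illustration and does not come from Groves or from the thread: the three zones, their distances, the noise levels, the population sizes, and the helper name `wls_slope`. The noise values are deliberately not linear in distance, so a straight-line model is misspecified, which is the situation the message warns about.

```python
def wls_slope(x, y, w):
    """Slope of the weighted least-squares line of y on x (with intercept)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


# Hypothetical design: the same sample size in every zone, although far
# fewer people live in the zone closest to the airport.
zone_distance = [1.0, 2.0, 3.0]       # distance from the flight path (made up)
zone_noise = [90.0, 70.0, 65.0]       # noise level per zone, nonlinear in distance
zone_population = [100, 1000, 10000]  # residents per zone (made up)
n_per_zone = 4

x, y, w_pop = [], [], []
for dist, noise, pop in zip(zone_distance, zone_noise, zone_population):
    for _ in range(n_per_zone):
        x.append(dist)
        y.append(noise)
        w_pop.append(pop / n_per_zone)  # population weight = residents per sampled unit

# "Experimental design" view: every sampled unit counts equally.
unweighted_slope = wls_slope(x, y, [1.0] * len(x))

# "Descriptive" view: each unit carries its population weight.
weighted_slope = wls_slope(x, y, w_pop)

print("unweighted slope:", unweighted_slope)  # -12.5
print("weighted slope:  ", weighted_slope)    # about -7.23
```

With these made-up numbers the noise level drops by 20 units between the first two zones, but the population-weighted line has a slope of only about -7.2, because the most distant zone carries 100 times the weight of the closest one. The unweighted fit gives the three zones equal influence. Neither line is right when the true relation is nonlinear; the sketch only shows how strongly the weights decide which part of the distance range the fit reflects.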