
Re: st: Svy subsamples

From	Steven Joel Hirsch Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Svy subsamples
Date	Wed, 21 Nov 2007 15:59:48 -0500

You are not interfering. This is a conversation open to all. What follows is a slightly expanded version of what I sent you privately.

How to treat the subpopulation and the weights depends on the purpose of the study. There is a Statalist thread on this which you can look up. First, note that the 'subpopulation' Richard's student wants to study is not a 'subsample'. I have sometimes taken 10 random subsamples of a single population to study variability between samples. This is the method of 'interpenetrating replicated subsamples' of Mahalanobis, which was popularized by W. E. Deming in the 1950s (Sample Design in Business Research, Wiley, 1960).
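The replicated-subsample idea can be illustrated with a minimal Python sketch. It is a simple random-groups version and rests on an assumption not made in the post: the full sample is a simple random sample that is split at random into k equal-size groups, so each group behaves like an independent replicate of the same design. The spread of the k group means then gives a direct estimate of the standard error of their average. The function name `interpenetrating_estimate` is hypothetical.

```python
import random
import statistics


def interpenetrating_estimate(values, k, seed=0):
    """Randomly split `values` into k equal-size subsamples (hypothetical helper).

    Returns (overall mean, list of subsample means, standard error), where the
    SE is the standard deviation of the k subsample means divided by sqrt(k).
    Assumes the subsamples can be treated as independent replicates.
    """
    if k < 2 or len(values) % k != 0:
        raise ValueError("need k >= 2 and len(values) divisible by k")
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)  # random assignment to subsamples
    m = len(shuffled) // k
    sub_means = [statistics.fmean(shuffled[i * m:(i + 1) * m]) for i in range(k)]
    overall = statistics.fmean(sub_means)
    # Between-replicate variability is the whole basis of the SE.
    se = statistics.stdev(sub_means) / k ** 0.5
    return overall, sub_means, se


overall, sub_means, se = interpenetrating_estimate(list(range(1, 101)), k=10)
print(round(overall, 2), round(se, 2))
```

The appeal of the method is that no variance formula specific to the design is needed: any design that can be replicated k times yields k comparable estimates.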

To expand on the reason for ignoring the subpopulation criterion: if Richard's student were to analyze the data as a subpopulation, then every sample mean would have to be treated as a ratio estimate and analyzed with a 'ratio' procedure, which is what the 'subpop' option in the survey commands does. This is because the denominator in mean = (sum of X)/(no. of people in the subpopulation) would be considered a random variable. At the extreme, whether the subpopulation appears in the sample at all is a random event, and the appropriate SE takes this into account. However, it is likely that Richard's student is interested in the subpopulation as a way of studying a question unrelated to the original target population (see below). In theoretical terms, she may want to study associations conditional on membership in the subpopulation.
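The ratio form of a subpopulation mean can be made concrete with a minimal Python sketch. It assumes, for illustration only, a single-stage, unstratified design whose clusters (PSUs) were drawn with replacement, and it uses Taylor linearization for the SE. The function name `subpop_mean_linearized` is hypothetical; this is not Stata's code for `subpop()`, only the same general idea. Both the numerator and the denominator (the estimated subpopulation size) come from the sample, and rows outside the subpopulation stay in the calculation with a linearized value of zero, so their PSUs still count toward the variance.

```python
import math
from collections import defaultdict


def subpop_mean_linearized(y, w, in_subpop, psu):
    """Weighted subpopulation mean as a ratio estimator (hypothetical helper).

    Returns (mean, se). Assumed design: one stage, no strata, PSUs `psu`
    sampled with replacement. Rows outside the subpopulation are kept.
    """
    # Numerator: weighted sum of y over subpopulation members.
    num = sum(wi * yi for yi, wi, d in zip(y, w, in_subpop) if d)
    # Denominator: estimated subpopulation size, itself a random variable.
    den = sum(wi for wi, d in zip(w, in_subpop) if d)
    r = num / den

    # Linearized values totalled within each PSU; non-members add 0.0,
    # but their PSU is still created and counted in n_psu.
    psu_totals = defaultdict(float)
    for yi, wi, d, c in zip(y, w, in_subpop, psu):
        psu_totals[c] += wi * (yi - r) / den if d else 0.0

    n_psu = len(psu_totals)
    zbar = sum(psu_totals.values()) / n_psu
    var = n_psu / (n_psu - 1) * sum((t - zbar) ** 2 for t in psu_totals.values())
    return r, math.sqrt(var)


# Toy data: PSU 2 contains no subpopulation members.
y = [1, 2, 3, 4, 5, 6]
w = [1.0] * 6
psu = [0, 0, 1, 1, 2, 2]
d = [True, True, True, True, False, False]
print(subpop_mean_linearized(y, w, d, psu))                   # full design kept
print(subpop_mean_linearized(y[:4], w[:4], d[:4], psu[:4]))   # rows dropped
```

With these toy numbers both calls give a mean of 2.5, but the SE is about 0.87 when PSU 2 is kept and 1.0 when its rows are simply deleted, which is why subpopulation estimation keeps the full design rather than subsetting the data.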

To answer your question about weights:

1. If the purpose of a study is analytic (hypothesis testing, studying relations between variables), then Richard's student may not really be interested in the original target population. For example, she might never report the weighted counts; instead she would report the sample counts for crucial variables. The only weights I would suggest, if any, are those that correct for non-response and unequal probabilities of selection.

2. It may be better to consider the study as an 'experimental design', in which the population numbers of the experimental groups are not relevant. In Survey Errors and Survey Costs (Wiley), R. Groves presents the example of a study of noise in the vicinity of an airport. The area around the airport is divided into 'strata', which are zones at equal distance from the flight path or airport. An equal sample size is taken from each zone, and the goal is to study the relation of noise to distance. Of course, most people in the study area will not live in the closest zones, so a weighted analysis would give the respondents in those zones only their population weight, which is small. This would be fine if the main goal were descriptive: to estimate the 'average' noise experienced by residents around the airport. However, if you consider this an experimental design, then you want equal numbers at each dose, or in fact more at the extremes. Thus you would not apply the population weights.

You may think this is an extreme case, but I have seen exactly this analysis in a published study of the association of gestational age with birth weight. Low-birth-weight infants, who are only 5-10% of the population, were oversampled. Yet the analysts did a weighted analysis, which meant that the association in the range of low birth weights was poorly determined unless the model was correct.
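One rough way to quantify the cost of applying population weights to an equal-allocation design like the airport example is Kish's effective sample size, (sum of w)^2 / (sum of w^2). This is a swapped-in rule of thumb, not something from the post, and it is only approximate for regression slopes. The sketch below assumes a hypothetical design with 4 zones, 25 respondents per zone, and made-up population counts that grow with distance; no noise measurements are involved.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(wt * wt for wt in weights)


# Hypothetical design: equal allocation across 4 distance zones, with
# made-up zone population counts (not data from any study).
zone_distance = [1, 2, 3, 4]
zone_population = [500, 2000, 8000, 32000]
n_per_zone = 25

# One entry per respondent: distance of their zone and their population weight.
x = [dist for dist in zone_distance for _ in range(n_per_zone)]
pop_weights = [pop / n_per_zone for pop in zone_population for _ in range(n_per_zone)]

print(len(x), round(kish_effective_n(pop_weights), 1))  # prints: 100 41.3
```

Under these assumptions, weighting makes the 100 respondents behave roughly like 41 equally weighted ones, with most of the weight on the far zones, which is the opposite of what an experimental-design view of the study wants.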

This is an ongoing debate among survey statisticians, so you will get different points of view.
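Point 1 above limits the weights to those correcting for unequal selection probabilities and non-response. A minimal Python sketch of such weights, assuming a common weighting-class adjustment (the classes and function name are hypothetical, not taken from any particular study), stops before the post-stratification step that rescales weights to known population totals.

```python
from collections import defaultdict


def base_and_nonresponse_weights(prob_selection, responded, weighting_class):
    """Analysis weights for each sampled unit; 0.0 for non-respondents.

    Base weight = 1 / probability of selection. Within each (hypothetical)
    weighting class, respondents' base weights are scaled up so that they
    also carry the base weight of that class's non-respondents.
    No post-stratification is applied.
    """
    base = [1.0 / p for p in prob_selection]
    sampled_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for b, r, c in zip(base, responded, weighting_class):
        sampled_total[c] += b
        if r:
            respondent_total[c] += b
    return [
        b * sampled_total[c] / respondent_total[c] if r else 0.0
        for b, r, c in zip(base, responded, weighting_class)
    ]


# Toy example: two classes, one non-respondent in class "a".
weights = base_and_nonresponse_weights(
    [0.5, 0.5, 0.25, 0.25], [True, False, True, True], ["a", "a", "b", "b"]
)
print(weights)  # [4.0, 0.0, 4.0, 4.0]
```

The adjusted weights still sum to the total of the base weights (12 here), so the respondents represent the whole selected sample within each class.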

On Nov 21, 2007, at 3:08 PM, John Singhammer wrote:

To Steven Samuels
Forgive me for interfering in your conversation with Mr. Richard Williams.
However, I'm dealing with a dataset consisting of 10 subsamples, with information collected over a period of 7 years.

I was just wondering why you suggest ignoring the study weights, especially if they were post-stratified?

Regards,
--
John Singhammer, Dr.phil, Mphil
Dept. of Public Health
Olof Palmes Allé 17
DK8200 Aarhus
Tel: +45 8728 4715
Mobile phone: +45 2530 5768

Steven  Samuels

[email protected]
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441





*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: Svy subsamples
  - From: Richard Williams <[email protected]>
- Re: st: Svy subsamples
  - From: "Austin Nichols" <[email protected]>
- Re: st: Svy subsamples
  - From: Statalist <[email protected]>

References:
- st: Svy subsamples
  - From: Richard Williams <[email protected]>
