Statalist



Re: st: Svy subsamples


From   Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Svy subsamples
Date   Wed, 21 Nov 2007 15:59:48 -0500

You are not interfering. This is a conversation open to all. What follows is a slightly expanded version of what I sent to you privately.

How to treat the subpopulation and weights depends on the purpose of the study. There is a Statalist thread which you can look up. First, note that the 'subpopulation' Richard's student wants to study is not a 'subsample'. I have sometimes taken 10 random subsamples of a single population to study variability between samples. This is the method of 'interpenetrating replicated subsamples' of Mahalanobis, which was popularized by W. E. Deming in the 1950s (Sample Design in Business Research, Wiley, 1960).

To expand on the reason for ignoring the subpopulation criterion: if Richard's student were to analyze the data as a subpopulation, then every sample mean would have to be considered a ratio estimate, effectively analyzed with a 'ratio' procedure, which is what the 'subpop' option in the survey commands does. This is because the denominator in mean = (sum of X variable)/(no. of people in the subpopulation) would be considered a random variable. At an extreme, the very appearance of the subpopulation in the sample is a random event, and the appropriate SE takes this into account. However, it is likely that Richard's student is interested in the subpopulation as a way of studying a question unrelated to the original target population--see below. In theoretical terms, she may want to study associations, conditional on membership in the subpopulation.
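
As a rough illustration of the ratio form of a subpopulation mean described above, here is a minimal Python sketch. It is not Stata's implementation: it assumes a single stratum with units drawn with replacement, and the function name `subpop_mean_ratio` and the toy data are hypothetical. The point is only that non-members stay in the calculation (with zero contribution to the numerator and denominator) because the weighted member count in the denominator is itself estimated from the sample.

```python
import math

def subpop_mean_ratio(x, w, in_subpop):
    """Weighted subpopulation mean as a ratio, with a linearized SE.

    Assumes a single stratum with units sampled with replacement,
    which is a simplification for illustration only.
    """
    n = len(x)
    # Numerator: weighted total of x over subpopulation members.
    num = sum(wi * di * xi for xi, wi, di in zip(x, w, in_subpop))
    # Denominator: weighted count of members, itself a random quantity.
    den = sum(wi * di for wi, di in zip(w, in_subpop))
    ratio = num / den
    # Linearized residuals; non-members contribute zero but still count in n.
    e = [wi * di * (xi - ratio) / den for xi, wi, di in zip(x, w, in_subpop)]
    var = n / (n - 1) * sum(ei ** 2 for ei in e)
    return ratio, math.sqrt(var)

# Hypothetical toy data: four sampled people, two in the subpopulation.
x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
in_subpop = [1, 0, 1, 0]
mean, se = subpop_mean_ratio(x, w, in_subpop)
print(mean, se)
```

An analysis that conditions on membership would instead drop the non-members and treat the number of members as fixed, which is the alternative discussed in the rest of this message.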

To answer your question about weights:

1. If the purpose of a study is analytic (hypothesis testing, studying relations between variables), then Richard's student may not really be interested in the original target population. As an example, she might never report the weighted counts; she would report the sample counts for crucial variables. The only weights that I would suggest, if any, are those which correct for non-response and unequal probability of selection.

2. It may be better to consider the study as an 'experimental design', where population numbers of the experimental groups are not relevant. In Survey Errors and Survey Costs by R. Groves (Wiley), Groves poses the example of a study of noise in the vicinity of an airport. The study is to be done by dividing the area around the airport into 'strata', which are zones at equal distance from the flight path or airport. An equal sample size is taken from each zone, and the goal is to study the relation of noise to distance. Of course, most people in the study area will not live in the closest zones. A weighted analysis would give the closest people their population weight, which is small. This would be okay if the main goal were descriptive--to estimate the 'average' noise experienced by residents around the airport. However, if you consider this an experimental design, then you want equal numbers at each dose, or, in fact, more at the extremes. Thus you would not apply the population weights.

You may think this is an extreme case, but I have seen just this analysis in a published study of the association of gestational age with birth weight. Low birth weight infants were oversampled, since they are only 5-10% of the population. Yet the analysts did the weighted analysis, which meant that the association in the vicinity of low birth weights was badly determined unless the model was correct.
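
To make the airport example concrete, here is a small simulated sketch in Python. Every number in it is an assumption made up for illustration (three zones, the resident counts, a linear fall-off of noise with distance, the error spread), and `wls_slope` is a hypothetical helper, not a Stata routine. It computes the mean noise and the noise-distance slope both with and without population weights.

```python
import random

def wls_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x."""
    sw = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sw
    ybar = sum(wi * yi for yi, wi in zip(y, w)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    return sxy / sxx

random.seed(1)
# Hypothetical zones: distance from the flight path and resident counts.
zone_distance = [1.0, 2.0, 3.0]
zone_population = [100, 1000, 10000]
n_per_zone = 50  # equal sample size in every zone

x, y, w = [], [], []
for dist, pop in zip(zone_distance, zone_population):
    for _ in range(n_per_zone):
        x.append(dist)
        # Assumed true relation: noise falls 10 units per unit of distance.
        y.append(90.0 - 10.0 * dist + random.gauss(0.0, 2.0))
        w.append(pop / n_per_zone)  # population weight for this zone

ones = [1.0] * len(x)
unweighted_mean = sum(y) / len(y)
weighted_mean = sum(wi * yi for yi, wi in zip(y, w)) / sum(w)
slope_unweighted = wls_slope(x, y, ones)
slope_weighted = wls_slope(x, y, w)
print(unweighted_mean, weighted_mean, slope_unweighted, slope_weighted)
```

Under these assumptions the two means differ substantially, because the weighted mean is dominated by the populous far zone; that is the descriptive quantity. Both slopes estimate the same assumed relation here because the simulated model is linear and correct, but the weighted fit leans almost entirely on the far zones, so the closest zone contributes very little to it.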

This is an ongoing debate among survey statisticians, so you will get different points of view.


On Nov 21, 2007, at 3:08 PM, John Singhammer wrote:



To Steven Samuels
Forgive me for interfering in your conversation with Mr. Richard Williams.
However, I'm dealing with a dataset consisting of 10 subsamples with information collected over a period of 7 years.

I was just wondering why you suggest ignoring the study weights, especially if they were post-stratified...?

Regards,
--
John Singhammer, Dr.phil, Mphil
Dept. of Public Health
Olof Palmes Allè 17
DK8200 Aarhus
Tel: +45 8728 4715
Mobile phone: +45 2530 5768

Steven Samuels

sjhsamuels@earthlink.net
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441




