Statalist



Re: st: Svy subsamples


From   Steven Joel Hirsch Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Svy subsamples
Date   Sat, 24 Nov 2007 17:04:52 -0500

I believe that the questions of whether to include weights, and of which weights to include, must be decided on a case-by-case basis. I might not use supplied weights in analyses of sub-populations or for certain analytic studies. I do not know that my reasons for deciding not to weight would be considered prevalent concepts. In multilevel modeling (MLM), estimated variance components might be important, and I have not thought about weighting in connection with these. Certainly, a major reason that scientists do not consider whether to weight multilevel analyses is that their software cannot do so.

The reference below states that the following MLM packages can incorporate sampling weights: Mplus, LISREL, MLwiN, and the contributed Stata program gllamm. Construction of the weights for these programs is not straightforward, because different factors apply at each level of a hierarchical model. The reference describes, and contains links to, SAS and Stata programs that construct composite weights for two-level analyses.

Software to Compute Sampling Weights for Multilevel Analysis, by Kim Chantala, Dan Blanchette, and C. M. Suchindran, 2006 ( http://www.cpc.unc.edu/restools/data_analysis/ml_sampling_weights/Compute%20Weights%20for%20Multilevel%20Analysis.pdf )
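
To illustrate why different factors apply at each level, here is a minimal Python sketch. It is not one of the SAS or Stata programs from the reference. It assumes a two-stage design with made-up selection probabilities, and it uses one commonly discussed scaling in which the level-1 weights within a cluster are rescaled to sum to the cluster's sample size. Whether that scaling is the right one for a given package or analysis is an assumption to check against the reference.

```python
# Hypothetical two-stage sample: clusters (level 2) are selected first,
# then people (level 1) are selected within each sampled cluster.
# All probabilities below are made-up illustration values.

def two_level_weights(p_cluster, p_within):
    """Return (level-2 weight, scaled level-1 weights) for one cluster.

    p_cluster: probability that the cluster was selected
    p_within:  conditional selection probabilities of the sampled
               people, given that their cluster was selected
    """
    w2 = 1.0 / p_cluster                     # level-2 (cluster) weight
    w1_raw = [1.0 / p for p in p_within]     # raw level-1 weights
    n = len(w1_raw)
    scale = n / sum(w1_raw)                  # assumed scaling: sum to n
    w1_scaled = [w * scale for w in w1_raw]
    return w2, w1_scaled

w2, w1 = two_level_weights(p_cluster=0.1, p_within=[0.5, 0.5, 0.25, 0.25])
print(w2)        # cluster-level weight
print(w1)        # scaled person-level weights
print(sum(w1))   # sums to the cluster sample size, up to rounding
```

A single-level weighted analysis would use only the product of the two raw weights; a multilevel program needs the pieces separately, which is why the supplied overall weight cannot simply be passed through.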


Steven Samuels


To Steven Samuels:

First, let me apologize for "interfering" in your fascinating conversation. I wonder whether, in your opinion, the same concepts are at work when some scientists do not take sampling weights into account when analyzing cluster samples from household surveys with multilevel regression techniques, or whether it is simply that techniques for dealing with sampling weights are not available for multilevel regression.

TIA,

Moises Rosas



--- Steven Joel Hirsch Samuels
<sjhsamuels@earthlink.net> wrote:

You are not interfering. This is a conversation open to all. This is a slightly expanded version of what I sent to you privately.

How to treat the subpopulation and weights depends on the purpose of the study. There is a Statalist thread which you can look up. First, note that the 'subpopulation' Richard's student wants to study is not a 'subsample'. I have sometimes taken 10 random subsamples of a single population to study variability between samples. This is the method of 'interpenetrating replicated subsamples' of Mahalanobis, which was popularized by W. E. Deming in the 1950s (Sample Design in Business Research, Wiley, 1960).

To expand on the reason for ignoring the subpopulation criterion: if Richard's student were to analyze the data as a subpopulation, then every sample mean would have to be considered a ratio estimate, effectively analyzed with a 'ratio' procedure, which is what the 'subpop' option in the survey commands does. This is because the denominator in mean = (sum of X variable)/(no. of people in the subpopulation) would be considered a random variable. At an extreme, the very appearance of a subpopulation is a random event, and the appropriate SE takes this into account. However, it is likely that Richard's student is interested in the subpopulation as a way of studying a question unrelated to the original target population (see below). In theoretical terms, she may want to study associations, conditional on membership in the subpopulation.
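
The ratio-estimate point can be sketched numerically. The Python example below uses made-up data and assumes simple random sampling with replacement and equal weights, so it only illustrates the idea and does not reproduce what Stata's survey commands compute. It writes the subpopulation mean as a ratio of two sample totals and compares a linearized SE, which treats the subpopulation count as random, with a naive SE that drops everyone else and treats the count as fixed.

```python
import math

# Hypothetical sample of n = 10 people: x is the outcome,
# d = 1 if the person belongs to the subpopulation, else 0.
x = [3.0, 5.0, 4.0, 8.0, 6.0, 7.0, 2.0, 9.0, 5.0, 6.0]
d = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
n = len(x)

# Subpopulation mean as a ratio: (sum of d*x) / (sum of d).
num = sum(di * xi for di, xi in zip(d, x))
den = sum(d)
ratio_mean = num / den

# Linearized (Taylor series) SE. The linearized values are zero for
# people outside the subpopulation, but all n observations enter
# the variance calculation.
z = [di * (xi - ratio_mean) / (den / n) for di, xi in zip(d, x)]
zbar = sum(z) / n
var_lin = sum((zi - zbar) ** 2 for zi in z) / (n - 1) / n
se_lin = math.sqrt(var_lin)

# Naive SE: keep only subpopulation members and treat their
# number m as fixed.
xs = [xi for di, xi in zip(d, x) if di == 1]
m = len(xs)
var_naive = sum((xi - ratio_mean) ** 2 for xi in xs) / (m - 1) / m
se_naive = math.sqrt(var_naive)

print(ratio_mean, se_lin, se_naive)
```

Under simple random sampling the two variances differ only by the factor n(m-1)/(m(n-1)), so they nearly agree in large samples. The distinction becomes more important in stratified or clustered designs, where dropping observations outside the subpopulation can change the apparent design.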

To answer your question about weights.

1. If the purpose of a study is analytic (hypothesis testing, studying relations between variables), then Richard's student may not really be interested in the original target population. As an example, she might never report the weighted counts; she would report the sample counts for crucial variables. The only weights that I would suggest, if any, are those which correct for non-response and unequal probability of selection.

2. It may be better to consider the study as an 'experimental design', where population numbers of the experimental groups are not relevant. In Survey Errors and Survey Costs by R. Groves (Wiley), Groves gives the example of a study of noise in the vicinity of an airport. The study divides the area around the airport into 'strata', which are zones at equal distance from the flight path or airport. An equal sample size is taken from each zone, and the goal is to study the relation of noise to distance. Of course, most people in the study area will not live in the closest zones. A weighted analysis would give the closest people their population weight. This would be okay if the main goal was descriptive: to estimate the 'average' noise experienced by residents around the airport. However, if you consider this an experimental design, then you want equal numbers at each dose, or, in fact, more at the extremes. Thus you would not apply the population weights.
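
The airport example can be made concrete with a small Python calculation. The zone distances, population shares, and sample sizes below are invented for illustration, and the sketch assumes that the straight-line model is correct with constant error variance. Under those assumptions it compares the variance of the least-squares slope of noise on distance with and without the population weights.

```python
# Hypothetical design: 5 zones at increasing distance from the airport,
# 20 sampled households per zone, but very unequal population shares.
distance = [1.0, 2.0, 3.0, 4.0, 5.0]          # km, made up
pop_share = [0.02, 0.08, 0.15, 0.25, 0.50]    # made up
n_per_zone = 20

x, w = [], []
for dist, share in zip(distance, pop_share):
    for _ in range(n_per_zone):
        x.append(dist)
        w.append(share / n_per_zone)          # population weight

def slope_variance(x, w, sigma2=1.0):
    """Variance of the weighted least-squares slope when the linear
    model is correct and errors have constant variance sigma2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    a = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sum(wi ** 2 * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sigma2 * b / a ** 2

var_unweighted = slope_variance(x, [1.0] * len(x))
var_weighted = slope_variance(x, w)
print(var_unweighted, var_weighted, var_weighted / var_unweighted)
```

With these made-up numbers the weighted slope has the larger variance, because the weights shift most of the information toward the distant zones. If the linear model were wrong, the weighted and unweighted analyses would estimate different quantities, which is the other side of the debate.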

You may think this is an extreme case, but I have seen just this analysis in a published study of the association of gestational age to birth weight. Low birth weight infants were oversampled; they are only 5-10% of the population. Yet the analysts did the weighted analysis, which meant that the association in the vicinity of low birth weights was badly determined unless the model was correct.
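
A rough sketch of why weighting hurts here, in Python with invented numbers: suppose low birth weight (LBW) infants are half of the sample but 7.5% of the population (an assumed value inside the 5-10% range mentioned above). The example computes the fraction of total weight the LBW infants carry and Kish's approximate effective sample size for the weighted analysis. The sample sizes are hypothetical and are not taken from the published study.

```python
# Hypothetical sample: low-birth-weight (LBW) infants oversampled so
# that they are half of the sample but 7.5% of the population.
n_lbw, n_normal = 500, 500
share_lbw = 0.075                        # assumed population share

w_lbw = share_lbw / n_lbw                # weight per LBW infant
w_normal = (1 - share_lbw) / n_normal    # weight per other infant
weights = [w_lbw] * n_lbw + [w_normal] * n_normal

# Fraction of the total weight carried by the LBW infants.
lbw_weight_share = w_lbw * n_lbw / sum(weights)

# Kish's approximate effective sample size: (sum w)^2 / sum w^2.
n_eff = sum(weights) ** 2 / sum(wi ** 2 for wi in weights)

print(lbw_weight_share, n_eff)
```

In the weighted analysis the 500 oversampled infants count for well under a tenth of the total weight, so the extra precision bought by oversampling them is largely given back.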

This is an ongoing debate among survey statisticians, so you will get different points of view.


On Nov 21, 2007, at 3:08 PM, John Singhammer wrote:


To Steven Samuels:
Forgive me for interfering in your conversation with Mr. Richard Williams. However, I'm dealing with a dataset consisting of 10 subsamples with information collected over a period of 7 years.

I was just wondering why you suggest ignoring the study weights, especially if they were post-stratified...?
Regards,
--
John Singhammer, Dr.phil, Mphil
Dept. of Public Health
Olof Palmes Allè 17
DK8200 Aarhus
Tel: +45 8728 4715
Mobile phone: +45 2530 5768

Steven  Samuels

sjhsamuels@earthlink.net
18 Cantine's Island
Saugerties, NY 12477
Phone: 845-246-0774
EFax: 208-498-7441





*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



