Re: st: GSAMPLE R3300

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: GSAMPLE R3300
Date	Thu, 6 Sep 2012 17:17:28 -0400

Sam:

You've missed the point of Stas's post: After removing the initial certainty
units, the scaled size measures (probabilities) of the remaining units must be
adjusted upward so that they sum to 1. After that adjustment, additional units
might violate the inequality quoted in the -gsample- error message. The process
is repeated until the inequality is not violated for any of the remaining units.
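
To make the repetition concrete, here is a minimal sketch in Python (not Stata, and not what -gsample- does internally). The function name certainty_split and the example size measures are made up for illustration, and treating a unit whose scaled probability is exactly 1 as a certainty unit is an assumption.

```python
def certainty_split(sizes, n):
    """Separate certainty units from the rest by repeated rescaling.

    sizes: positive size measures, one per unit (illustrative input)
    n:     total sample size
    Returns (certainty, probs): the sorted indices of the certainty units,
    and a dict mapping each remaining index to its inclusion probability.
    """
    remaining = set(range(len(sizes)))
    certainty = set()
    while True:
        m = n - len(certainty)                      # draws left for the remainder
        total = sum(sizes[i] for i in remaining)    # rescaling denominator
        # A unit violates the inequality when m * size / total reaches 1
        # (assumption: equality also counts as a certainty unit).
        flagged = {i for i in remaining if m * sizes[i] >= total}
        if not flagged:
            break
        certainty |= flagged
        remaining -= flagged
    m = n - len(certainty)
    total = sum(sizes[i] for i in remaining)
    probs = {i: m * sizes[i] / total for i in sorted(remaining)}
    return sorted(certainty), probs


# Made-up example: five units, sample size 3.
certainty, probs = certainty_split([50, 30, 10, 5, 5], n=3)
# certainty == [0, 1]; probs == {2: 0.5, 3: 0.25, 4: 0.25}
```

In this example only unit 0 violates the inequality on the first pass (3*50/100 = 1.5, while 3*30/100 = 0.9). Once unit 0 is set aside, unit 1 rescales to 2*30/50 = 1.2 and becomes a certainty unit as well, which is why a single removal pass is not enough.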

Some alternatives to -gsample- and -samplepps- (SSC):
Sampford's method can be found in the SAS SURVEYSELECT procedure. SAS's default
PPS method is the Hanurav-Vijayan method (Vijayan, 1968); see also Fox (1989)
and Golmant (1990). Tillé's elimination method can be found in the R "sampling"
package as the UPtille() function.

Tillé (2006) is the definitive text these days.
See also the -help- for Ben Jann's -mf_mm_sample- for more information
(-gsample- is a wrapper for this).

References:

Fox, D. R. (1989), "Computer Selection of Size-Biased Samples," The American
Statistician, 43(3), 168–171.

Golmant, J. (1990), "Correction: Computer Selection of Size-Biased Samples," The
American Statistician, 44(2), 194.

Tillé, Y. (2006), Sampling Algorithms, New York: Springer.

Vijayan, K. (1968), "An Exact Sampling Scheme: Generalization of a Method of
Hanurav," Journal of the Royal Statistical Society, Series B, 30, 556–566.

Steve

On Sep 6, 2012, at 1:04 PM, Lucas wrote:

Why not simply remove the certainty units (C Units), draw the sample
from the remainder units (R Units) to obtain the sampled units (S
Units), then add the certainty and sampled sets (C & S) together to
form the final sample (FS units)?

Sam

On Thu, Sep 6, 2012 at 8:44 AM, Stas Kolenikov <[email protected]> wrote:
On Thu, Sep 6, 2012 at 9:28 AM, Lieke Boonen (SiRM)
<[email protected]> wrote:
We are trying to draw a sample from our population, without replacement. We have several subgroups with a high sampling weight. However, the gsample command gives an error because for these cases w_i*n/sum(w) is larger than 1. We thought the program looked at the relation between the weights and that this should not be a problem. Does anyone recognize this problem, and is there a solution for it?

As far as I can recall, -gsample- does a decent job of selecting one
observation from the list, provided, as you found the hard way, that
you don't have any certainty units. However, it is not appropriate for
many real-world sampling problems, which usually require more
complicated code. You also need to be aware that PPSWOR is a very
non-trivial and counter-intuitive task. See
http://www.citeulike.org/user/ctacmo/tag/unequal_prob_sampling for the
appropriate references. All in all, you probably need to do this:

1. Identify the certainty units, set their probability of selection to 1.
2. Adjust the probability distribution, pulling up the probabilities
for other units.
3. Check again for the certainty units: repeat steps 1-2 until the
probabilities of selection on a single draw have converged.
4. Implement your PPS procedure -- systematic sampling is the poor man's,
old-days shortcut procedure for sampling from a physical list on
sheet(s) of paper, and it leads to technical difficulties in variance
estimation; Rao-Hartley-Cochran is the easiest-to-implement shortcut
that leads to an approximate PPS; Rao-Sampford used to be the most
rigorous choice until Tillé's elimination procedures appeared in the
literature.
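
As an illustration of step 4 only, here is a minimal Python sketch of the systematic shortcut mentioned above, applied to a list from which the certainty units have already been removed. It is not the -gsample- algorithm and not a recommendation: the variance-estimation caveat still applies. The function name systematic_pps and its arguments are made up for illustration.

```python
import random


def systematic_pps(sizes, n, rng=None):
    """Systematic PPS selection of n units without replacement.

    sizes: positive size measures in list order (illustrative input)
    n:     number of units to select
    rng:   optional random.Random instance, for reproducibility
    Assumes certainty units were removed beforehand, i.e. n * size <= total
    for every unit; otherwise a ValueError is raised.
    """
    rng = rng or random.Random()
    total = sum(sizes)
    if any(n * s > total for s in sizes):
        raise ValueError("certainty units present; remove them first")
    step = total / n                          # sampling interval in size units
    start = rng.random() * step               # random start in [0, step)
    points = [start + k * step for k in range(n)]
    selected, cum, k = [], 0.0, 0
    last = len(sizes) - 1
    for i, s in enumerate(sizes):
        cum += s                              # cumulated size through unit i
        # Each unit spans at most one interval, so it can catch at most one
        # point. The check on the last unit guards against round-off at the
        # upper boundary.
        if k < n and (points[k] < cum or i == last):
            selected.append(i)
            k += 1
    return selected


# Made-up example: the three non-certainty units left over from a larger list.
sample = systematic_pps([10, 5, 5], n=1, rng=random.Random(2012))
```

The result depends on the order of the list, so the list is often sorted or randomly permuted before the selection, depending on the design.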

--
-- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer

