Re: st: GSAMPLE R3300

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: GSAMPLE R3300
Date	Fri, 7 Sep 2012 00:55:30 +0100

What I said is easily accessible to anyone who remains curious.

I still think it was a fair comment, but see no point in repeating or
rewriting what I said. Sam's welcome to his own interpretations and
speculations.

My amusement was entirely at Sam's wry summary of what I said, not at
all at his own positive contributions. I am sorry that I evidently did
not make that clear.

Nick

On Fri, Sep 7, 2012 at 12:20 AM, Lucas <[email protected]> wrote:
> A fuller summary of Nick's response to the gsample query is:
>
> 1) If the poster is making an illogical request, the program should not
> try to puzzle it out.
> 2) So, give the program weights it can use.
>
> I agree with 1, but 2 offered no guidance in what characteristics such
> weights might need to have, so I took a stab, and the tone I read in
> the email was generally dismissive ("Are you saying that you expected
> -gsample- . . .to cope with an illogical request?" . . . "If a user is
> asking something crazy" . . . "Alternatively, you can always write
> your own program that does what you want it to do.")
>
> I just assumed you, Nick, were in a bad mood, because immediately
> nearby was your one-word response to another poster's question.  Your
> answer was "No."  A later poster provided a bit more assistance.
>
> We all fall into bad moods occasionally.  When we do, it'd smooth
> social interaction if we don't claim amusement when others step
> forward to try to help others.
>
> Respectfully
> Sam
>
> On Thu, Sep 6, 2012 at 3:23 PM, Nick Cox <[email protected]> wrote:
>> I think Sam's last paragraph refers to my posting. For the record, I
>> consider his summary amusing, but not to represent what I said or even
>> meant. My main point was to underline that the program -gsample- was
>> behaving defensibly and that the poster's surprise was thus misplaced.
>> I did also suggest that they needed to recalculate the weights.
>> Naturally any other contributions to the thread that explain
>> specifically and correctly what the poster should do instead are more
>> valuable than that one post.
>>
>> Nick
>>
>> On Thu, Sep 6, 2012 at 11:04 PM, Lucas <[email protected]> wrote:
>>> I've never used gsample, but I just assumed after you removed the C
>>> units you could adjust the remaining cases so that their weights sum
>>> to 1.  Sorry I didn't say that.  Not sure this new information alters
>>> Steve's comment.
>>>
>>> My understanding of Stas's comment was that one left the certainty
>>> units in and let gsample select them while gsample also selected other
>>> cases, too.  My approach was to remove the certainty units and use
>>> gsample to select the remainder.  As I don't know what follows the
>>> sample selection, nor do I know gsample, I can't tell whether
>>> something is gained by letting gsample select the certainty units.
>>>
>>> At any rate, I took the ridiculous step of responding to a question
>>> about a command I have never used because I thought the poster
>>> deserved something more useful than an admonition to not be surprised
>>> if they try to get a command to do something it cannot do when it
>>> doesn't do it.
>>>
>>> Sam
>>>
>>> On Thu, Sep 6, 2012 at 2:17 PM, Steve Samuels <[email protected]> wrote:
>>>>
>>>>
>>>> Sam:
>>>>
>>>> You've missed the point of Stas's post: After removing the initial certainty
>>>> units, the scaled size measures (probabilities) of the remaining units must be
>>>> adjusted upward so that they sum to 1. Now additional units might violate the
>>>> inequality quoted in the -gsample- error message. The process is repeated
>>>> until the inequality is not violated for any of the remaining units.
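
To make the cascade Steve describes concrete, here is a small worked example with made-up size measures (the weights and the sample size n = 3 are assumptions for illustration, not taken from the original poster's data). The quantity checked is the one from the -gsample- error, n*w_i/sum(w).

```latex
\begin{align*}
w &= (50,\ 30,\ 10,\ 5,\ 5), \qquad \pi_i = \frac{n\,w_i}{\sum_j w_j} \\[4pt]
\text{Pass 1 } (n = 3,\ \textstyle\sum_j w_j = 100):\quad
  \pi &= (1.5,\ 0.9,\ 0.3,\ 0.15,\ 0.15)
  && \Rightarrow \text{unit 1 is a certainty unit} \\
\text{Pass 2 } (n = 2,\ \textstyle\sum_j w_j = 50):\quad
  \pi &= (1.2,\ 0.4,\ 0.2,\ 0.2)
  && \Rightarrow \text{unit 2 now violates the inequality} \\
\text{Pass 3 } (n = 1,\ \textstyle\sum_j w_j = 20):\quad
  \pi &= (0.5,\ 0.25,\ 0.25)
  && \Rightarrow \text{no violations; stop}
\end{align*}
```

Unit 2 looked acceptable on the first pass (0.9) and became a certainty unit only after unit 1 was removed and the remaining measures were rescaled, which is why a single pass is not enough.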
>>>>
>>>> Some alternatives to -gsample- and -samplepps- (SSC):
>>>> Sampford's Method can be found in the SAS SURVEYSELECT procedure. SAS's default
>>>> PPS method is the Hanurav-Vijayan method (Vijayan, 1968); see also Fox (1989)
>>>> and Golmant (1990). Tillé's elimination method can be found in the R "sampling"
>>>> package as the -UPtille- function.
>>>>
>>>> Tillé (2006) is the definitive text these days.
>>>> See also the -help- for Ben Jann's -mf_mm_sample- for more information
>>>> (-gsample- is a wrapper for this).
>>>>
>>>> References:
>>>>
>>>> Fox, D. R. (1989), "Computer Selection of Size-Biased Samples," The American
>>>> Statistician, 43(3), 168–171.
>>>>
>>>> Golmant, J. (1990), "Correction: Computer Selection of Size-Biased Samples," The
>>>> American Statistician, 44(2), 194.
>>>>
>>>> Tillé, Y. (2006), Sampling Algorithms, New York: Springer.
>>>>
>>>> Vijayan, K. (1968), "An Exact Sampling Scheme: Generalization of a Method of
>>>> Hanurav," Journal of the Royal Statistical Society, Series B, 30, 556–566.
>>>>
>>>>
>>>> Steve
>>>>
>>>>
>>>> On Sep 6, 2012, at 1:04 PM, Lucas wrote:
>>>>
>>>> Why not simply remove the certainty units (C Units), draw the sample
>>>> from the remainder units (R Units) to obtain the sampled units (S
>>>> Units), then add the certainty and sampled sets (C & S) together to
>>>> form the final sample (FS units)?
>>>>
>>>> Sam
>>>>
>>>> On Thu, Sep 6, 2012 at 8:44 AM, Stas Kolenikov <[email protected]> wrote:
>>>> On Thu, Sep 6, 2012 at 9:28 AM, Lieke Boonen (SiRM)
>>>> <[email protected]> wrote:
>>>> We are trying to draw a sample from our population without replacement.
>>>> We have several subgroups with high sampling weights. However, the
>>>> gsample command gives an error because for these cases w_i*n/sum(w) is
>>>> larger than 1. We thought the program looked at the relation between the
>>>> weights and that this should not be a problem. Does anyone recognize
>>>> this problem, and is there a solution?
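
The condition in the error can be checked before calling any sampler. The following is a minimal Python sketch, not part of -gsample-; the function names and the example weights are hypothetical. It computes n*w_i/sum(w) for each unit and lists the units for which that value exceeds 1.

```python
def inclusion_probabilities(weights, n):
    """Return n * w_i / sum(w) for every unit (hypothetical helper)."""
    total = sum(weights)
    return [n * w / total for w in weights]


def units_exceeding_one(weights, n):
    """Return the 0-based indices of units with n * w_i / sum(w) > 1."""
    probs = inclusion_probabilities(weights, n)
    return [i for i, p in enumerate(probs) if p > 1]


# Made-up size measures; n = 3 units are to be drawn without replacement.
weights = [50, 30, 10, 5, 5]
print(inclusion_probabilities(weights, 3))
print(units_exceeding_one(weights, 3))
```

A value above 1 cannot be a probability, so a unit flagged here has to be selected with certainty, and the weights of the remaining units need to be recalculated.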
>>>>
>>>> As far as I can recall, -gsample- does a decent job of selecting one
>>>> observation from the list, provided, as you found the hard way, that
>>>> you don't have any certainty units. However, it is not appropriate for
>>>> many real-world sampling problems, which usually require more
>>>> complicated code. You also need to be aware that PPSWOR is a very
>>>> non-trivial and counter-intuitive task. See
>>>> http://www.citeulike.org/user/ctacmo/tag/unequal_prob_sampling for the
>>>> appropriate references. All in all, you probably need to do this:
>>>>
>>>> 1. Identify the certainty units, set their probability of selection to 1.
>>>> 2. Adjust the probability distribution, pulling up the probabilities
>>>> for other units.
>>>> 3. Check again for the certainty units: repeat steps 1-2 until the
>>>> probabilities of selection on a single draw have converged.
>>>> 4. Implement your PPS procedure -- systematic sampling is the poor man's,
>>>> old-days shortcut procedure to sample from the physical list on
>>>> sheet(s) of paper that leads to technical difficulties in variance
>>>> estimation; Rao-Hartley-Cochran is the easiest-to-implement shortcut
>>>> that leads to an approximate PPS; Rao-Sampford used to be the most
>>>> rigorous choice until Tille's elimination procedures appeared in the
>>>> literature.
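
Steps 1-3 above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, weights are assumed strictly positive, and the code stops at computing inclusion probabilities. It does not perform the PPS draw of step 4, for which one of the procedures named there would still be needed.

```python
def pps_inclusion_probabilities(weights, n):
    """Iteratively flag certainty units and rescale the rest.

    Returns (probs, certainty): probs[i] is 1.0 for certainty units and
    n_left * w_i / (sum of remaining weights) otherwise; certainty is the
    sorted list of 0-based indices flagged as certainty units.
    """
    if not 0 <= n <= len(weights):
        raise ValueError("n must be between 0 and the number of units")
    if any(w <= 0 for w in weights):
        raise ValueError("weights are assumed to be strictly positive")

    certainty = set()
    while True:
        remaining = [i for i in range(len(weights)) if i not in certainty]
        n_left = n - len(certainty)               # draws still to allocate
        total = sum(weights[i] for i in remaining)
        # Step 1: units whose n_left * w_i / total would exceed 1.
        new = [i for i in remaining if n_left * weights[i] > total]
        if not new:                               # Step 3: nothing changed
            break
        certainty.update(new)                     # Step 2 happens on next pass

    probs = [
        1.0 if i in certainty else n_left * weights[i] / total
        for i in range(len(weights))
    ]
    return probs, sorted(certainty)


# Made-up size measures; n = 3 units are to be drawn without replacement.
probs, certainty = pps_inclusion_probabilities([50, 30, 10, 5, 5], 3)
print(certainty)
print(probs)
```

The returned probabilities sum to n, so the non-certainty part can be passed on to whichever PPS procedure is chosen, with the sample size reduced by the number of certainty units.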
>>>>
>>>> --
>>>> -- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
>>>> -- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
>>>> srbi dot com
>>>> -- Opinions stated in this email are mine only, and do not reflect the
>>>> position of my employer
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
