

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: GSAMPLE R3300
Date: Fri, 7 Sep 2012 00:55:30 +0100

What I said is easily accessible to anyone who remains curious. I still think it was a fair comment, but see no point in repeating or rewriting what I said. Sam's welcome to his own interpretations and speculations.

My amusement was entirely at Sam's wry summary of what I said, not at all at his own positive contributions. I am sorry that I evidently did not make that clear.

Nick

On Fri, Sep 7, 2012 at 12:20 AM, Lucas <lucaselastic@gmail.com> wrote:
> A fuller summary of Nick's response to the gsample query is:
>
> 1) If the poster is making an illogical request, the program should not
> try to puzzle it out.
> 2) So, give the program weights it can use.
>
> I agree with 1, but 2 offered no guidance in what characteristics such
> weights might need to have, so I took a stab, and the tone I read in
> the email was generally dismissive ("Are you saying that you expected
> -gsample- . . .to cope with an illogical request?" . . . "If a user is
> asking something crazy" . . . "Alternatively, you can always write
> your own program that does what you want it to do.")
>
> I just assumed you, Nick, were in a bad mood, because immediately
> nearby was your one-word response to another poster's question. Your
> answer was "No." A later poster provided a bit more assistance.
>
> We all fall into bad moods occasionally. When we do, it'd smooth
> social interaction if we don't claim amusement when others step
> forward to try to help others.
>
> Respectfully
> Sam
>
> On Thu, Sep 6, 2012 at 3:23 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I think Sam's last paragraph refers to my posting. For the record, I
>> consider his summary amusing, but not to represent what I said or even
>> meant. My main point was to underline that the program -gsample- was
>> behaving defensibly and that the poster's surprise was thus misplaced.
>> I did also suggest that they needed to recalculate the weights.
>> Naturally any other contributions to the thread that explain
>> specifically and correctly what the poster should do instead are more
>> valuable than that one post.
>>
>> Nick
>>
>> On Thu, Sep 6, 2012 at 11:04 PM, Lucas <lucaselastic@gmail.com> wrote:
>>> I've never used gsample, but I just assumed after you removed the C
>>> units you could adjust the remaining cases so that their weights sum
>>> to 1. Sorry I didn't say that. Not sure this new information alters
>>> Steve's comment.
>>>
>>> My understanding of Stas's comment was that one left the certainty
>>> units in and let gsample select them while gsample also selected other
>>> cases, too. My approach was to remove the certainty units and use
>>> gsample to select the remainder. As I don't know what follows the
>>> sample selection, nor do I know gsample, I can't tell whether
>>> something is gained by letting gsample select the certainty units.
>>>
>>> At any rate, I took the ridiculous step of responding to a question
>>> about a command I have never used because I thought the poster
>>> deserved something more useful than an admonition to not be surprised
>>> if they try to get a command to do something it cannot do when it
>>> doesn't do it.
>>>
>>> Sam
>>>
>>> On Thu, Sep 6, 2012 at 2:17 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>>
>>>> Sam:
>>>>
>>>> You've missed the point of Stas's post: After removing the initial certainty
>>>> units, the scaled size measures (probabilities) of the remaining units must be
>>>> adjusted upward so that they sum to 1. Now additional units might violate the
>>>> inequality quoted in the -gsample- error message. The process is repeated
>>>> until the inequality is not violated for any of the remaining units.
>>>>
>>>> Some alternatives to -gsample- and -samplepps- (SSC):
>>>> Sampford's Method can be found in the SAS SURVEYSELECT procedure.
>>>> SAS's default PPS method is the Hanurav-Vijayan method (Vijayan, 1968); see also
>>>> Fox (1989) and Golmant (1990). Tillé's elimination method can be found in the
>>>> R "sampling" package as the -UPtille- command.
>>>>
>>>> Tillé (2006) is the definitive text these days.
>>>> See also the -help- for Ben Jann's -mf_mm_sample- for more information
>>>> (-gsample- is a wrapper for this).
>>>>
>>>> References:
>>>>
>>>> Fox, D. R. (1989), "Computer Selection of Size-Biased Samples," The American
>>>> Statistician, 43(3), 168–171.
>>>>
>>>> Golmant, J. (1990), "Correction: Computer Selection of Size-Biased Samples," The
>>>> American Statistician, 44(2), 194.
>>>>
>>>> Tillé, Y. (2006), Sampling Algorithms, New York: Springer.
>>>>
>>>> Vijayan, K. (1968), "An Exact Sampling Scheme: Generalization of a Method of
>>>> Hanurav," Journal of the Royal Statistical Society, Series B, 30, 556–566.
>>>>
>>>> Steve
>>>>
>>>> On Sep 6, 2012, at 1:04 PM, Lucas wrote:
>>>>
>>>> Why not simply remove the certainty units (C Units), draw the sample
>>>> from the remainder units (R Units) to obtain the sampled units (S
>>>> Units), then add the certainty and sampled sets (C & S) together to
>>>> form the final sample (FS units)?
>>>>
>>>> Sam
>>>>
>>>> On Thu, Sep 6, 2012 at 8:44 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>>>> On Thu, Sep 6, 2012 at 9:28 AM, Lieke Boonen (SiRM)
>>>> <Lieke.Boonen@sirm.nl> wrote:
>>>> We are trying to take a sample from our population, without replacement. We
>>>> have several subgroups with a high sampling weight. However, the gsample
>>>> command gives an error because for these cases w_i*n/sum(w) is larger
>>>> than 1. We thought the program looked at the relation between the weights
>>>> and that this should not be a problem. Does anyone recognize this problem,
>>>> and is there a solution for it?
>>>>
>>>> As far as I can recall, -gsample- does a decent job of selecting one
>>>> observation from the list, provided, as you found the hard way, that
>>>> you don't have any certainty units. However, it is not appropriate for
>>>> many real-world sampling problems, which usually require more
>>>> complicated code. You also need to be aware that PPSWOR is a very
>>>> non-trivial and counter-intuitive task. See
>>>> http://www.citeulike.org/user/ctacmo/tag/unequal_prob_sampling for the
>>>> appropriate references. All in all, you probably need to do this:
>>>>
>>>> 1. Identify the certainty units, set their probability of selection to 1.
>>>> 2. Adjust the probability distribution, pulling up the probabilities
>>>> for other units.
>>>> 3. Check again for the certainty units: repeat steps 1-2 until the
>>>> probabilities of selection on a single draw have converged.
>>>> 4. Implement your PPS procedure -- systematic sampling is the poor man's,
>>>> old-days shortcut procedure for sampling from a physical list on
>>>> sheet(s) of paper, and it leads to technical difficulties in variance
>>>> estimation; Rao-Hartley-Cochran is the easiest-to-implement shortcut
>>>> that leads to an approximate PPS; Rao-Sampford used to be the most
>>>> rigorous choice until Tillé's elimination procedures appeared in the
>>>> literature.
>>>>
>>>> --
>>>> -- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
>>>> -- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at
>>>> srbi dot com
>>>> -- Opinions stated in this email are mine only, and do not reflect the
>>>> position of my employer

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
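
Steps 1-3 of the procedure Stas outlines in the quoted message, together with Steve's point that rescaling can push further units over the limit, can be sketched roughly as follows. This is only an illustration in Python, not how -gsample- or any of the packages named in the thread work; the function name `inclusion_probabilities` and its arguments are hypothetical, and the sketch assumes strictly positive size measures. It stops at the inclusion probabilities and leaves the actual PPS draw (step 4) to whichever method is chosen.

```python
def inclusion_probabilities(sizes, n):
    """Return (pik, certain): inclusion probabilities for a PPS sample of
    size n, and the sorted indices of the certainty units.

    Hypothetical helper for illustration; assumes all sizes are > 0.
    """
    if not 0 < n <= len(sizes):
        raise ValueError("n must be between 1 and the number of units")
    if any(s <= 0 for s in sizes):
        raise ValueError("size measures must be strictly positive")

    certain = set()  # indices of certainty units found so far
    while True:
        remaining = [i for i in range(len(sizes)) if i not in certain]
        n_left = n - len(certain)  # draws left for the non-certainty units
        total = sum(sizes[i] for i in remaining)
        # A unit whose scaled size n_left * w_i / sum(w) reaches 1 must be
        # taken with certainty (this is the inequality gsample complains about).
        new = {i for i in remaining if n_left * sizes[i] / total >= 1.0}
        if not new:
            break  # no further violations: probabilities have settled
        certain |= new

    pik = [1.0] * len(sizes)  # certainty units keep probability 1
    for i in remaining:
        pik[i] = n_left * sizes[i] / total  # rescaled over the remainder
    return pik, sorted(certain)


if __name__ == "__main__":
    pik, certain = inclusion_probabilities([50, 30, 10, 5, 5], n=3)
    print("certainty units:", certain)
    print("inclusion probabilities:", pik)
```

With size measures 50, 30, 10, 5, 5 and n = 3, the first pass flags only the first unit (3 × 50/100 = 1.5). After that unit is set aside, the second unit's scaled size becomes 2 × 30/50 = 1.2, so it too becomes a certainty unit, which is why a single pass of removing certainty units is not enough.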

**Follow-Ups**:

- **Re: st: GSAMPLE R3300** *From:* Duru <duru80@gmail.com>

**References**:

- **st: GSAMPLE R3300** *From:* "Lieke Boonen (SiRM)" <Lieke.Boonen@sirm.nl>
- **Re: st: GSAMPLE R3300** *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: GSAMPLE R3300** *From:* Lucas <lucaselastic@gmail.com>
- **Re: st: GSAMPLE R3300** *From:* Steve Samuels <sjsamuels@gmail.com>
- **Re: st: GSAMPLE R3300** *From:* Lucas <lucaselastic@gmail.com>
- **Re: st: GSAMPLE R3300** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: GSAMPLE R3300** *From:* Lucas <lucaselastic@gmail.com>
