
From: Steven Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sample selection (-gsample) in stata
Date: Wed, 6 Jul 2011 22:05:18 -0500

Try this:

****CODE BEGINS****
set seed 95504066
sysuse auto, clear
tab foreign   // stratum variable
sample 12.5, by(foreign)  // 12.5% (1 in 8) sample in each stratum
tab foreign
gen fwt = 1/.125  // inverse of sampling probability
list fwt in 1
svyset _n [pweight=fwt], strata(foreign)
****CODE ENDS****

Steve
sjsamuels@gmail.com

On Jul 6, 2011, at 6:44 PM, Shikha Sinha wrote:

Steve,
Thanks a lot for your valuable comments. You are right, I am confusing PPS with proportional allocation. I will read the references advised by you. Does Stata have a command to draw a sample with proportional allocation?

Thanks,
Shikha

On Wed, Jul 6, 2011 at 4:18 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>
> Shikha, In excel, you have drawn a stratified sample with proportional allocation. -gsample- has drawn a sample of 30 observations per stratum (as should be clear from the -help-).
>
> You are using the term "PPS" without understanding what it means. I've already given you my best advice, so I don't think that I will add anything more.
>
> Steve
> sjsamuels@gmail.com
>
> On Jul 6, 2011, at 5:32 PM, Shikha Sinha wrote:
>
> Thanks everyone for the response. I think two-stage PPS is complex.
> However to understand the one-stage PPS in Stata, I still need your
> inputs. I did it in excel, and results are below:
>
> City            No of companies   Prob (no of companies/1397)   Number to be selected (300*prob)
> Central                     135                          0.10                                 29
> Copperbelt                  184                          0.13                                 40
> Eastern                     173                          0.12                                 37
> Luapula                     136                          0.10                                 29
> Lusaka                       87                          0.06                                 19
> North Western               130                          0.09                                 28
> Northern                    173                          0.12                                 37
> Southern                    231                          0.17                                 50
> Western                     148                          0.11                                 32
> Total                      1397                          1                                   300
>
> This is what I meant by PPS. From the sampling frame of 1397 companies
> in 9 cities, I want to draw a random sample of 300 companies based on
> PPS. Do you think I am doing it right in excel?
>
> Next, I tried to generate the same in stata using -gsample.
>
> bys City: gen freq= _N
>
> . g pps=freq/1397
>
> . gsample 30 [aw=pps], wor strata( pid)
> (1127 observations deleted)
>
> . tab Province
>
>          City      Freq.    Percent       Cum.
>
>       Central         30      11.11      11.11
>    Copperbelt         30      11.11      22.22
>       Eastern         30      11.11      33.33
>       Luapula         30      11.11      44.44
>        Lusaka         30      11.11      55.56
> North Western         30      11.11      66.67
>      Northern         30      11.11      77.78
>      Southern         30      11.11      88.89
>       Western         30      11.11     100.00
>
>         Total        270     100.00
>
> The stata output is different from the excel output. -gsample draw 30
> obs from each City, then how can it be based on PPS. Could you suggest
> me the right code using -gsample to generate the excel output. or can
> I use -samplepps, what would be the code for this?
>
> Thanks,
>
> Shikha
>
> On Wed, Jul 6, 2011 at 12:30 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> On Tue, Jul 5, 2011 at 4:48 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>>> -gsample looks good, but I am still struggling. How do I calculate the
>>> size for -gsample. I want the select companies from each cities and of
>>> each type in each city.
>>
>> -gsample- will only produce appropriate PPS samples if you specify
>> sampling with replacement (which is the approximation you would have
>> to make at the analysis stage, anyway). PPS sampling without
>> replacement is far more complicated, and if the phrase "Rao-Sampford
>> algorithm" does not ring a bell, you will end up with wrong sampling
>> weights.
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
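
For the frame described in the thread (1397 companies in 9 cities, target of about 300), the same -sample ..., by()- approach can be adapted as a rough sketch. It assumes the full frame is in memory with a stratum variable named City, as in Shikha's post; the seed and the names stratum, Nh, nh and pw are illustrative only. Because -sample- rounds within each stratum, the total drawn may differ slightly from 300, so the weights below use the realized stratum counts rather than a single nominal rate.

```stata
* Sketch only: assumes the sampling frame (one row per company) is in memory
* and that City identifies the stratum. All new variable names are hypothetical.
set seed 20110706

egen stratum = group(City)          // numeric stratum identifier
bysort stratum: gen Nh = _N         // companies in the frame, per stratum

local pct = 100*300/_N              // common sampling percentage (about 21.5% for 1397 rows)
sample `pct', by(stratum)           // same percentage in every stratum = proportional allocation

bysort stratum: gen nh = _N         // companies actually sampled, per stratum
gen pw = Nh/nh                      // weight = inverse of realized sampling fraction

tab City                            // check the allocation against the Excel table
svyset _n [pweight=pw], strata(stratum) fpc(Nh)
```

Using Nh/nh instead of a fixed 1/(300/1397) keeps the weights summing to the frame size within each stratum even when rounding changes the stratum sample sizes.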
