



Re: st: sample selection (-gsample) in stata


From   Steven Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: sample selection (-gsample) in stata
Date   Wed, 6 Jul 2011 18:18:18 -0500

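Shikha, in Excel you have drawn a stratified sample with proportional allocation: each stratum's sample size is proportional to its share of the frame. -gsample- has drawn a sample of 30 observations per stratum, because that is what you asked for (as should be clear from the -help-).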
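
A minimal sketch of proportional allocation as described above, using only built-in commands. It assumes the full frame is in memory with one observation per company and a stratum variable called City; the seed and the helper variables N_h, n_h and u are illustrative, not taken from the thread. Because each stratum size is rounded separately, the sizes need not sum to exactly 300.

```stata
* Sketch only: stratified sample with proportional allocation.
* Assumes one observation per company and a stratum variable City.
* The seed is arbitrary and is set only for reproducibility.
set seed 12345
* Stratum size N_h, then allocation n_h = 300 * N_h / (frame size), rounded.
bysort City: generate long N_h = _N
generate long n_h = round(300 * N_h / _N)
* Random order within each stratum; keep the first n_h companies.
generate double u = runiform()
sort City u
by City: keep if _n <= n_h
tabulate City
```

Every company in a given stratum has the same chance of selection, n_h/N_h, so this is simple random sampling without replacement within strata rather than PPS.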

You are using the term "PPS" without understanding what it means. I've already given you my best advice, so I don't think that I will add anything more.

Steve
sjsamuels@gmail.com



On Jul 6, 2011, at 5:32 PM, Shikha Sinha wrote:

Thanks everyone for the response. I think two-stage PPS is complex.
However to understand the one-stage PPS in Stata, I still need your
inputs. I did it in Excel, and the results are below:

City            No. of companies   Prob. (companies/1397)   Number to select (300*prob.)
Central                      135                     0.10                             29
Copperbelt                   184                     0.13                             40
Eastern                      173                     0.12                             37
Luapula                      136                     0.10                             29
Lusaka                        87                     0.06                             19
North Western                130                     0.09                             28
Northern                     173                     0.12                             37
Southern                     231                     0.17                             50
Western                      148                     0.11                             32
Total                       1397                     1                               300

This is what I meant by PPS. From the sampling frame of 1397 companies
in 9 cities, I want to draw a random sample of 300 companies based on
PPS. Do you think I am doing it right in Excel?

Next, I tried to generate the same in Stata using -gsample-.

bys  City: gen freq= _N

. g pps=freq/1397

. gsample 30 [aw=pps], wor strata( pid)
(1127 observations deleted)

. tab  Province

         City |      Freq.     Percent        Cum.
--------------+-----------------------------------
      Central |         30       11.11       11.11
   Copperbelt |         30       11.11       22.22
      Eastern |         30       11.11       33.33
      Luapula |         30       11.11       44.44
       Lusaka |         30       11.11       55.56
North Western |         30       11.11       66.67
     Northern |         30       11.11       77.78
     Southern |         30       11.11       88.89
      Western |         30       11.11      100.00
--------------+-----------------------------------
        Total |        270      100.00

The Stata output is different from the Excel output. -gsample- draws 30
obs from each City, so how can it be based on PPS? Could you suggest
the right code using -gsample- to reproduce the Excel output? Or can
I use -samplepps-? What would the code be for that?

Thanks,

Shikha






On Wed, Jul 6, 2011 at 12:30 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 4:48 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>> -gsample- looks good, but I am still struggling. How do I calculate the
>> size for -gsample-? I want to select companies from each city and of
>> each type in each city.
> 
> -gsample- will only produce appropriate PPS samples if you specify
> sampling with replacement (which is the approximation you would have
> to make at the analysis stage, anyway). PPS sampling without
> replacement is far more complicated, and if the phrase "Rao-Sampford
> algorithm" does not ring a bell, you will end up with wrong sampling
> weights.
> 
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
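
The with-replacement PPS draw that Stas refers to could look roughly like the sketch below. It rests on assumptions: -gsample- is a user-written command (as far as I know it is available from SSC and needs the moremata package), and size is a hypothetical variable holding each unit's measure of size. No such variable appears in this thread. Check the command's -help- before relying on any of this.

```stata
* Sketch only: PPS sampling WITH replacement via the user-written -gsample-.
* "size" is a hypothetical measure-of-size variable; it is not from the thread.
* Installation lines are an assumption about where the packages live:
*   ssc install moremata
*   ssc install gsample
* The seed is arbitrary and is set only for reproducibility.
set seed 12345
* No -wor- option, so units are drawn with replacement,
* with selection probability proportional to the weight variable.
gsample 300 [aw=size]
```

Note the contrast with the call earlier in the thread: there the weight was constant within each stratum and -strata()- was specified with a fixed 30, so every unit in a stratum had the same selection probability and the result was an equal-size stratified sample, not PPS.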


