Re: st: sample selection (-gsample) in stata

From: Shikha Sinha
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sample selection (-gsample) in stata
Date: Wed, 6 Jul 2011 15:32:47 -0700

```Thanks, everyone, for the responses. I think two-stage PPS is complex.
However, to understand one-stage PPS in Stata, I still need your
input. I did the allocation in Excel, and the results are below:

City	No of companies	Prob (no of companies/1397)	Number to be selected (300*prob)
Central	135	0.10	29
Copperbelt	184	0.13	40
Eastern	173	0.12	37
Luapula	136	0.10	29
Lusaka	87	0.06	19
North Western	130	0.09	28
Northern	173	0.12	37
Southern	231	0.17	50
Western	148	0.11	32
Total	1397	1	300

This is what I meant by PPS. From the sampling frame of 1397 companies
in 9 cities, I want to draw a random sample of 300 companies based on
PPS. Do you think I am doing it right in Excel?

Next, I tried to generate the same allocation in Stata using -gsample-:

. bys City: gen freq = _N

. g pps=freq/1397

. gsample 30 [aw=pps], wor strata( pid)
(1127 observations deleted)

. tab  Province

         City |      Freq.     Percent        Cum.
--------------+-----------------------------------
      Central |         30       11.11       11.11
   Copperbelt |         30       11.11       22.22
      Eastern |         30       11.11       33.33
      Luapula |         30       11.11       44.44
       Lusaka |         30       11.11       55.56
North Western |         30       11.11       66.67
     Northern |         30       11.11       77.78
     Southern |         30       11.11       88.89
      Western |         30       11.11      100.00
--------------+-----------------------------------
        Total |        270      100.00

The Stata output is different from the Excel output. -gsample- draws 30
observations from each city, so how can the sample be based on PPS? Could
you suggest the right -gsample- code to reproduce the Excel output? Or can
I use -samplepps-, and if so, what would the code be?

Thanks,

Shikha

On Wed, Jul 6, 2011 at 12:30 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> On Tue, Jul 5, 2011 at 4:48 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>> -gsample- looks good, but I am still struggling. How do I calculate the
>> size for -gsample-? I want to select companies from each city and of
>> each type in each city.
>
> -gsample- will only produce appropriate PPS samples if you specify
> sampling with replacement (which is the approximation you would have
> to make at the analysis stage, anyway). PPS sampling without
> replacement is far more complicated, and if the phrase "Rao-Sampford
> algorithm" does not ring a bell, you will end up with wrong sampling
> weights.
>
>

```
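For reference, the Excel allocation in the message is a stratified sample with proportional allocation: each city contributes about n_h = 300 * N_h / 1397 companies, chosen with equal probability within the city. In the -gsample- call, 30 is the sample size per stratum and (assuming pid identifies the city) the weight pps is constant within a stratum, which would explain why every city returned exactly 30 observations. The following is a minimal sketch of the proportional allocation using only built-in Stata commands. The frame is rebuilt from the city counts quoted in the message; the variable names N_h, n_h, and u and the seed are illustrative assumptions, not part of the original post.

```stata
* Rebuild a stand-in sampling frame from the city counts in the post
clear
input str13 City int N_h
"Central"       135
"Copperbelt"    184
"Eastern"       173
"Luapula"       136
"Lusaka"         87
"North Western" 130
"Northern"      173
"Southern"      231
"Western"       148
end
expand N_h                      // one row per company (1397 rows in total)

* Proportional allocation: n_h = 300 * N_h / 1397, rounded to an integer
gen int n_h = round(300 * N_h / _N)

* Simple random sample without replacement of n_h companies in each city
set seed 20110706               // arbitrary seed, for reproducibility only
gen double u = runiform()
sort City u
by City: keep if _n <= n_h

tab City
```

Because each n_h is rounded separately, the total can differ slightly from the target; with the counts above, the rounded values add up to 301 rather than 300. If I read the -gsample- help file correctly, a variable holding the per-stratum sample sizes can be supplied in place of the number (something like `gsample n_h, wor strata(City)`), but please check the help file before relying on that.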