Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: sample selection (-gsample) in stata

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: sample selection (-gsample) in stata
Date	Wed, 6 Jul 2011 22:05:18 -0500

Try this:

****CODE BEGINS****
set seed 95504066
sysuse auto, clear
tab foreign  // stratum variable
sample 12.5, by(foreign) //12.5% (1 in 8) sample in each stratum
tab foreign
gen fwt = 1/.125 //inverse of sampling probability
list fwt in 1
svyset _n [pweight=fwt], strata(foreign)
****CODE ENDS****
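A variant of the code above, offered as a sketch: -sample- has to keep a whole number of cars in each stratum (12.5% of the 52 domestic and 22 foreign cars in auto.dta is 6.5 and 2.75), so the achieved sampling fraction in each stratum is not exactly 1/8. Computing the weight as N_h/n_h uses the achieved fraction, and passing N_h to fpc() adds a finite population correction. The variable names Nh, nh and pw are illustrative choices, not part of the original reply.

```stata
set seed 95504066
sysuse auto, clear
* N_h must be recorded before -sample- drops the unsampled observations
bysort foreign: gen long Nh = _N
sample 12.5, by(foreign)            // 12.5% sample within each stratum
bysort foreign: gen long nh = _N    // achieved stratum sample sizes n_h
gen double pw = Nh/nh               // inverse of the achieved sampling fraction
svyset _n [pweight=pw], strata(foreign) fpc(Nh)
```

After -svyset-, estimation commands take the -svy:- prefix, for example -svy: mean price-.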


Steve
[email protected]


On Jul 6, 2011, at 6:44 PM, Shikha Sinha wrote:

Steve,

Thanks a lot for your valuable comments. You are right, I am confusing
PPS with proportional allocation. I will read the references you
recommended. Does Stata have a command to draw a sample with
proportional allocation?

Thanks,
Shikha

On Wed, Jul 6, 2011 at 4:18 PM, Steven Samuels <[email protected]> wrote:
> 
> Shikha,  In Excel, you have drawn a stratified sample with proportional allocation.  -gsample- has drawn a sample of 30 observations per stratum (as should be clear from the -help-).
> 
> You are using the term "PPS" without understanding what it means. I've already given you my best advice, so I don't think that I will add anything more.
> 
> Steve
> [email protected]
> 
> 
> 
> On Jul 6, 2011, at 5:32 PM, Shikha Sinha wrote:
> 
> Thanks everyone for the responses. I think two-stage PPS is complex.
> However, to understand one-stage PPS in Stata, I still need your
> input. I did it in Excel, and the results are below:
> 
> City            No. of      Prob. (no. of        Number to be selected
>                 companies   companies/1397)      (300*prob)
> Central            135      0.10                  29
> Copperbelt         184      0.13                  40
> Eastern            173      0.12                  37
> Luapula            136      0.10                  29
> Lusaka              87      0.06                  19
> North Western      130      0.09                  28
> Northern           173      0.12                  37
> Southern           231      0.17                  50
> Western            148      0.11                  32
> Total             1397      1                    300
> 
> This is what I meant by PPS. From the sampling frame of 1397 companies
> in 9 cities, I want to draw a random sample of 300 companies based on
> PPS. Am I doing it right in Excel?
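The table above computes a proportional allocation: stratum h receives a sample size proportional to its share N_h/N of the frame, and a simple random sample within each stratum then gives every company roughly the same inclusion probability n/N. No size measure enters the selection probabilities, which is why this is not PPS. With n = 300 and N = 1397, for Central:

```latex
n_h = n\,\frac{N_h}{N}, \qquad
n_{\mathrm{Central}} = 300 \times \frac{135}{1397} \approx 28.99 \;\to\; 29, \qquad
\pi_{hi} = \frac{n_h}{N_h} \approx \frac{n}{N} = \frac{300}{1397} \approx 0.215
```

Rounding each stratum separately does not guarantee the total: the nine rounded figures in the table add to 301, not 300, so one stratum has to give up a unit. Largest-remainder rounding would take it from Copperbelt, the stratum with the smallest fractional part among those rounded up (39.51, which becomes 39).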
> 
> Next, I tried to generate the same in Stata using -gsample-:
> 
> bys  City: gen freq= _N
> 
> . g pps=freq/1397
> 
> . gsample 30 [aw=pps], wor strata( pid)
> (1127 observations deleted)
> 
> . tab  Province
> 
>          City |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>       Central |         30       11.11       11.11
>    Copperbelt |         30       11.11       22.22
>       Eastern |         30       11.11       33.33
>       Luapula |         30       11.11       44.44
>        Lusaka |         30       11.11       55.56
> North Western |         30       11.11       66.67
>      Northern |         30       11.11       77.78
>      Southern |         30       11.11       88.89
>       Western |         30       11.11      100.00
> --------------+-----------------------------------
>         Total |        270      100.00
> 
> The Stata output is different from the Excel output. -gsample- drew 30
> observations from each city, so how can it be based on PPS? Could you
> suggest the right code using -gsample- to reproduce the Excel output?
> Or can I use -samplepps-, and what would be the code for it?
> 
> Thanks,
> 
> Shikha
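One way to reproduce the Excel allocation in Stata, sketched under stated assumptions and not taken from the thread: the code rebuilds a stand-in frame from the province counts in the table (one observation per company), uses largest-remainder rounding so that the stratum sizes add to exactly 300, and then draws a simple random sample of n_h companies within each province by sorting on a uniform random number. The variable names (Nh, nh, exact, frac, u, pw, stratum) and the seed are hypothetical.

```stata
* Stand-in sampling frame rebuilt from the province counts in the table
clear
input str13 City long Nh
"Central"       135
"Copperbelt"    184
"Eastern"       173
"Luapula"       136
"Lusaka"         87
"North Western" 130
"Northern"      173
"Southern"      231
"Western"       148
end

* Proportional allocation n_h = n*N_h/N with n = 300, one row per stratum
quietly summarize Nh
local N = r(sum)                        // frame size N = 1,397
gen double exact = 300*Nh/`N'
gen long nh = floor(exact)

* Largest-remainder rounding: hand the leftover units to the strata with
* the largest fractional parts so that the n_h add up to exactly 300
quietly summarize nh
local short = 300 - r(sum)
gen double frac = exact - nh
gsort -frac
replace nh = nh + 1 if _n <= `short'

* One observation per company, then an SRS of n_h companies per stratum
expand Nh
set seed 20110706                       // arbitrary seed, for reproducibility
gen double u = runiform()
bysort City (u): keep if _n <= nh

* Nearly equal weights N_h/n_h and a stratified design
gen double pw = Nh/nh
encode City, gen(stratum)
svyset _n [pweight=pw], strata(stratum) fpc(Nh)
tab City
```

Sorting on a uniform random number within each stratum and keeping the first n_h observations is a simple random sample without replacement in each stratum, which is what a stratified design with proportional allocation calls for.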
> 
> 
> 
> 
> 
> 
> On Wed, Jul 6, 2011 at 12:30 PM, Stas Kolenikov <[email protected]> wrote:
>> On Tue, Jul 5, 2011 at 4:48 PM, Shikha Sinha <[email protected]> wrote:
>>> -gsample- looks good, but I am still struggling. How do I calculate the
>>> sample size for -gsample-? I want to select companies from each city, and
>>> of each type within each city.
>> 
>> -gsample- will only produce appropriate PPS samples if you specify
>> sampling with replacement (which is the approximation you would have
>> to make at the analysis stage, anyway). PPS sampling without
>> replacement is far more complicated, and if the phrase "Rao-Sampford
>> algorithm" does not ring a bell, you will end up with wrong sampling
>> weights.
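To illustrate the with-replacement case, here is a minimal sketch of one-stage PPS with replacement. It assumes the user-written -gsample- is installed from SSC and, as the thread implies, samples with replacement unless -wor- is given; it uses auto.dta with price standing in as a hypothetical measure of size. Each draw gets the weight 1/(n p_i), where p_i is its per-draw selection probability.

```stata
* Assumes -gsample- is installed (ssc install gsample); it may also need -moremata-
set seed 95504066
sysuse auto, clear
* Illustrative size measure: price (any positive measure of size would do)
quietly summarize price
gen double p = price/r(sum)         // per-draw selection probability p_i
gsample 10 [aw=price]               // no -wor- option: draws with replacement
gen double pw = 1/(10*p)            // weight 1/(n*p_i) for each draw
svyset _n [pweight=pw]              // each draw treated as its own PSU
```

A unit selected more than once appears once per draw, and each appearance carries its own weight, which matches the with-replacement approximation used at the analysis stage.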
>> 
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>> 
> 