# Re: st: sample selection (-gsample) in stata

 From Steven Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: sample selection (-gsample) in stata Date Wed, 6 Jul 2011 13:30:17 -0500

" go with the simplest design that you do understand."

I meant "go with the best design, PPS or not, that you do understand."

Steve

Shikha Sinha:

Your design is confusing to me. You want to select 100 companies from 800 (plan unstated), stratified by city and company type; then you want to sample 25 companies PPS from these, but still select companies from each city and of each type in each city.

Because of the first-phase selection, the probabilities for the final 25 will not be proportional to size for the original population. (You must multiply the first- and second-phase probabilities.) So you will lose most of the value of PPS sampling.  To get 25 companies (where did this number come from?)  only a design that drew 1 or 2 companies PPS from each of the 20 strata would work. But this design would provide no degrees of freedom for error in most strata; and so you could not estimate standard errors.
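The multiplication of first- and second-phase probabilities can be illustrated with a small sketch. Everything in it is hypothetical: the company names, sizes, first-phase probabilities, and the constant `K` are invented for illustration, and treating the second-phase probability as a fixed `K * size` is a simplification (in practice it depends on which companies survive the first phase).

```python
from fractions import Fraction

# Hypothetical companies: (name, size measure, first-phase selection probability).
# A and B have the same size but sit in strata with different first-phase rates.
companies = [
    ("A", 100, Fraction(1, 2)),
    ("B", 100, Fraction(1, 10)),
    ("C", 300, Fraction(1, 2)),
]

# Hypothetical second-phase rule: conditional probability proportional to size.
K = Fraction(1, 1000)


def overall_probability(size, p1, k=K):
    """Overall selection probability = first-phase prob * second-phase prob."""
    p2 = k * size  # PPS only within the second phase
    return p1 * p2


overall = {name: overall_probability(size, p1) for name, size, p1 in companies}

# If the final sample were PPS for the original population,
# overall probability / size would be the same for every company.
ratio = {name: overall[name] / size for name, size, _ in companies}

for name, size, p1 in companies:
    print(name, "size", size, "overall", overall[name], "prob/size", ratio[name])
```

In this sketch A and B have equal size but unequal overall probabilities, so the final sample is not PPS with respect to the original 800 companies.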

The choice of design for a complex survey depends on the purpose of the study; the information available for each company; and what you can afford. If PPS sampling is a good choice (usually, but not always, the case), then use it at the first selection phase. Surveys of business establishments often use size measures such as sales, tax revenues, numbers of employees. Often the lists with this information are out-of-date, and you must make some guesses for companies not on the list. (A side issue with PPS designs is that  -gsample- and other algorithms will fail if some companies in a stratum are too "big".)
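One way to see the "too big" problem: under without-replacement PPS, a company's target inclusion probability is n times its share of the stratum's total size, and this exceeds 1 when one company dominates the stratum. The sketch below only checks for that condition. The sizes and function names are hypothetical, and it does not reproduce what -gsample- does internally.

```python
from fractions import Fraction


def pps_inclusion_probabilities(sizes, n):
    """Target inclusion probabilities n * size_i / sum(sizes) for one stratum."""
    total = sum(sizes)
    return [Fraction(n * s, total) for s in sizes]


def too_big(sizes, n):
    """Indices of units whose target inclusion probability exceeds 1."""
    probs = pps_inclusion_probabilities(sizes, n)
    return [i for i, p in enumerate(probs) if p > 1]


# Hypothetical stratum: one company holds 70% of the total size measure.
sizes = [700, 100, 50, 50, 100]
probs = pps_inclusion_probabilities(sizes, 2)
flagged = too_big(sizes, 2)

print(probs)    # target probabilities for drawing n = 2 companies
print(flagged)  # positions of companies that are "too big"
```

A common remedy, stated here as general practice rather than as anything from this thread, is to take the flagged companies with certainty and draw the remainder PPS from the other companies in the stratum.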

As you are not familiar with PPS designs, I suggest that you consult someone experienced in this area. Failing that, go with the simplest design that you do understand. Study of a sampling text might lead you to some good ideas, for example Groves et al. (2009) Survey Methodology, Wiley; Sharon Lohr, 2009, Sampling: Design & Analysis. The best PPS examples are in WE Deming (1960) Sample Design in Business Research, Wiley (available in paperback now). Deming's examples are of replicated samples and he uses selection in paper zones to achieve PPS sampling. For problems with limited degrees of freedom, see Korn, Edward Lee, and Barry I Graubard. 1999. Analysis of Health Surveys. New York: Wiley. But note: it can be very dangerous to copy someone else's design if you do not understand it.

Steve
On Jul 5, 2011, at 4:48 PM, Shikha Sinha wrote:

Thanks Jann.
-gsample- looks good, but I am still struggling. How do I calculate the size for -gsample-? I want to select companies from each city and of each type in each city.

Thanks, S

On Tue, Jun 28, 2011 at 12:59 PM, Joerg Luedicke <joerg.luedicke@gmail.com> wrote:

Maybe -gsample- from Ben Jann might be helpful (-findit gsample-).

J.

On Tue, Jun 28, 2011 at 3:43 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:

Dear all,

My question is about sample selection in Stata. I want to select 100 companies located in 10 cities from a universe of 800 companies. My data structure is as below:

I want to select a sample of 25 companies from these 100, using probability weights of selection, i.e. probability proportional to size (PPS). Is there any command for this in Stata? I have tried -sample-, but to no avail.

