# st: RE: Systematic sampling in Stata

 From "Ronnie Babigumira" <[email protected]> To <[email protected]> Subject st: RE: Systematic sampling in Stata Date Fri, 16 May 2003 13:28:38 +0200

```
Hi Cruces,
I don't have much to say about systematic sampling; however, I would like to
comment on the primary sampling unit. Ideally, this should be the unit of
observation. In household surveys, it is common to have the household as the
PSU (in which case you would svyset the household_id as the psu). From your
email, I can also see that your weights are clearly specified, so that
shouldn't be a problem. The next thing you are expected to specify is the
strata. These could be the areas from which the PSUs were selected;
alternatively, the strata could represent variables of interest. For
example, in a study I was involved in, we considered population, market
access, and agricultural potential as key variables, so we formed strata as
different combinations of these 3, from which we selected a number of
households. In your case, I see that you sampled on the basis of whether an
area had more than 100,000 inhabitants. This could be the basis for the strata.

Hope this helps

Ronnie

-----Original Message-----
From: [email protected]
[mailto:[email protected]]On Behalf Of Cruces,GA
(pgr)
Sent: 15. mai 2003 20:04
To: [email protected]
Subject: st: Systematic sampling in Stata

Dear All,

I have a question about systematic sampling. I have read the Stata
manuals but I still cannot identify the PSUs and strata in my data, and
I am not very sure about how to handle the systematic sampling. In broad
terms, I cannot match the textbook with my case...

Basically, I am using a census in which the whole population of rural
areas and cities of less than 100,000 inhabitants was interviewed. In
cities with 100,000-500,000, I am told that 1 in 5 "segments"
were systematically selected, and in cities of more than 500,000, 1 in
10 "segments" were selected. The "segments" are units of around forty
houses assigned to each interviewer.

Finally, the weights are the inverse of the probability of being
included (so for rural areas and small cities they are just one). The
resulting sample (more than 16 million observations) is half the total
population.

I am unsure about how to deal with this within Stata, since for half of
my data I have the whole population whereas for the other I only have a
(complex) sample.

Any help will be appreciated.

Thank you all!

best,

g.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
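The design described in the original question (full enumeration below 100,000 inhabitants, 1 in 5 segments for cities of 100,000-500,000, 1 in 10 segments above 500,000, and weights equal to the inverse of the inclusion probability) can be illustrated with a small Python sketch. This is only an illustration of the sampling logic, not Stata code and not the procedure actually used for the census. The function names (`systematic_sample`, `sampling_interval`, `weight`), the random-start rule, and the treatment of a city of exactly 500,000 as belonging to the 1-in-5 group are assumptions made for the example.

```python
import random


def systematic_sample(units, k, start=None, rng=None):
    """Return every k-th unit of `units`, beginning at position `start`.

    Assumption: the start is drawn uniformly from 0..k-1 when not given,
    which is the usual textbook rule; the thread does not say how the
    start was actually chosen.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if start is None:
        rng = rng or random.Random()
        start = rng.randrange(k)
    if not 0 <= start < k:
        raise ValueError("start must satisfy 0 <= start < k")
    return units[start::k]


def sampling_interval(city_population):
    """Interval k implied by the design described in the question.

    k = 1 means every segment is taken (complete enumeration).
    Assumption: a city of exactly 500,000 falls in the 1-in-5 group.
    """
    if city_population < 100_000:
        return 1
    if city_population <= 500_000:
        return 5
    return 10


def weight(k):
    """Inverse of the inclusion probability 1/k."""
    return float(k)


if __name__ == "__main__":
    # Hypothetical city of 250,000 inhabitants split into 100 segments.
    segments = list(range(100))
    k = sampling_interval(250_000)
    chosen = systematic_sample(segments, k, start=2)
    # 20 segments are selected, each carrying weight 5.0, so the
    # weighted count equals the 100 segments in the city.
    print(len(chosen), weight(k), len(chosen) * weight(k))
```

Here `sampling_interval` also separates the three design groups (k = 1, 5, 10), which is one possible way to define the strata suggested in the reply, while the selected segments play the role of the sampled clusters.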