Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Selecting a sample to compromise between significant size and geographical dispersion |
Date | Fri, 16 Sep 2011 09:48:26 -0500 |
On Thu, Sep 15, 2011 at 2:25 AM, Partho Sarkar <partho.ss+lists@gmail.com> wrote:
> Broadly speaking, I want to select a sample from a very large
> population to achieve a "good" compromise between excluding
> "insignificant" units, and ensuring "reasonable" diversity. I have a
> hierarchical dataset on prices of some commodities from markets across
> the country. (The geographical levels being:
> national-state-district-market. Markets are the primary units). I
> want to consider the prices only from "significant" markets, i.e., for
> each commodity, markets which have trading volumes at least equal to
> the median volume (say). BUT, I also want to ensure as complete a
> geographical coverage as possible.

I second Maarten's request for you to define the population. If you don't have the population defined, you don't know what your sample is good for.

It appears though that your understanding of sampling is that you define the cut-off and keep every market above it. That's a very silly design: this is a census of a specific subpopulation, and it may not inform you about the whole population. If you have access to complete data and want to construct an index, then this is a modeling question, not a sampling question.

With a sampling procedure, you would first assign probabilities of selection to all units in the population and set a procedure in place that would match an outcome of a random number generator with a unit in the population (or a group of units, should the need be). In your case, you might have wanted to entertain probability proportional to size sampling.
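For readers who want to see the selection procedure described above in concrete terms, here is a minimal Python sketch of probability proportional to size (PPS) sampling with replacement. It is only an illustration under stated assumptions: the market names and trading volumes are made up, the function names `selection_probabilities` and `pps_sample` are hypothetical, and a real survey design would more likely use without-replacement PPS within geographic strata.

```python
import bisect
import itertools
import random


def selection_probabilities(sizes):
    """Return each unit's single-draw selection probability: size / total size."""
    total = sum(sizes.values())
    return {unit: size / total for unit, size in sizes.items()}


def pps_sample(sizes, n, seed=None):
    """Draw n units with replacement, with probability proportional to size.

    sizes maps a unit id to a positive size measure (e.g. trading volume).
    Each draw maps one random number onto one unit through the cumulative
    totals, so larger units own a wider slice of [0, total).
    """
    rng = random.Random(seed)
    units = list(sizes)
    cumulative = list(itertools.accumulate(sizes[u] for u in units))
    total = cumulative[-1]
    draws = []
    for _ in range(n):
        point = rng.random() * total              # uniform on [0, total)
        idx = bisect.bisect_right(cumulative, point)
        idx = min(idx, len(units) - 1)            # guard against rounding at the top edge
        draws.append(units[idx])
    return draws


# Hypothetical markets and volumes, for illustration only.
volumes = {"market_A": 850, "market_B": 100, "market_C": 50}
probs = selection_probabilities(volumes)
sample = pps_sample(volumes, n=10000, seed=12345)
share_A = sample.count("market_A") / len(sample)
```

Because every market has a known, nonzero selection probability, small markets can still enter the sample, and estimates can be weighted back to the whole population, which a hard volume cut-off does not allow.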
However, if you are really looking for a meaningful way to construct an index, you probably would not want to throw away the data, but rather weight the different markets according to their size, so that your price is the total of all sales divided by the number of sales made, rather than an arithmetic average of the prices across markets of different sizes, and maybe smooth them over time with a moving average process if needed.

Frankly, I would find it a poor practice to go for methodological advice to a free community like statalist, and then charge your clients for the advice you would give them based on our input.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
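As an illustration of the index suggestion above, the following Python sketch contrasts a size-weighted price with a plain arithmetic average and then applies a trailing moving average. It assumes that "total of all sales divided by the number of sales made" means total sales value divided by total quantity traded; the function names and all figures are hypothetical.

```python
def volume_weighted_price(records):
    """Total sales value divided by total quantity traded.

    records is an iterable of (price, volume) pairs, one per market.
    """
    records = list(records)                       # allow generators; we iterate twice
    total_volume = sum(volume for _, volume in records)
    if total_volume == 0:
        raise ValueError("total volume is zero; the weighted price is undefined")
    total_value = sum(price * volume for price, volume in records)
    return total_value / total_volume


def moving_average(series, window):
    """Trailing simple moving average; returns len(series) - window + 1 values."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]


# Hypothetical data: one large market and one small market.
markets = [(10, 900), (20, 100)]                  # (price, volume)
weighted = volume_weighted_price(markets)
simple = sum(price for price, _ in markets) / len(markets)

# Hypothetical index values for five periods, smoothed over three periods.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

In this made-up example the small, expensive market pulls the arithmetic average well above the price at which most of the volume actually traded, which is the distortion the weighting is meant to avoid.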