Re: st: Selecting a sample to compromise between significant size and geographical dispersion

From   Stas Kolenikov
Subject   Re: st: Selecting a sample to compromise between significant size and geographical dispersion
Date   Fri, 16 Sep 2011 09:48:26 -0500

On Thu, Sep 15, 2011 at 2:25 AM, Partho Sarkar wrote:
> Broadly speaking, I want to select a sample from a very large
> population to achieve a "good" compromise between excluding
> "insignificant" units, and ensuring "reasonable" diversity.  I have a
> hierarchical dataset on prices of some commodities from markets across
> the country.  (The geographical levels being:
> national-state-district-market.  Markets are the primary units). I
> want to consider the prices only from "significant" markets, i.e., for
> each commodity, markets which have trading volumes at least equal to
> the median volume (say).  BUT, I also want to ensure as complete a
> geographical coverage as possible.

I second Maarten's request for you to define the population. If you
don't have the population defined, you don't know what your sample is
good for. It appears though that your understanding of sampling is
that you define the cut-off and keep every market above it. That's a
very silly design: this is a census of a specific subpopulation, and
it may not inform you about the whole population.

If you have access to complete data and want to construct an index,
then this is a modeling question, not a sampling question. With a
sampling procedure, you would first assign probabilities of selection
to all units in the population and set a procedure in place that would
match an outcome of a random number generator with a unit in the
population (or a group of units, should the need arise). In your case,
you might want to consider probability-proportional-to-size (PPS)
sampling. However, if you are really looking for a meaningful way to
construct an index, you probably would not want to throw away the
data, but rather weight the different markets according to their size,
so that your price is the total value of all sales divided by the
total quantity sold, rather than an arithmetic average of the prices
across markets of different sizes, and maybe smooth the resulting
prices over time with a moving average process if needed.
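
To make the distinction concrete, here is a minimal sketch in Python
(not Stata), under my own assumptions: the market names, prices and
volumes are made up, and the function names pps_sample,
volume_weighted_price and moving_average are hypothetical helpers, not
part of any package. It shows a with-replacement draw with probability
proportional to size, a volume-weighted price, and a simple trailing
moving average.

```python
import random


def pps_sample(volumes, n, seed=0):
    """Draw n markets WITH replacement, probability proportional to volume.

    volumes: dict mapping a (hypothetical) market name to its trading volume.
    """
    rng = random.Random(seed)
    names = list(volumes)
    weights = [volumes[name] for name in names]
    # random.choices selects each draw with probability weight / sum(weights)
    return rng.choices(names, weights=weights, k=n)


def volume_weighted_price(markets):
    """markets: list of (price, volume) pairs.

    Returns total value of sales divided by total quantity sold.
    """
    total_value = sum(price * volume for price, volume in markets)
    total_volume = sum(volume for _, volume in markets)
    return total_value / total_volume


def moving_average(series, window):
    """Trailing moving average; the first window - 1 points are dropped."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Made-up illustration: one large market and one small one.
markets = [(10.0, 900), (20.0, 100)]
weighted = volume_weighted_price(markets)        # (9000 + 2000) / 1000
unweighted = sum(p for p, _ in markets) / len(markets)
```

With these made-up numbers the weighted price is 11.0, while the
unweighted average across markets is 15.0: the small market pulls the
simple average well away from the price at which most of the volume
actually traded. Note that the PPS draw above is with replacement;
without-replacement PPS designs need more care.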

Frankly, I would find it a poor practice to go for methodological
advice to a free community like statalist, and then charge your
clients for the advice you would give them based on our input.

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.
