
st: Selecting a sample to compromise between significant size and geographical dispersion


From   Partho Sarkar <partho.ss+lists@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Selecting a sample to compromise between significant size and geographical dispersion
Date   Thu, 15 Sep 2011 12:55:30 +0530

I wonder if any Statalister has some ideas/insight to share on the
following "fuzzy" problem.

Broadly speaking, I want to select a sample from a very large
population that strikes a "good" compromise between excluding
"insignificant" units and ensuring "reasonable" diversity.  I have a
hierarchical dataset on prices of some commodities from markets across
the country.  (The geographical levels are national, state, district,
and market; markets are the primary units.)  I want to consider prices
only from "significant" markets, i.e., for each commodity, markets
whose trading volume is at least the median volume (say).  BUT, I also
want to ensure as complete a geographical coverage as possible.

Ideally, I would have a set of parameters to control the
"significance" (as defined above) and the geographical "dispersion"
of markets for each commodity, and a method to select the "best"
parameters optimally.  E.g., if I were to do this manually, I might
first set the median trading volume as the cut-off.  This would result
in a certain selection of markets, with an associated geographical
pattern.  (What could be a meaningful way to measure the degree of
dispersion?)  If on inspection I found that the cut-off resulted in
"too much" geographical concentration, I would lower the cut-off, and
so iterate until I got a "good" compromise.

I imagine this sort of consideration comes up fairly commonly in some
areas, and there might be established methods/programs to handle this,
whether in Stata or otherwise (I am familiar with Matlab & R).  Any
ideas?

Thanks & Regards
Partha S. Sarkar
Consultant Econometrician
Indicus Analytics Pvt. Ltd (www.indicus.net)
New Delhi, India