Re: st: Looking up values in a 2 dimensional table
On Apr 17, 2008, at 5:26 PM, Mike Lacy wrote:
A colleague of mine has a data file with family income and family
size for a large sample, with real family income for each of a
series of years, in the wide format, e.g., something like:
famid FamSize1985 FamSize1987 FamSize1991 ... Inc1985 Inc1987
Inc1991 ...
with family sizes and family incomes recorded for about 20
different years, not at fixed intervals.
He needs to create a poverty status indicator corresponding to each
year, based on a 2 dimensional table giving poverty thresholds for
each value of year and family size. All the ways I can imagine
doing this seem relatively clumsy. (Among other things, I thought
about ways of doing this with a matrix but they would require using
the value of the family size variable as an index into the matrix,
which is beyond my ken.) I also considered something involving
reshape and merge, but that seemed awkward as well.
On Apr 17, 2008, at 7:58 PM, David Kantor wrote:
First, you probably mean Poverty Guideline.
Poverty Guideline is a simple, rough calculation based on family
size, income, and year.
Poverty Threshold is a more complex calculation that considers
number of children and number of elderly in addition to the other
factors. It is the "officially correct" measure, but is more
complicated to compute.
Those are U.S. government standards. (USDA, I believe.)
I have ado files to calculate both of these. But for the threshold,
it works for 1997 & 1998 only.
For the guideline, it does 1982-2007.
It sounds like David's program might be the easiest solution in this
case. However, to address the more general programming question you
raised: in terms of both efficiency and simplicity, the reshape ->
merge option is a reasonable choice. Specifically, you would
1) reshape the data into long format so that your variables are
famid, year, famsize, and income
2) reshape the lookup table to long format as well, yielding the
variables year, famsize, and threshold
3) merge the table onto the data, using year and famsize together as
the merge key
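4) calculate the poverty status indicator, leaving it missing when
income or threshold is missing (Stata treats missing as larger than
any number, so a bare comparison would misclassify those cases):
gen byte poverty = (income < threshold) if !missing(income, threshold)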
5) reshape back to wide, if necessary
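
The five steps above might look roughly like the following do-file.
This is only a sketch under stated assumptions: the file names
(families, thresholds_wide), the thr stub, and the wide layout of the
lookup table (one row per famsize, one thr<year> column per year) are
hypothetical. It also assumes the stubs are spelled consistently
(FamSize<year>, Inc<year>), since Stata variable names are case
sensitive, and it uses the merge m:1 and grouped rename syntax of
more recent Stata releases.

```stata
* Sketch only: file names, variable names, and the layout of the
* lookup table are assumptions, not taken from the original data.

* 2) Lookup table: assumed wide, one row per famsize, thr1985 thr1987 ...
use thresholds_wide, clear
reshape long thr, i(famsize) j(year)
rename thr threshold
tempfile lookup
save `lookup'

* 1) Family data: assumed wide, with FamSize<year> and Inc<year>
use families, clear
reshape long FamSize Inc, i(famid) j(year)
rename (FamSize Inc) (famsize income)

* 3) Attach the threshold matching each family's year and size;
*    rows with no matching table entry are kept with threshold missing
merge m:1 year famsize using `lookup', keep(master match) nogenerate

* 4) Poverty indicator, left missing if income or threshold is missing
gen byte poverty = (income < threshold) if !missing(income, threshold)

* 5) Back to wide, if necessary
reshape wide famsize income threshold poverty, i(famid) j(year)
```

Keeping unmatched rows (rather than dropping them) makes it easy to
spot year and family size combinations that the table does not cover.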
To implement a general function returning poverty status for a given
year and family size, you could load the table of thresholds into a
matrix, and then create two additional vectors (one for year and one
for family size) that would permit you to look up the corresponding
threshold for a given year and family size. However, this would
require a bit of Mata programming and, I believe, wouldn't be any
faster for the task at hand.
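
For illustration, the matrix-plus-two-vectors idea could be sketched
in Mata as below. The function name lookup_threshold, its arguments,
and the numbers in the table are all hypothetical placeholders (they
are not real poverty figures), and selectindex() is assumed to be
available, which is the case only in more recent Stata releases.

```stata
mata:
// Hypothetical helper: T holds the thresholds (rows = family sizes,
// columns = years); sizes and years label its rows and columns.
real scalar lookup_threshold(real matrix T, real colvector sizes, real rowvector years, real scalar size, real scalar year)
{
    real colvector r
    real rowvector c

    r = selectindex(sizes :== size)   // row whose label equals size
    c = selectindex(years :== year)   // column whose label equals year

    // return missing unless exactly one row and one column match
    if (length(r) != 1 | length(c) != 1) return(.)
    return(T[r, c])
}

// Placeholder numbers for illustration only
sizes = (1 \ 2 \ 3)
years = (1985, 1987, 1991)
T     = (100, 110, 120 \ 200, 210, 220 \ 300, 310, 320)

// threshold for a family of size 2 in 1987
lookup_threshold(T, sizes, years, 2, 1987)
end
```

Because the years are not at fixed intervals, the label vectors do the
work of translating a year or family size into a row or column
position; indexing the matrix directly by year would not work.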
-- Phil