Statalist



Re: st: Looking up values in a 2 dimensional table


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Looking up values in a 2 dimensional table
Date   Thu, 17 Apr 2008 20:30:39 -0500

On Apr 17, 2008, at 5:26 PM, Mike Lacy wrote:
A colleague of mine has a data file with family income and family size for a large sample, with real family income for each of a series of years, in the wide format, e.g., something like:

famid FamSize1985 FamSize1987 FamSize1991 ... Inc1985 Inc1987 Inc1991 ...

with family sizes and family incomes recorded for about 20 different years, not at fixed intervals.

He needs to create a poverty status indicator corresponding to each year, based on a 2 dimensional table giving poverty thresholds for each value of year and family size. All the ways I can imagine doing this seem relatively clumsy. (Among other things, I thought about ways of doing this with a matrix but they would require using the value of the family size variable as an index into the matrix, which is beyond my ken.) I also considered something involving reshape and merge, but that seemed awkward as well.

On Apr 17, 2008, at 7:58 PM, David Kantor wrote:
First, you probably mean Poverty Guideline.
Poverty Guideline is a simple, rough calculation based on family size, income, and year.
Poverty Threshold is a more complex calculation that considers number of children and number of elderly in addition to the other factors. It is the "officially correct" measure, but is more complicated to compute.

Those are U.S. government standards. (USDA, I believe.)

I have ado files to calculate both of these. But for the threshold, it works for 1997 & 1998 only.
For the guideline, it does 1982-2007.

It sounds like David's program might be the easiest solution in this case. However, to address the more general programming question you raised, in terms of both efficiency and simplicity, the reshape -> merge option wouldn't be a bad choice. Specifically, you would

1) reshape the data into long format so that your variables are famid, year, famsize, and income

2) reshape the lookup table (also long) to yield the variables year, famsize, and threshold

3) merge the table onto the data using the combination of year and famsize as the merge variables

4) calculate the poverty status indicator as

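* Stata sorts missing values above every number, so guard against them:
* otherwise a missing threshold would flag every family as poor
gen byte poverty = (income < threshold) if !missing(income, threshold)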

5) reshape back to wide, if necessary
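
Put together, the five steps might look something like the sketch below. It is only a sketch under several assumptions: the file names families.dta, thresholds_wide.dta, and thresholds.dta are hypothetical, the lookup table is assumed to be stored with one row per year and columns threshold1, threshold2, ... for each family size, and the -merge m:1- syntax requires Stata 11 or later (older versions would use the previous -merge- syntax instead).

```stata
* 2) lookup table: assumed wide, one row per year with threshold1, threshold2, ...
use thresholds_wide, clear
reshape long threshold, i(year) j(famsize)
save thresholds, replace

* 1) family data: wide -> long; j() picks up the year suffix of each variable
use families, clear
reshape long FamSize Inc, i(famid) j(year)
rename FamSize famsize
rename Inc income

* 3) many families share each (year, famsize) cell, hence many-to-one;
*    keep(master match) drops table cells that no family uses
merge m:1 year famsize using thresholds, keep(master match)

* 4) indicator is left missing when income or threshold is missing
gen byte poverty = (income < threshold) if !missing(income, threshold)

* 5) back to wide: yields FamSize1985, Inc1985, poverty1985, ...
drop _merge threshold
rename famsize FamSize
rename income Inc
reshape wide FamSize Inc poverty, i(famid) j(year)
```

Tabulating _merge before dropping it is a quick way to spot year and family size combinations that have no entry in the table.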

To implement a general function returning poverty status for a given year and family size, you could load the table of thresholds into a matrix and then create two additional vectors (one for year and one for family size) that would let you look up the corresponding threshold. However, this would require a bit of Mata programming, and I don't believe it would be any faster for the task at hand.
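
For what it's worth, a minimal Mata sketch of that lookup idea follows. The function name threshold_lookup and its arguments are made up for illustration: T is assumed to hold the thresholds with one row per year and one column per family size, while years and sizes are the two label vectors described above.

```stata
mata:
// T:     matrix of thresholds, rows = years, columns = family sizes
// years: vector giving the year for each row of T
// sizes: vector giving the family size for each column of T
real scalar threshold_lookup(real matrix T, real vector years,
                             real vector sizes, real scalar year,
                             real scalar famsize)
{
    real scalar i, r, c

    r = .
    c = .
    // find the row whose label matches the requested year
    for (i = 1; i <= length(years); i++) {
        if (years[i] == year) r = i
    }
    // find the column whose label matches the requested family size
    for (i = 1; i <= length(sizes); i++) {
        if (sizes[i] == famsize) c = i
    }
    // return missing when the year or family size is not in the table
    if (r == . | c == .) return(.)
    return(T[r, c])
}
end
```

The poverty indicator for one family-year would then be a comparison of income against the returned threshold, with a missing return value treated as "not in the table" rather than as a threshold.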


-- Phil




