Morris, Stephen

> I have individual-level data on grouped income, where an individual's
> income may be in one of 31 bands. The bands get wider at higher incomes
> (the highest being open-ended). I know the cut points for each band and
> the number of observations/individuals within each band. I would like to
> compute a continuous individual-level income variable from this. The most
> common method seems to be to take the midpoint of the band and assign this
> value to all individuals in the band, making some reasonably arbitrary
> assumption as to the value to be used for the highest (unbounded) group.
> It seems to me this is very simplistic. I use Stata 7 and in the past I
> have used interval regression to regress banded income on a number of
> explanatory variables and then used the predictions from this model as a
> measure of predicted income. Invariably this method leads to some
> observations having a predicted income value outside the original band,
> which is not very satisfactory.
>
> I have been looking at the possibility of using some kind of kernel
> density estimation, setting the widths equal to the income bands. However,
> as far as I can see, in Stata 7's hardwired -kdensity- command it is not
> possible to set different widths across the distribution (my income bands
> are not equal widths). Also, I am not sure how then to apply the
> information on the resulting density estimates to generate values for
> observations within each band.
>
> Any thoughts on how to proceed are much appreciated. I guess this must be
> a common problem and applies to a whole range of variables, not just
> income. Apologies in advance if I have missed something obvious.

I am not clear on how far you are trying to impute individual incomes as compared with seeking a smoother collective view of the total distribution.
Stephen Jenkins (and indeed others) may have much more to say on this, but a fairly obvious first approximation may be to work on a log scale and then exponentiate. On a variety of empirical and quasitheoretical grounds lognormality is often invoked for incomes; the idea that many of our incomes typically change by some (small) percent per year is naturally one of them. The user-written -mdensity- on SSC has some handles here.

Nick
n.j.cox@durham.ac.uk
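To make the log-scale suggestion concrete, here is a minimal sketch in Python (not Stata, and not what -mdensity- or -intreg- do). It assumes incomes are lognormal, estimates mu and sigma by a simple least-squares fit of ln(cut point) on the normal quantile of the cumulative proportion below each cut, and then gives each band the mean of the fitted lognormal within that band. The function names, cut points and counts are all hypothetical, and the quantile-matching fit is just one convenient choice; maximum likelihood on the banded counts would be an alternative. One attraction is that the assigned value always lies inside its own band, including the open-ended top band.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z):
    """Standard normal CDF; math.erf accepts +/-inf, so open bands work."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_lognormal(cuts, counts):
    """Fit ln(income) ~ N(mu, sigma^2) by least squares of ln(cut) on the
    normal quantile of the cumulative proportion below that cut.

    cuts   : ascending positive cut points between bands (len(counts) - 1)
    counts : observations per band, lowest first; the last band is open-ended
    """
    total = sum(counts)
    zs, logs = [], []
    below = 0
    for cut, n in zip(cuts, counts):
        below += n
        p = below / total
        if 0.0 < p < 1.0:  # the normal quantile is undefined at 0 or 1
            zs.append(NormalDist().inv_cdf(p))
            logs.append(math.log(cut))
    if len(zs) < 2:
        raise ValueError("need at least two usable cut points")
    z_bar = sum(zs) / len(zs)
    l_bar = sum(logs) / len(logs)
    sxx = sum((z - z_bar) ** 2 for z in zs)
    sxy = sum((z - z_bar) * (l - l_bar) for z, l in zip(zs, logs))
    sigma = sxy / sxx          # slope of ln(cut) on z
    mu = l_bar - sigma * z_bar  # intercept
    return mu, sigma


def band_means(cuts, mu, sigma):
    """Mean income within each band under the fitted lognormal."""
    edges = [0.0] + list(cuts) + [math.inf]
    overall_mean = math.exp(mu + 0.5 * sigma ** 2)

    def z(x):
        # Standardised log income; the lower edge 0 maps to -infinity.
        return -math.inf if x <= 0 else (math.log(x) - mu) / sigma

    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = std_normal_cdf(z(hi)) - std_normal_cdf(z(lo))
        partial = std_normal_cdf(z(hi) - sigma) - std_normal_cdf(z(lo) - sigma)
        if mass <= 0.0:
            raise ValueError("fitted distribution puts no mass in a band")
        means.append(overall_mean * partial / mass)
    return means


# Hypothetical example: 5 cut points, 6 bands, top band open-ended.
cuts = [10_000, 20_000, 30_000, 50_000, 100_000]
counts = [120, 340, 260, 200, 70, 10]
mu, sigma = fit_lognormal(cuts, counts)
edges = [0.0] + cuts + [math.inf]
for lo, hi, m in zip(edges[:-1], edges[1:], band_means(cuts, mu, sigma)):
    print(f"{lo:>9,.0f} to {hi:>9,.0f}: assigned income {m:,.0f}")
```

Everyone in a band still gets the same value here, so this replaces the midpoint rule rather than producing a truly continuous variable. If within-band variation is wanted, one could instead draw from the fitted lognormal truncated to each band.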

