The Stata listserver

st: RE: Turning banded income into a continuous variable

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Turning banded income into a continuous variable
Date   Wed, 2 Apr 2003 14:35:44 +0100

Morris, Stephen
> I have individual-level data on grouped income, where an individual's
> income may be in one of 31 bands. The bands get wider at higher incomes
> (the highest being open-ended). I know the cut points for each band and
> the number of observations/individuals within each band. I would like
> to compute a continuous individual-level income variable from this. The
> most common method seems to be to take the midpoint of the band and
> assign this value to all individuals in the band, making some reasonably
> arbitrary assumption as to the value to be used for the highest
> (unbounded) group. It seems to me this is very simplistic. I use Stata 7
> and in the past I have used interval regression to regress banded income
> on a number of explanatory variables and then used the predictions from
> this model as a measure of predicted income. Invariably this method
> leads to some observations having a predicted income value outside the
> original band, which is not very satisfactory.
> I have been looking at the possibility of using some kind of kernel
> density estimation, setting the widths equal to the income bands.
> However, as far as I can see, in Stata 7's hardwired -kdensity- command
> it is not possible to set different widths across the distribution (my
> income bands are not equal widths). Also, I am not sure how then to
> apply the information on the resulting density estimates to generate
> values for observations within each band.
> Any thoughts on how to proceed are much appreciated. I guess this must
> be a common problem and applies to a whole range of variables, not just
> income.
> Apologies in advance if I have missed something obvious.

I am not clear on how far you are trying to impute 
individual incomes as compared with seeking a smoother
collective view of the total distribution. 

Stephen Jenkins (and indeed others) may have much 
more to say on this, but a fairly obvious first 
approximation may be to work on a log scale 
and then exponentiate. On a variety of empirical 
and quasitheoretical grounds lognormality is 
often invoked for incomes; the idea that 
many of our incomes typically change by some (small) 
percent per year is naturally one of them. 
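
To make the log-scale idea concrete, here is a minimal sketch in Python
(not Stata), using only the standard library. It assumes that the mean
mu and standard deviation sigma of log income have already been
estimated by some other means, for example interval regression on the
logged cut points with no covariates. The function names and the
numbers in the usage lines are hypothetical and are not from the data
discussed here. Given such a fit, the sketch computes the mean income
within a band under lognormality and draws a random income from within
the band, so that imputed values can never fall outside their band.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def _log_z(x, mu, sigma, shift=0.0):
    """Standardized log of x, with the limits 0 and infinity handled."""
    if x <= 0:
        return -math.inf
    if math.isinf(x):
        return math.inf
    return (math.log(x) - mu - shift) / sigma


def _cdf(z):
    """Standard normal CDF, with infinite arguments handled explicitly."""
    if math.isinf(z):
        return 1.0 if z > 0 else 0.0
    return STD_NORMAL.cdf(z)


def band_mean(lo, hi, mu, sigma):
    """Mean income within the band (lo, hi) if log income is Normal(mu, sigma)."""
    mass = _cdf(_log_z(hi, mu, sigma)) - _cdf(_log_z(lo, mu, sigma))
    if mass <= 0:
        raise ValueError("band has no probability mass under this fit")
    s2 = sigma ** 2
    # Partial expectation of a lognormal: shift the standardized bounds by sigma^2.
    partial = _cdf(_log_z(hi, mu, sigma, s2)) - _cdf(_log_z(lo, mu, sigma, s2))
    return math.exp(mu + s2 / 2) * partial / mass


def band_draw(lo, hi, mu, sigma, rng=random):
    """Random income from the fitted lognormal, truncated to the band (lo, hi)."""
    p_lo = _cdf(_log_z(lo, mu, sigma))
    p_hi = _cdf(_log_z(hi, mu, sigma))
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf away from 0 and 1
    value = math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))
    return min(max(value, lo), hi)  # guard against rounding at the band edges


# Hypothetical fit and bands, for illustration only.
mu, sigma = math.log(20000), 0.7
for lo, hi in [(0, 10000), (10000, 20000), (50000, math.inf)]:
    print(lo, hi, round(band_mean(lo, hi, mu, sigma)))
```

The open-ended top band needs no arbitrary upper value here: its mean
follows from the fitted tail. The price is that everything rests on the
lognormal assumption, which should be checked against the band counts.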

The user-written -mdensity- on SSC has some handles 

[email protected] 
