# st: RE: Turning banded income into a continuous variable

From: "Nick Cox" <[email protected]>
To: <[email protected]>
Subject: st: RE: Turning banded income into a continuous variable
Date: Wed, 2 Apr 2003 14:35:44 +0100

```
Morris, Stephen
>
> I have individual-level data on grouped income, where an individual's
> income may be in one of 31 bands. The bands get wider at higher incomes
> (the highest being open-ended). I know the cut points for each band and
> the number of observations/individuals within each band. I would like
> to compute a continuous individual-level income variable from this. The
> most common method seems to be to take the midpoint of the band and
> assign this value to all individuals in the band, making some
> reasonably arbitrary assumption as to the value to be used for the
> highest (unbounded) group. It seems to me this is very simplistic. I
> use Stata 7 and in the past I have used interval regression to regress
> banded income on a number of explanatory variables and then used the
> predictions from this model as a measure of predicted income.
> Invariably this method leads to some observations having a predicted
> income value outside the original band, which is not very satisfactory.
>
> I have been looking at the possibility of using some kind of kernel
> density estimation, setting the widths equal to the income bands.
> However, as far as I can see, in Stata 7's hardwired -kdensity- command
> it is not possible to set different widths across the distribution (my
> income bands are not equal widths). Also, I am not sure how then to
> apply the information on the resulting density estimates to generate
> values for observations within each band.
>
> Any thoughts on how to proceed are much appreciated. I guess this must
> be a common problem and applies to a whole range of variables, not just
> income. Apologies in advance if I have missed something obvious.
>

I am not clear on how far you are trying to impute
individual incomes as compared with seeking a smoother
collective view of the total distribution.

Stephen Jenkins (and indeed others) may have much
more to say on this, but a fairly obvious first
approximation may be to work on a log scale
and then exponentiate. On a variety of empirical
and quasitheoretical grounds lognormality is
often invoked for incomes; the idea that
many of our incomes typically change by some (small)
percent per year is naturally one of them.

The user-written -mdensity- on SSC has some handles
here.

Nick

```
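The log-scale suggestion in the reply can be sketched roughly as follows. This is an illustration in Python rather than Stata, and it is not code from the thread: the function names, the example bands, and the values of `mu` and `sigma` are all hypothetical. It assumes log income is normal with a mean and standard deviation that have already been estimated somehow (for example, from an interval regression on log cut points). Closed bands get a geometric midpoint; alternatively, every band, including the open-ended top one, gets the lognormal mean conditional on being inside the band, which by construction cannot fall outside the band.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def geometric_midpoint(lo, hi):
    """Midpoint of a closed band taken on the log scale, then exponentiated."""
    return math.exp((math.log(lo) + math.log(hi)) / 2.0)


def _z(x, mu, sigma):
    """Standardised log of a cut point; 0 maps to -inf, math.inf to +inf."""
    if x <= 0:
        return -math.inf
    return (math.log(x) - mu) / sigma


def lognormal_band_mean(lo, hi, mu, sigma):
    """E[Y | lo < Y <= hi], assuming log(Y) ~ Normal(mu, sigma**2).

    hi may be math.inf for the open-ended top band.
    """
    cdf = _STD_NORMAL.cdf
    z_lo, z_hi = _z(lo, mu, sigma), _z(hi, mu, sigma)
    prob = cdf(z_hi) - cdf(z_lo)  # P(lo < Y <= hi)
    if prob <= 0.0:
        raise ValueError("band has no probability mass under this lognormal")
    # Partial expectation of a lognormal over the band, divided by exp(mu + sigma^2/2)
    partial = cdf(z_hi - sigma) - cdf(z_lo - sigma)
    return math.exp(mu + sigma ** 2 / 2.0) * partial / prob


# Hypothetical parameters and bands, for illustration only
mu, sigma = 10.0, 0.7
bands = [(10_000, 15_000), (15_000, 25_000), (25_000, math.inf)]
for lo, hi in bands:
    value = lognormal_band_mean(lo, hi, mu, sigma)
    print(f"{lo}-{hi}: {value:.0f}")
```

Note that this still assigns a single value to everyone in a band. It replaces the arithmetic midpoint and the arbitrary top-band value with something tied to an assumed distribution, but it does not recover individual variation within bands.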