The Stata listserver

st: RE: Turning banded income into a continuous variable

From   "David Moore" <>
Subject   st: RE: Turning banded income into a continuous variable
Date   Wed, 2 Apr 2003 11:56:14 -0800

Let me start with a non-answer comment. I don't think assigning the
midpoint is necessarily simplistic. Easy, yes, but why simplistic? With
narrow bands, midpoints can be very good guesses. Given the notoriously poor
performance of statistical models in predicting individual (self-reported)
income, I'm not convinced that a model-based approach is necessarily
superior. Even so, it is intuitively appealing to model income and use the
additional information to improve on the midpoint estimates.
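
To make the midpoint approach concrete, here is a small Python sketch (not Stata code). The band list and the value used for the open-ended top band are hypothetical. As the original question notes, the top-band value is an arbitrary assumption that the analyst has to choose.

```python
# Hypothetical bands in dollars: (lower, upper); upper=None marks the open-ended top band.
BANDS = [(0, 9_999), (10_000, 19_999), (20_000, 29_999), (30_000, None)]


def midpoint_income(band_index, top_value=45_000.0):
    """Return the value assigned to everyone in the given band."""
    lower, upper = BANDS[band_index]
    if upper is None:
        # Open-ended band: no midpoint exists, so use an assumed (arbitrary) value.
        return float(top_value)
    return (lower + upper) / 2


# Band index recorded for each respondent (hypothetical data).
respondent_bands = [0, 2, 2, 3]
print([midpoint_income(b) for b in respondent_bands])  # [4999.5, 24999.5, 24999.5, 45000.0]
```

Every respondent in a band gets the same value, which is exactly the loss of within-band variation the question is trying to avoid.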

For what it's worth, here's a completely data-driven (as opposed to
theory-driven) approach. Begin, as suggested, with -intreg- and create
predicted values. As you noted, these will generally not fall within the
bounds of the original ranges. Assuming that the originally coded data are
correct, if somewhat imprecise, the question becomes one of reconstructing
the distribution of incomes within each range. If you treat the predicted
values as "scores" rather than as the truth, however, you can turn them into
incomes bounded by the appropriate range. It's a simple (simplistic?)
matter of transforming the predicted values so that they fall within the
associated range. In other words, for respondents with incomes in the
$20,000 - $29,999 range, rescale their predicted values so that the minimum
and maximum span the same $10,000 range, and shift them so that they line
up with the band's bounds. You could even constrain the median income of
these respondents to equal the midpoint of the interval, though this is
probably not a good idea given what we know about the distribution of income.
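
The within-band rescaling amounts to a linear (min-max) map applied separately within each band. Below is a minimal Python sketch, assuming the predicted values have already been produced (for example by -intreg- in Stata) and that every band has finite bounds. The open-ended top band would need its own assumed upper bound. The function name, band labels, and bounds are hypothetical.

```python
from collections import defaultdict


def rescale_within_bands(scores, bands, bounds):
    """Map model scores to incomes that respect each respondent's band.

    scores -- predicted values, one per respondent (e.g. from an interval regression)
    bands  -- band label for each respondent, same length as scores
    bounds -- dict mapping band label -> (lower, upper); both must be finite
    """
    # Group respondent positions by band.
    members = defaultdict(list)
    for i, band in enumerate(bands):
        members[band].append(i)

    incomes = [0.0] * len(scores)
    for band, idx in members.items():
        lower, upper = bounds[band]
        s_min = min(scores[i] for i in idx)
        s_max = max(scores[i] for i in idx)
        for i in idx:
            if s_max == s_min:
                # No spread in scores (e.g. a single respondent): use the midpoint.
                incomes[i] = (lower + upper) / 2
            else:
                # Linear min-max map: lowest score -> lower bound, highest -> upper bound.
                share = (scores[i] - s_min) / (s_max - s_min)
                incomes[i] = lower + share * (upper - lower)
    return incomes


# Hypothetical example: two bands, with some scores falling outside their band.
bounds = {"20k": (20_000, 29_999), "30k": (30_000, 39_999)}
bands = ["20k", "20k", "20k", "30k", "30k"]
scores = [18_500.0, 26_000.0, 31_000.0, 27_000.0, 41_000.0]
print(rescale_within_bands(scores, bands, bounds))
```

Because the map is increasing, respondents keep their model-based ordering within a band, while out-of-band predictions such as 18,500 or 31,000 are pulled inside it. This sketch does not impose the median constraint discussed above.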

> -----Original Message-----
> From:
> [] On Behalf Of Morris, Stephen
> Sent: Wednesday, April 02, 2003 4:00 AM
> To: ''
> Subject: st: Turning banded income into a continuous variable
> Hello,
>
> I have individual-level data on grouped income, where an individual's
> income may be in one of 31 bands. The bands get wider at higher incomes
> (the highest being open-ended). I know the cut points for each band and
> the number of observations/individuals within each band. I would like to
> compute a continuous individual-level income variable from this. The most
> common method seems to be to take the midpoint of the band and assign this
> value to all individuals in the band, making a fairly arbitrary assumption
> about the value to use for the highest (unbounded) group. This seems very
> simplistic to me. I use Stata 7, and in the past I have used interval
> regression to regress banded income on a number of explanatory variables
> and then used the predictions from this model as a measure of predicted
> income. Invariably this method leads to some observations having a
> predicted income outside the original band, which is not very
> satisfactory.
>
> I have been looking at the possibility of using some kind of kernel
> density estimation, setting the widths equal to the income bands.
> However, as far as I can see, Stata 7's hardwired -kdensity- command does
> not allow different widths across the distribution (my income bands are
> not of equal width). I am also not sure how I would then use the
> resulting density estimates to generate values for observations within
> each band.
>
> Any thoughts on how to proceed are much appreciated. I guess this must be
> a common problem, and it applies to a whole range of variables, not just
> income. Apologies in advance if I have missed something obvious.
>
> Thank you.
> Steve Morris

*   For searches and help try:

© Copyright 1996–2014 StataCorp LP