RE: st: RE: Your opinion on income groups and inflation

Subject   RE: st: RE: Your opinion on income groups and inflation
Date   Mon, 9 Jun 2008 12:03:45 -0400

I am loath to belabor the point but I think that there are no hard and fast
rules against or in favor of "dummization" of ordered categorical variables. It
depends on what that variable represents.

The income ranges that one normally works with have no significance per se.
They are often determined by statistical offices or by enumerators for their own
convenience and respond to a multiplicity of objectives. Thus, intervals are
almost always of unequal size (one interval may be 100-200, another 1200-5200,
and the final interval may be open-ended, e.g., all incomes > 150,000).
So running one dummy for people with incomes between 100 and 200 and another
for people with incomes between 1200 and 5200 has no prior meaning that you
would want to explore, because these intervals do not correspond to "real",
inherent differences between these types of people. (The implicit H0 is that
people with incomes between 100 and 200 all behave the same way, and differently
from people with incomes between 1200 and 5200, who in turn all behave the same
way.) Keeping the same ranges and using dummies would be particularly
problematic with panel data, because the same nominal income ranges may mean
totally different things at different times (inflation, real income growth).
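
To make the panel-data point concrete, here is a minimal sketch in Python (not Stata, and not part of the original exchange). The brackets are the ones mentioned above; the survey waves and the price index values are hypothetical, chosen only for illustration. It expresses the same nominal bracket in base-wave prices for two waves:

```python
# Nominal income brackets (lower, upper) as fixed by a survey design.
BRACKETS = [(100, 200), (1200, 5200)]

# HYPOTHETICAL price index by survey wave (base wave = 1.00).
PRICE_INDEX = {2000: 1.00, 2008: 1.25}


def real_bounds(bracket, wave):
    """Express a nominal bracket in base-wave prices."""
    lo, hi = bracket
    deflator = PRICE_INDEX[wave]
    return (lo / deflator, hi / deflator)


# The same nominal bracket covers a different real income range in each wave.
for wave in PRICE_INDEX:
    print(wave, [real_bounds(b, wave) for b in BRACKETS])
```

With a 25% rise in prices, the "100-200" bracket covers only 80-160 in base-wave terms, so a dummy for that bracket does not identify the same people in both waves.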

Unlike, e.g., the rural vs. urban distinction, the intervals do not reflect
something that we believe is a meaningful distinction between the groups and
that we want to explore. They are created (as I said) partly by accident and
partly for the sake of convenience ("we want to have 10 income classes", "we
create some prior intervals into which to place people, hoping that the
resulting distribution will be normal or lognormal"), yet the outcome may be
very different, so you may end up with 30% of people in one interval and 1%
in another. Using them as dummies therefore gives them an importance that
they do not possess. They are actually "compressions" of a continuous variable
(income), and it therefore makes more sense to try to "unpack" them by using
the interval means and to treat them as proxies for a continuous variable
(which they actually are).
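
A rough sketch of this "unpacking", in Python rather than Stata and under loud assumptions: when the true within-interval means are not published, interval midpoints are used here as a stand-in, and the open top interval is given a score equal to its lower bound times a multiplier. The function name interval_scores, the default multiplier of 1.5, and the respondent codes are all hypothetical.

```python
def interval_scores(intervals, top_multiplier=1.5):
    """Return one representative income per interval.

    intervals: list of (lower, upper); upper is None for an open top interval.
    top_multiplier: ASSUMED ratio of the open interval's mean to its lower
    bound; replace it with the actual interval mean whenever that is known.
    """
    scores = []
    for lo, hi in intervals:
        if hi is None:
            scores.append(lo * top_multiplier)   # open-ended top interval
        else:
            scores.append((lo + hi) / 2)         # midpoint as a stand-in for the mean
    return scores


intervals = [(100, 200), (1200, 5200), (150000, None)]
scores = interval_scores(intervals)

# Hypothetical respondents, coded by interval position (0, 1, 2).
codes = [0, 1, 1, 2]
income_proxy = [scores[c] for c in codes]  # enters the regression as one variable
```

The resulting income_proxy (or its log) can then be used as a single regressor in place of a set of interval dummies. How sensitive the results are to the top-interval assumption is worth checking.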


Development Research, World Bank
Email: or branko_mi@yahoo.
tel: 202-473-6968
World Bank, Room MC 3-559
1818 H Street NW
Washington D.C. 20433


From: "Nick Cox"
Sent by: owner-statalist@hsp
Date: 06/09/2008 05:56 AM
Subject: RE: st: RE: Your opinion on income groups and inflation

-mrunning- and -mlowess- are possible graphical aids here, giving
smooths of response versus each predictor, with adjustment for other
predictors. Use -findit- to identify locations of program files.


Austin Nichols

I strongly disagree with Martin Weiss, SamL, and Branko Milanovic who
claim that an ordered categorical explanatory variable can be included
as a sensible regressor without justification.  Creating dummies *is*
justifiable; you are merely computing conditional means.  Including
income (or "trust") as a single explanatory variable when income (or
"trust") is measured as an ordered categorical explanatory variable
requires a strong assumption that the effect is linear in the index of
categories.  The dummy variable approach requires no such assumption.
As Richard Williams quite rightly points out, you can -test- whether
the effect is linear in the index, or whether groups of individual
dummies all have the same effect.  One useful way is to create dummies
that correspond to more interpretable groups, like above the median,
more than twice the median, less than half the median, etc. so you can
see directly from the regression output where deviations from
linearity occur...  graphs are also helpful for this purpose.
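
The contrast drawn above, between conditional means from dummies and a fit that is linear in the category index, can be sketched in Python (not Stata). This is only a descriptive comparison, not the formal -test- mentioned above; the data and the function names group_means and linear_fit are made up for illustration.

```python
from statistics import mean


def group_means(codes, y):
    """Conditional mean of y within each category (what a full dummy set estimates)."""
    groups = {}
    for c, v in zip(codes, y):
        groups.setdefault(c, []).append(v)
    return {c: mean(vs) for c, vs in sorted(groups.items())}


def linear_fit(codes, y):
    """OLS of y on the category index; returns (intercept, slope)."""
    xbar, ybar = mean(codes), mean(y)
    sxy = sum((c - xbar) * (v - ybar) for c, v in zip(codes, y))
    sxx = sum((c - xbar) ** 2 for c in codes)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# HYPOTHETICAL data: three ordered categories, two observations each.
codes = [1, 1, 2, 2, 3, 3]
y = [1.0, 3.0, 2.0, 4.0, 8.0, 10.0]

means = group_means(codes, y)
a, b = linear_fit(codes, y)

# Gap between each conditional mean and the linear-in-index prediction;
# large gaps show where the linearity assumption fails.
gaps = {c: m - (a + b * c) for c, m in means.items()}
print(means, gaps)
```

In this made-up example the middle category's mean falls well below the fitted line, which is the kind of deviation the dummy specification would reveal and the single-index specification would hide.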
