
From: bmilanovic@worldbank.org
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: RE: Your opinion on income groups and inflation
Date: Mon, 9 Jun 2008 12:03:45 -0400

I am loath to belabor the point, but I think that there are no hard and fast rules against or in favor of "dummization" of ordered categorical variables. It depends on what the variable represents.

The income ranges that one normally works with have no significance per se. They are often determined by statistical offices or by enumerators for their own convenience, and they respond to a multiplicity of objectives. Thus, intervals are almost always of unequal size (one interval may be 100-200, another 1200-5200, and the final interval may be open, e.g., all incomes > 150,000). So running one dummy for people with incomes between 100 and 200 and another for people with incomes between 1200 and 5200 does not test any prior hypothesis that you want to explore, because these intervals do not correspond to some "real" inherent differences between these types of people. (The implied Ho would be: people with incomes between 100 and 200 all behave the same way, and differently from people whose income range is 1200-5200, who in their turn all behave the same way.)

Keeping the same ranges and using dummies would be particularly problematic if you have panel data, because the same income ranges may mean totally different things at different times (inflation, real income growth).

Unlike, e.g., the rural vs. urban distinction, the intervals do not reflect something that we believe is a meaningful distinction between the groups and that we want to explore. They are created (as I said) partly by accident and partly for the sake of convenience ("we want to have 10 income classes"; "we create some prior intervals into which to place people, hoping that the resulting distribution will be normal or lognormal"). Yet the outcome may be very different, so you may end up with 30 percent of people in one interval and 1 percent in another. Using them as dummies therefore gives them an importance which they do not possess.
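The panel-data concern can be made concrete with a little arithmetic. The sketch below is only an illustration: the bracket bounds are the ones used as an example in this message, while the price index values and the function name `real_bounds` are hypothetical and not taken from any survey.

```python
# Hypothetical illustration: the same nominal income bracket expressed in
# base-year prices for several survey waves.

def real_bounds(lower, upper, price_index, base_index=100.0):
    """Deflate nominal bracket bounds to base-year prices."""
    factor = base_index / price_index
    return lower * factor, upper * factor

# Nominal bracket from the example in the message.
bracket = (1200, 5200)

# Assumed price index by survey year (made-up numbers, base year = 100).
price_index_by_year = {2000: 100.0, 2004: 125.0, 2008: 160.0}

for year, index in sorted(price_index_by_year.items()):
    low, high = real_bounds(*bracket, price_index=index)
    print(f"{year}: nominal {bracket[0]}-{bracket[1]} = real {low:.0f}-{high:.0f}")
```

With these assumed index values, a dummy for the "1200-5200" bracket would pick out a different real income range in every wave, even though its label never changes.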
They are actually "compressions" of a continuous variable (income), and it therefore makes more sense to try to "unpack" them by using the means and to treat them as proxies for a continuous variable (which they actually are).

Branko

Development Research, World Bank
Email: bmilanovic@worldbank.org or branko_mi@yahoo.
tel: 202-473-6968
World Bank, Room MC 3-559
1818 H Street NW
Washington D.C. 20433

For "Worlds Apart" see http://www.pupress.princeton.edu/titles/7946.html
Website: http://econ.worldbank.org/projects/inequality
For papers see also:
http://econpapers.hhs.se/
http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=149002

From: "Nick Cox" <n.j.cox@durham.ac.uk>
Sent by: owner-statalist@hsphsun2.harvard.edu
To: <statalist@hsphsun2.harvard.edu>
Date: 06/09/2008 05:56 AM
Subject: RE: st: RE: Your opinion on income groups and inflation
Please respond to statalist@hsphsun2.harvard.edu

-mrunning- and -mlowess- are possible graphical aids here, giving smooths of response versus each predictor, with adjustment for other predictors. Use -findit- to identify locations of program files.

Nick
n.j.cox@durham.ac.uk

Austin Nichols

Andrea-- I strongly disagree with Martin Weiss, SamL, and Branko Milanovic, who claim that an ordered categorical explanatory variable can be included as a sensible regressor without justification. Creating dummies *is* justifiable; you are merely computing conditional means. Including income (or "trust") as a single explanatory variable when income (or "trust") is measured as an ordered categorical explanatory variable requires a strong assumption that the effect is linear in the index of categories. The dummy variable approach requires no such assumption. As Richard Williams quite rightly points out, you can -test- whether the effect is linear in the index, or whether groups of individual dummies all have the same effect.
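To see what the two specifications compared in this message actually fit, here is a small sketch in Python rather than Stata. A regression on a full set of category dummies reproduces the mean of the response within each category, while a regression on the category index forces the fitted values onto a straight line. The data and the helper names (`category_means`, `linear_in_index`) are made up for illustration.

```python
from collections import defaultdict

def category_means(categories, y):
    """Mean of y within each category: the fitted values of a regression
    of y on a full set of category dummies."""
    groups = defaultdict(list)
    for c, value in zip(categories, y):
        groups[c].append(value)
    return {c: sum(v) / len(v) for c, v in sorted(groups.items())}

def linear_in_index(categories, y):
    """OLS intercept and slope from regressing y on the category index."""
    n = len(y)
    x_bar = sum(categories) / n
    y_bar = sum(y) / n
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(categories, y))
    sxx = sum((x - x_bar) ** 2 for x in categories)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: income category index (1-4) and some response.
cat = [1, 1, 2, 2, 3, 3, 4, 4]
resp = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 9.0, 11.0]

means = category_means(cat, resp)
intercept, slope = linear_in_index(cat, resp)

for c, m in means.items():
    fitted = intercept + slope * c
    print(f"category {c}: dummy fit {m:.2f}, linear-in-index fit {fitted:.2f}")
```

Where the two columns diverge, the linearity assumption is doing real work. The sketch only shows the two sets of fitted values; it does not carry out a formal test.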
One useful way is to create dummies that correspond to more interpretable groups, like above the median, more than twice the median, less than half the median, etc., so you can see directly from the regression output where deviations from linearity occur. Graphs are also helpful for this purpose.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
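One way to build the median-relative groups suggested in the quoted message, sketched in Python with made-up numbers. It assumes each respondent has already been assigned a representative income for their bracket (for example the bracket mean); the values, the exact cut-offs at half and twice the median, and the function name `median_group` are illustrative assumptions.

```python
from statistics import median

def median_group(income, med):
    """Label an income relative to the sample median."""
    if income < 0.5 * med:
        return "below half median"
    if income <= med:
        return "half median to median"
    if income <= 2 * med:
        return "median to twice median"
    return "above twice median"

# Made-up representative incomes (e.g. bracket means), one per respondent.
incomes = [150, 150, 700, 900, 1000, 1100, 3200, 3200, 9000]

med = median(incomes)
labels = [median_group(x, med) for x in incomes]

# One 0/1 indicator per group; in a regression one group would be
# dropped to serve as the reference category.
groups = sorted(set(labels))
dummies = {g: [int(label == g) for label in labels] for g in groups}

for g in groups:
    print(g, dummies[g])
```

Because the groups are defined relative to the median rather than by fixed nominal bounds, their meaning is less tied to whatever bracket limits the survey happened to use.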

**References**: **RE: st: RE: Your opinion on income groups and inflation** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

