
st: how to deal with categories?

From   Andrea Bennett
Subject   st: how to deal with categories?
Date   Tue, 3 Jun 2008 14:39:48 +0200

Dear all,

I am wondering what the best way to deal with categorical information is.

What is the best way to include income groups in a regression? For example, since income (usually) has no upper limit, I tend to generate a dummy variable equal to 1 if the individual is in the highest income category and 0 otherwise. Further, am I right to assume that building categories is usually not sensible when the number of observations is high?

One issue I face is that very young and very old adults are under-represented in the dataset (meaning there are not many observations for these groups; the sample itself is good). Is there a rule of thumb for which is better: building categories for all age classes (which increases the number of observations in the young and old groups) or not building classes at all (which keeps the more detailed information)? It is clearly a trade-off, but maybe there is some advice. I tend not to use categories here, also because age squared might be important to have at hand later.
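To make this concrete, here is a minimal sketch of the two set-ups I have in mind. It runs on made-up data, and all variable names (y, age, incgrp, topinc, age2) are hypothetical; incgrp is assumed to be coded 1 to 5, with 5 the open-ended top category.

```stata
* Toy data only, to make the sketch self-contained (not my real data)
clear
set obs 200
set seed 12345
generate age    = 18 + floor(63*runiform())
generate byte incgrp = 1 + floor(5*runiform())
generate y      = rnormal()

* Dummy for the open-ended top income category (1 if highest group, 0 otherwise)
generate byte topinc = (incgrp == 5) if !missing(incgrp)

* Age kept continuous, with a squared term, instead of age classes
generate age2 = age^2

regress y topinc age age2
```

The `if !missing(incgrp)` part is there so that observations with missing income end up missing in topinc rather than being coded 0.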

The "xi" command can help make life less messy (in large data sets, I think), but it seems to discard all my value labels for these categorical variables! I could not find any option that tells "xi" to use the value labels already defined. Is there a workaround so that the regression table shows the defined value labels (e.g. for sex: 0 = male, 1 = female) in place of the generated variable names?
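The behaviour can be reproduced with the auto data that ship with Stata, where foreign carries the value labels Domestic and Foreign. This is only a sketch of the problem, not of my actual model:

```stata
* Example dataset shipped with Stata; foreign is labelled 0 = Domestic, 1 = Foreign
sysuse auto, clear

* xi expands i.foreign into a generated dummy named _Iforeign_1
xi: regress price i.foreign mpg

* List the variables xi generated and their variable labels
describe _I*
```

The coefficient table lists the generated name _Iforeign_1, not the value label Foreign, and that is what I would like to change.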

Many thanks for all your inputs,

Andrea