
From: philippe van kerm <philippe.vankerm@ceps.lu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: how to deal with categories?
Date: Tue, 3 Jun 2008 15:22:48 +0200

The command -dummieslab- (available on the SSC archive) may be useful for your second question:

TITLE
      'DUMMIESLAB': module to convert categorical variable to dummy variables using label names

DESCRIPTION/AUTHOR(S)
      dummieslab generates a set of dummy variables from a categorical variable. One dummy variable is created for each level of the original variable. Names for the dummy variables are derived from the value labels of the categorical variable.

To install from within Stata:

      ssc install dummieslab

Philippe

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what the better way is to deal with categorical information.

1. What is the best way to implement income groups in a regression? E.g., as income (usually) has no upper limit, I tend to generate an interaction term (dummy==1) if the individual is in the highest income category (0 otherwise). Further, am I right in assuming that building categories is usually not sensible when the number of observations is high? One issue I face is that very young adults and very old adults are under-represented in the dataset (meaning there are not that many unique observations for these groups; the sample itself is good). Is there a rule of thumb for which would be better: building categories for all age classes (increasing the observations in the young/old groups) or not building classes at all (keeping more detailed information)? It's clearly a trade-off, but maybe there's some advice. I tend not to use categories here, also because age-squared might be important to have at hand later.

2. The "xi" command can help to make life less messy (in large data sets, I think). But it seems to kill all my value labels in these categorical groups! I could not find any option to tell "xi" to use the already defined value labels. Is there a workaround at hand so that the regression table will instead use the values defined (e.g. for sex: 0==male, 1==female) as new variable names?

Many thanks for all your inputs,
Andrea

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
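
For illustration only, here is a minimal sketch of the idea described above (one dummy per level, named after the value labels) using built-in Stata commands rather than -dummieslab- itself. The toy data, the variable sex and the label name sexlbl are hypothetical, and the sketch assumes a Stata version that provides the strtoname() function. It is not the code of -dummieslab-, whose options and naming rules may differ.

```stata
* Hypothetical toy data: a labelled categorical variable
clear
input byte sex
0
1
1
0
end
label define sexlbl 0 "male" 1 "female"
label values sex sexlbl

* Loop over the observed levels and name each dummy after its value label
levelsof sex, local(levels)
foreach v of local levels {
    * look up the value label attached to this level
    local lbl : label (sex) `v'
    * turn the label text into a legal Stata variable name
    local name = strtoname("`lbl'")
    * dummy is 1 for this level, 0 for other levels, missing if sex is missing
    generate byte `name' = (sex == `v') if !missing(sex)
    label variable `name' "sex == `lbl'"
}

list
```

With these toy data the loop should create two variables, male and female, which can then be used directly in a regression so that the output shows the label-based names.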


