
From |
"Martin Weiss" <martin.weiss@uni-tuebingen.de> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: how to deal with categories? |

Date |
Tue, 3 Jun 2008 14:56:05 +0200 |

Regarding 1., it is hard to see why you let dummies eat up your degrees of freedom when you have a continuous variable "income", which -regress- accepts. The fact that there is no upper limit for income does not render it invalid as a covariate. Just include income itself without further ado (pun intended :-) ).

Regarding 2., take a look at -h extended_fcn- to extract variable labels and the like.

Other listers may have more elaborate advice...

Martin Weiss
_________________________________________________________________
Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen
Germany
Fon: 0049-7071-2978184
Home: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1130
Publications: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1131
SSRN: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=669945

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what is the better way to deal with categorical information.

1. What is the best way to implement income groups in a regression? E.g., as income (usually) has no upper limit, I tend to generate an interaction term (dummy==1) if the individual is in the highest income category (0 otherwise). Further, am I right in assuming that building categories is usually not sensible when the number of observations is high? One issue I face is that very young adults and very old adults are under-represented in the dataset (meaning there are not that many unique observations for these groups; the sample itself is good). Is there a rule of thumb for what would be better: building categories for all age classes (increasing the number of observations in the young/old groups) or not building classes at all (keeping more detailed information)? It's clearly a trade-off, but maybe there's some advice. I tend not to use categories here, also because age-squared might be important to have at hand later.

2. The "xi" command can help to make life less messy (in large data sets, I think). But it seems to kill all my value labels in these categorical groups! I could not find any option to tell "xi" to use the already defined value labels. Is there a workaround at hand so that the regression table will instead use the values defined (e.g. for sex; 0==male, 1==female) as new variable names?

Many thanks for all your inputs,
Andrea

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
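The two suggestions in the reply above can be sketched in a few lines of Stata. This is only an illustration under stated assumptions: it uses the auto dataset shipped with Stata as a stand-in for the poster's data, so price, mpg and foreign are placeholders for her outcome, her continuous covariate (income or age) and her categorical variable (sex). The local macro names vlname and txt1 are made up for the example.

```stata
* Sketch only: the shipped auto dataset stands in for the poster's data.
sysuse auto, clear

* Point 1: enter the continuous covariate directly instead of a set of
* category dummies; a squared term can be added if curvature matters.
generate mpg2 = mpg^2
regress price mpg mpg2

* Point 2: extended macro functions (see -help extended_fcn-) copy
* label information into local macros.
local vlname : value label foreign   // name of the value label attached to foreign
local txt1   : label `vlname' 1      // text that this value label assigns to 1

* -xi- creates the dummy _Iforeign_1. Renaming it to the label text makes
* that text appear in the regression table. This works here only because
* the text happens to be a legal Stata variable name.
xi: regress price mpg i.foreign
rename _Iforeign_1 `txt1'
regress price mpg `txt1'
```

If the label text contains spaces or other characters that are not allowed in variable names, the rename step fails. In that case -label variable- can attach the text to the dummy instead, although -regress- itself prints variable names rather than labels.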

**Follow-Ups**: **Re: st: RE: how to deal with categories?** *From:* Andrea Bennett <mac.stata@gmail.com>

**References**: **st: how to deal with categories?** *From:* Andrea Bennett <mac.stata@gmail.com>

