Statalist



st: RE: how to deal with categories?


From: philippe van kerm <philippe.vankerm@ceps.lu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: how to deal with categories?
Date: Tue, 3 Jun 2008 15:22:48 +0200

The command -dummieslab- (available from the SSC archive) may help with your second question:

TITLE
      'DUMMIESLAB': module to convert categorical variable to dummy variables using label names

DESCRIPTION/AUTHOR(S)

      dummieslab generates a set of dummy variables from a categorical
      variable. One dummy variable is created for each level of the
      original variable. Names for the dummy variables are derived from
      the value labels of the categorical variable.

To install from within Stata:

  ssc install dummieslab
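
Once installed, usage could look like the sketch below. It assumes Stata's bundled auto dataset, where the variable foreign carries the value labels "Domestic" and "Foreign"; the stub name origin_ is made up for illustration. The second half shows the built-in -tabulate, generate()- route, which numbers the dummies but stores the value label in each dummy's variable label.

```stata
* Minimal sketch on the bundled auto data (assumption: not the poster's data)
sysuse auto, clear

* Needs the SSC module: ssc install dummieslab
* Creates one dummy per level of foreign, named from its value labels
dummieslab foreign
describe, simple

* Built-in alternative: creates origin_1 and origin_2 (hypothetical stub),
* each with a variable label of the form "foreign==<value label>"
tabulate foreign, generate(origin_)
describe origin_*

* Use one of the dummies in a regression, omitting the other as the base
regress price mpg origin_2
```

With -dummieslab- the new variable names should themselves come from the value labels, so they appear directly in the regression table; with -tabulate, generate()- the names are numbered and the labels only show up in -describe-.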


Philippe

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what is the better way to deal with
categorical information.

1.
What is the best way to include income groups in a
regression? E.g. as income (usually) has no upper limit, I tend to
generate an interaction term (dummy==1) if the individual is in the
highest income category (0 otherwise). Further, am I right in
assuming that building categories is usually not sensible when the
number of observations is high? One issue I face is that very young
adults and very old adults are under-represented in the dataset
(meaning there are not that many unique observations for these groups;
the sample itself is good). Is there a rule of thumb for which is
better: building categories for all age classes (increasing observations
in the young/old groups) or not building classes at all (keeping more
detailed info)? It's clearly a trade-off but maybe there's some advice.
I tend not to use categories here, also because age-squared might be
important to have at hand later.

2.
The "xi" command can help to make life less messy (in large data sets,
I think). But it seems to kill all my value labels in these
categorical groups! I could not find any option to tell "xi" to use
the already defined value labels. Is there a workaround at hand so
that the regression table will instead use the value labels defined (e.g.
for sex; 0==male, 1==female) as new variable names?

Many thanks for all your input,

Andrea
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



