Statalist



st: RE: how to deal with categories?


From   "Martin Weiss" <martin.weiss@uni-tuebingen.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: how to deal with categories?
Date   Tue, 3 Jun 2008 14:56:05 +0200

Regarding 1., it is hard to see why you let dummies eat up your degrees of
freedom when you have a continuous variable "income" which -regress-
accepts. The fact that there is no upper limit for income does not render it
invalid as a covariate. Just include income itself without further ado (pun
intended :-) ).
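
A minimal sketch of that suggestion, assuming hypothetical variable names (y for the outcome, income and age as continuous covariates; none of these names come from the original post). It also shows how the age-squared term mentioned in the question could be built by hand:

```stata
* Hypothetical variable names: y, income, age
* Continuous income enters directly and costs one degree of freedom
regress y income age

* Squared age term, in case the age profile is curved
generate age2 = age^2
regress y income age age2
```

Whether income should enter in levels or after some transformation is a separate modelling question that this sketch does not settle.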

Regarding 2., take a look at -h extended_fcn- to extract variable labels and
the like. Other listers may have more elaborate advice...
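
One possible workaround, sketched under assumptions: the variable sex is coded with non-negative integers, -xi- uses its default _I prefix (so the dummies are named like _Isex_1), and y is a hypothetical outcome. The extended macro function -: label (varname) #- returns the value label text for a given value, which can then be copied into the variable label of the matching dummy:

```stata
* Assumptions: sex holds non-negative integers, default -xi- prefix _I,
* y is a hypothetical outcome variable
xi i.sex

levelsof sex, local(levels)
foreach l of local levels {
    * -xi- omits one category, so check that the dummy exists
    capture confirm variable _Isex_`l'
    if !_rc {
        * text of the value label attached to sex for value `l'
        local txt : label (sex) `l'
        label variable _Isex_`l' "sex: `txt'"
    }
}

describe _Isex_*
regress y _Isex_*
```

Note that -regress- prints variable names, not variable labels, so this only helps where labels are displayed (e.g. -describe-, or table commands that offer a label option). To get a name like "female" into the regression table itself, the dummy would have to be renamed, e.g. -rename _Isex_1 female-.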


Martin Weiss
_________________________________________________________________

Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen
Germany

Phone: 0049-7071-2978184

Home: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1130

Publications: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1131

SSRN: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=669945


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what is the better way to deal with  
categorical information.

1.
What is the best way to implement income groups in a  
regression? E.g., as income (usually) has no upper limit, I tend to  
generate a dummy variable (==1) if the individual is in the  
highest income category (0 otherwise). Further, am I right in the  
assumption that building categories is usually not sensible when the  
number of observations is high? One issue I face is that very young  
adults and very old adults are under-represented in the dataset  
(meaning, not that many unique observations for these groups, sample  
itself is good). Is there a rule of thumb for which is better:  
building categories for all age classes (increasing observations in  
the young/old groups) or not building classes at all (keeping more  
detailed info)? It's clearly a trade-off, but maybe there's some advice. I tend  
not to use categories here, also because age-squared might be  
important to have at hand, later.

2.
The "xi" command can help to make life less messy (in large data sets,  
I think). But it seems to kill all my value labels in these  
categorical groups! I could not find any option to tell "xi" to use  
the already defined value labels. Is there a workaround at hand so  
that the regression table will instead use the values defined (e.g.  
for sex; 0==male, 1==female) as new variable names?

Many thanks for all your inputs,

Andrea 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


