
st: RE: how to deal with categories?

From   "Martin Weiss" <>
To   <>
Subject   st: RE: how to deal with categories?
Date   Tue, 3 Jun 2008 14:56:05 +0200

Regarding 1., it is hard to see why you let dummies eat up your degrees of
freedom when you have a continuous variable "income" which -regress-
accepts. The fact that there is no upper limit for income does not render it
invalid as a covariate. Just include income itself without further ado (pun
intended :-) ).
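
To make the degrees-of-freedom point concrete, here is a minimal sketch rather than the poster's actual setup. It assumes Stata's shipped auto dataset as a stand-in, with -price- playing the role of "income" and -mpg- as an arbitrary outcome; the variable price_grp and the choice of four bands are purely illustrative.

```stata
* Stand-in data: price plays the role of "income" (an assumption for illustration)
sysuse auto, clear

* Continuous covariate: costs a single degree of freedom
regress mpg price

* Grouped alternative: cut the same variable into 4 bands (hypothetical choice)
xtile price_grp = price, nq(4)

* Entering the bands as dummies costs 3 degrees of freedom instead of 1
xi: regress mpg i.price_grp
```

Comparing the model degrees of freedom reported in the two output headers shows what the dummies cost.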

Regarding 2., take a look at -h extended_fcn- to extract variable labels and
the like. Other listers may have more elaborate advice...
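
As one possible reading of that hint, the sketch below uses the extended macro functions -: value label- and -: label- to copy existing value labels onto the dummies that -xi- creates. It assumes the shipped auto dataset, where -foreign- carries a value label, and it assumes -xi-'s default _I prefix for the generated variables. It only sets variable labels; the regression table still prints variable names, so a -rename- by hand would be needed to change those.

```stata
sysuse auto, clear

* xi creates dummies named _Iforeign_# (default prefix assumed), omitting one level
xi i.foreign

* Name of the value label attached to foreign
local lblname : value label foreign

* Loop over the observed levels and label each dummy that exists
levelsof foreign, local(levels)
foreach l of local levels {
    capture confirm variable _Iforeign_`l'
    if !_rc {
        * Text that the value label assigns to this level
        local txt : label `lblname' `l'
        label variable _Iforeign_`l' "`txt'"
    }
}

describe _Iforeign_*
```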

Martin Weiss

Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen

Fon: 0049-7071-2978184




-----Original Message-----
On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what is the better way to deal with  
categorical information.

What is the best way to bring income groups into a regression? E.g.,
as income (usually) has no upper limit, I tend to generate a dummy
variable (dummy==1) if the individual is in the highest income
category (0 otherwise). Further, am I right in assuming that building
categories is usually not sensible when the number of observations is
high? One issue I face is that very young and very old adults are
under-represented in the dataset (meaning there are not that many
unique observations for these groups; the sample itself is good). Is
there a rule of thumb for which is better: building categories for
all age classes (increasing the observations in the young/old groups)
or not building classes at all (keeping more detailed information)?
It's clearly a trade-off, but maybe there is some advice. I tend not
to use categories here, also because age-squared might be important
to have at hand later.

The "xi" command can help to make life less messy (in large data sets,
I think). But it seems to discard all my value labels for these
categorical variables! I could not find any option to tell "xi" to use
the already defined value labels. Is there a workaround at hand so
that the regression table will instead use the value labels defined
(e.g. for sex: 0==male, 1==female) as the new variable names?

Many thanks for all your input,
