
From: Andrea Bennett <mac.stata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: how to deal with categories?
Date: Tue, 3 Jun 2008 15:27:28 +0200

This sounds exactly like what I was looking for! Many thanks for this info!

Andrea

On Jun 3, 2008, at 3:22 PM, philippe van kerm wrote:

The command -dummieslab- (available on the SSC archive) may be useful to deal with your second question:

TITLE

'DUMMIESLAB': module to convert categorical variable to dummy variables using label names

DESCRIPTION/AUTHOR(S)

dummieslab generates a set of dummy variables from a categorical variable. One dummy variable is created for each level of the original variable. Names for the dummy variables are derived from the value labels of the categorical variable.

To install from within Stata:

ssc install dummieslab

Philippe
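
For readers who want to see the naming idea without installing anything, here is a minimal sketch using only built-in commands. It is not how -dummieslab- is implemented; the auto example dataset, the d_ prefix and the lower-casing of label text are assumptions made purely for illustration, and strtoname() may not be available in older Stata releases.

```stata
* Sketch only: build label-named dummies by hand with built-in commands.
sysuse auto, clear                       // example dataset shipped with Stata

levelsof foreign, local(levels)          // distinct values of the categorical variable
foreach l of local levels {
    local lab : label (foreign) `l'      // value-label text attached to this level
    local name = strtoname("d_" + lower("`lab'"))   // d_ prefix is a hypothetical choice
    generate byte `name' = (foreign == `l') if !missing(foreign)
    label variable `name' "foreign: `lab'"
}

regress price mpg d_foreign              // d_domestic is left out as the base category
```

Run this from a do-file, since the // comments are not accepted at the interactive prompt. The regression table then shows d_foreign rather than a generic name such as the ones -xi- creates.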

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Andrea Bennett
Sent: Tuesday, June 03, 2008 2:40 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: how to deal with categories?

Dear all,

Right now I am wondering what the best way is to deal with categorical information.

1.

What is the best way to implement income groups in a regression? E.g., as income (usually) has no upper limit, I tend to generate a dummy variable (dummy==1) if the individual is in the highest income category (0 otherwise). Further, am I right in assuming that building categories is usually not sensible when the number of observations is high? One issue I face is that very young adults and very old adults are under-represented in the dataset (meaning there are not many unique observations for these groups; the sample itself is good). Is there a rule of thumb for which is better: building categories for all age classes (increasing the number of observations in the young/old groups) or not building classes at all (keeping more detailed information)? It's clearly a trade-off, but maybe there's some advice. I tend not to use categories here, also because age-squared might be important to have at hand later.

2.

The "xi" command can help to make life less messy (in large data sets, I think). But it seems to kill all my value labels in these categorical groups! I could not find any option to tell "xi" to use the already defined value labels. Is there a workaround at hand so that the regression table will instead use the defined value labels (e.g. for sex: 0==male, 1==female) as new variable names?

Many thanks for all your inputs,

Andrea

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/


**References**:
- **st: how to deal with categories?** *From:* Andrea Bennett <mac.stata@gmail.com>
- **st: RE: how to deal with categories?** *From:* philippe van kerm <philippe.vankerm@ceps.lu>

