# Re: st: Partitioning a Categorical Variable Based on Frequencies

From: n j cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Partitioning a Categorical Variable Based on Frequencies
Date: Mon, 22 Aug 2005 12:36:39 +0100

Angela James

> I'm trying to partition a categorical variable into classes based on
> the observed frequency for each category. That is, I have about 800
> companies that I'd like to group into 1) "large," 2) "medium", and 3)
> "small" companies based on the observed number (frequency) of
> employees for each company. Can anyone help me locate the appropriate
> command to do this?

It sounds to me as if you are trying to create a categorical variable
from a counted one.

Suppose your cut-offs are >= 1000 employees for large, >= 100 for medium.

gen cat_size = cond(size < 100, 1, cond(size < 1000, 2, cond(size < .,
3, .)))

which goes all on one line.

Or,

gen cat_size = 1 if size < 100
replace cat_size = 2 if size >= 100 & size < 1000
replace cat_size = 3 if size >= 1000 & size < .

label def cat_size 1 "small" 2 "medium" 3 "large"
label val cat_size cat_size
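
It is worth checking that the categories really do respect the cut-offs. Here is a minimal sketch on a few made-up values of size (the data are purely illustrative; only the cut-offs of 100 and 1000 come from the example above). Missing values of size should stay missing in cat_size.

```stata
* toy data: values chosen to sit on and around the cut-offs
clear
input size
5
99
100
999
1000
25000
.
end

* 1 = small (< 100), 2 = medium (100 to 999), 3 = large (>= 1000)
gen cat_size = cond(size < 100, 1, cond(size < 1000, 2, cond(size < ., 3, .)))

label def cat_size 1 "small" 2 "medium" 3 "large"
label val cat_size cat_size

* the min and max of size within each category should respect the cut-offs
tabstat size, by(cat_size) stat(n min max)
```

-tabstat- with by() leaves out observations missing on cat_size unless you ask for them, so the table shows only the three labelled categories.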

> Also, I need to rank the largest 40 or so companies by any number of
> criteria -- % female, % with employees over 40 years of age, etc.
> I've tried using the rank function with egen, but it simply ranks the
> companies according to the value for each (which is derived from their
> alphabetical, sequential ordering after I encoded the variable).
> Again, what is the easiest way to incorporate the observed frequency
> of different types of employees for each company into these analyses?

I have no real idea of what your problem is here. These criteria are
all numeric and -egen, rank()- should work fine. I don't know what
you are encoding here, but whatever you are holding in a string variable
sounds irrelevant to ranking.
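
For what it is worth, here is a minimal sketch of what I would expect to work. The variable names company, size and pct_female and the data are made up for illustration, and I use the 3 largest companies rather than 40 to keep it short. The -field- option of -egen, rank()- gives rank 1 to the highest value.

```stata
* toy data: company name (string), number of employees, % female
clear
input str10 company size pct_female
"Alpha"   5200 41.5
"Bravo"    310 62.0
"Charlie" 1800 55.2
"Delta"     45 30.0
"Echo"    9700 48.9
end

* rank companies by size; with -field- the largest company gets rank 1
egen rank_size = rank(size), field

* among the 3 largest companies only, rank on % female (highest = 1)
egen rank_female = rank(pct_female) if rank_size <= 3, field

sort rank_female
list company size pct_female rank_female if rank_size <= 3
```

Note that the string variable company plays no part in the ranking; it is only used to identify the observations in the listing.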

You'll need to say more, stating exactly what you typed
(standard advice in the Statalist FAQ).
