# Re: st: egen cut - how to force a category even if zero observations

From: Nick Cox
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: egen cut - how to force a category even if zero observations
Date: Thu, 27 Feb 2014 01:10:21 +0000

```
If allowed to express an opinion -egen, cut()- would express total
willingness to assign the value 0 should any values satisfy your rule,
but in your dataset they don't. There is no sense in which -egen,
cut()- can create a category that persists in any sense after the
calculation is done.

In particular, there is no sense in which -tabulate- remembers or
knows that a particular rule was used to create the variable; it just
shows the values as they exist when invoked.
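
* A minimal sketch with made-up ages (an assumption, not your data).
* One age is below 40, so -egen, cut()- assigns it code 0 (label "0-").
clear
input age
35
45
52
end
egen agecut = cut(age), at(0 40 50 60 70 80 90 130) label
tabulate agecut
* Same variable, same rule, but restricted to ages of 40 and over:
* -tabulate- now shows no row for "0-".
tabulate agecut if age >= 40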

You could say much the same about many other categorisations. So, in
your dataset,

10 * floor(age/10)

would create the same numeric values 40(10)90; in principle it _could_
have created ..., 10, 20, 30 or 100, 110, 120, ... but no such values
were created because no suitable data values were found.
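
* The same point as a minimal sketch, with made-up ages (an assumption):
clear
input age
45
52
67
91
end
generate age10 = 10 * floor(age/10)
* Only 40, 50, 60 and 90 are tabulated; 70 and 80 could have been
* created, but no ages fall in those decades.
tabulate age10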

I suspect that you mean this:

1. I am thinking of my variable as categorised into a fixed, finite
set of categories.

2. So I wish to see zero occurrences tabulated if any of those
categories do not exist in the data.

The crux here is not how variables are created; it is what tabulation
commands will or will not do.

-tabulate- just declines to show non-existent categories. Stata has to be
fought all the way to satisfy this desire: -tabcount- (SSC) is one
approach and there was some discussion in

SJ-3-4  pr0011  . . . . . . . .  Speaking Stata: Problems with tables, Part II
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
Q4/03   SJ 3(4):420--439                                 (no commands)
reviews three user-written commands (tabcount, makematrix,
and groups) as different approaches to tabulation problems

http://www.stata-journal.com/sjpdf.html?articlenum=pr0011

-fre- (SSC) is another such approach.
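
* Without installing anything, one can also loop over the fixed
* cutpoints and -count-, which returns 0 for an empty category. A
* minimal sketch, assuming made-up ages and a hypothetical output
* file agecounts.txt; -display- would do in place of -file write-.
clear
input age
45
52
67
91
end
local cuts 0 40 50 60 70 80 90 130
local n : word count `cuts'
tempname fh
file open `fh' using "agecounts.txt", write text replace
forvalues i = 1/`=`n' - 1' {
    local lo : word `i' of `cuts'
    local hi : word `=`i' + 1' of `cuts'
    * missing ages fail the upper bound, so they are never counted
    quietly count if age >= `lo' & age < `hi'
    file write `fh' "`lo'-" _tab "`r(N)'" _n
}
file close `fh'
* One line per interval, whether or not any ages fall in it
type agecounts.txt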

Nick
njcoxstata@gmail.com

On 27 February 2014 00:38, Anthony Khawaja <anthonykhawaja@gmail.com> wrote:
> Is it possible to force the "egen cut" command to keep a specified category
> even though there are zero observations within that category?  For example,
> I want to write a script that will work on multiple different datasets, and
> I want to categorise age into <40, >=40 <50, >=50 <60, >=60 <70, >=70 <80,
> >=80 <90, >=90.  The egen cut command works well unless I have zero
> observations in a category - rather than still creating that as a level of
> the new categorical variable, Stata just doesn't form the category.  This
> would usually be fine, but I am using "file write" commands from which I
> want to produce identically shaped tab delimited files to easily overlay
> numbers from multiple studies.
>
> I have searched extensively but not found a simple solution.  Of course I
> could manually create the new variable, level by level.  But I have many
> such variables, and this would be time consuming (and lack elegance!).  Does
> anyone know of an elegant solution?
>
> For example, in one dataset, there are no participants <40 years.  So the
> following command yields one less level in the categorical variable produced
> than I wanted:
>
> . egen agecut = cut(age), at (0 40 50 60 70 80 90 130) label
>
>
> . tab agecut
>
>      agecut |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>         40- |         29        0.39        0.39
>         50- |        873       11.73       12.12
>         60- |      3,540       47.56       59.67
>         70- |      2,364       31.76       91.43
>         80- |        629        8.45       99.88
>         90- |          9        0.12      100.00
> ------------+-----------------------------------
>       Total |      7,444      100.00
>
