Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: egen cut - how to force a category even if zero observations |
Date | Thu, 27 Feb 2014 01:10:21 +0000 |
If allowed to express an opinion, -egen, cut()- would express total
willingness to assign the value 0 should any values satisfy your rule,
but in your dataset they don't.

There is no sense in which -egen, cut()- can create a category that
persists in any sense after the calculation is done. In particular,
there is no sense in which -tabulate- remembers or knows that a
particular rule was used to create the variable; it just shows the
values as they exist when invoked.

You could say much the same about many other categorisations. So, in
your dataset, the rule

10 * floor(age/10)

would create the same numeric values 40(10)90; in principle it _could_
have created ..., 10, 20, 30 or 100, 110, 120, ... but no such values
were created because no suitable data values were found.

I suspect that you mean this:

1. I am thinking of my variable as categorised into a fixed, finite
set of categories.

2. So I wish to see zero occurrences tabulated if any of those
categories do not exist in the data.

The crux here is not how variables are created; it is what tabulation
commands will or will not do. So your complaint is really about
-tabulate- used for one-way tables, which just declines to show
non-existent categories.

Stata has to be fought all the way to satisfy this desire: -tabcount-
(SSC) is one approach and there was some discussion in

SJ-3-4  pr0011  . . . . . Speaking Stata: Problems with tables, Part II
        . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/03   SJ 3(4):420--439                         (no commands)
        reviews three user-written commands (tabcount, makematrix,
        and groups) as different approaches to tabulation problems

http://www.stata-journal.com/sjpdf.html?articlenum=pr0011

-fre- (SSC) is another such approach.

Nick
njcoxstata@gmail.com

On 27 February 2014 00:38, Anthony Khawaja <anthonykhawaja@gmail.com> :
> Is it possible to force the "egen cut" command to keep a specified category
> even though there are zero observations within that category. For example,
> I want to write a script that will work on multiple different datasets, and
> I want to categorise age into <40, >=40 <50, >=50 <60, >=60 <70, >=70 <80,
> >=80 <90, >=90. The egen cut command works well unless I have zero
> observation in a category - rather than still creating that as a level of
> the new categorical variable, Stata just doesn't form the category. This
> would usually be fine, but I am using "file write" commands from which I
> want to produce identically shaped tab delimited files to easily overlay
> numbers from multiple studies.
>
> I have searched extensively but not found a simple solution. Of course I
> could manually create the new variable, level by level. But I have many
> such variables, and this would be time consuming (and lack elegance!). Does
> anyone know of an elegant solution?
>
> For example, in one dataset, there are no participants <40 years. So the
> following command yields one less level in the categorical variable produced
> than I wanted:
>
> . egen agecut = cut(age), at (0 40 50 60 70 80 90 130) label
>
>
> . tab agecut
>
>      agecut |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>         40- |         29        0.39        0.39
>         50- |        873       11.73       12.12
>         60- |      3,540       47.56       59.67
>         70- |      2,364       31.76       91.43
>         80- |        629        8.45       99.88
>         90- |          9        0.12      100.00
> ------------+-----------------------------------
>       Total |      7,444      100.00
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
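
Editorial sketch, not part of the original exchange: the point that the fixed set of categories has to be supplied at tabulation time, not at variable creation, can be illustrated in base Stata with a loop over the cutpoints. The toy ages below are invented for illustration, and the loop is only one possible approach; it is not how -tabcount- or -fre- work internally. It assumes, as documented for -egen, cut()-, that the -label- option implies integer codes 0, 1, 2, ... for the intervals.

```stata
* Toy data standing in for the poster's dataset (ages invented; none below 40)
clear
input age
45
52
67
67
73
88
91
end

* Same cutpoints as in the question; -label- implies codes 0, 1, 2, ...
egen agecut = cut(age), at(0 40 50 60 70 80 90 130) label

* Loop over the fixed list of lower bounds so every category gets a row,
* including those with no observations
local k = 0
foreach lower of numlist 0 40 50 60 70 80 90 {
    quietly count if agecut == `k'
    display "`lower'-" _col(10) r(N)
    local ++k
}
```

Because the loop runs over the cutpoint list and not over the values present in the data, every dataset yields the same seven rows; the -display- line could presumably be swapped for a -file write- call to produce identically shaped output files.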