
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: two-variable Frequency table
Date: Tue, 16 Dec 2003 10:16:53 -0000

absc11@bangor.ac.uk

> I want to generate the following frequency table:
> for example,
>
>                      VARIABLE X
>               (0.5-1.0) (1.0-1.5) (1.5-2.0) (......)
>   (0.1-0.2)
> V
> A (0.2-0.3)
> R
>   (0.3-0.4)
> Y
>   (.......)
>
> I have two continuous variables and I want to create a table that is
> essentially a scatterplot of number of observations, instead of points
> in a graph. It's also necessary that I must be able to determine the
> upper and lower limits of the intervals (as they don't have to be
> necessarily balanced).
>
> I don't know whether there is already a written command on this,
> perhaps I was not careful enough to find it.

I don't think there is a command to do this in one step, but no matter. As it happens, I'd argue that this is a problem for which there should not be a single command, as it splits quite naturally into two distinct problems. In essence, you want to set up a subdivision of variables into classes or bins and then get a cross-tabulation. Only the first requires any work.

There was some discussion of similar issues in a thread on rounding down (and up) started on 22 June. This answer draws on a write-up of that thread, in press in the Stata Journal 3(4) 2003 as a tip (see the end of http://www.stata-journal.com/sjfaq.html#types for an explanation of Stata tips).

Suppose you want to round down, in multiples of some fixed number. For concreteness, say you want to round -mpg- in the auto data in multiples of 5, so that any values 10-14 get rounded to 10, any values 15-19 to 15, etc. -mpg- is simple in that only integer values occur; in many other cases we clearly have fractional parts to think about as well, although the solutions do not differ.

Here is an easy solution:

    5 * floor(mpg/5)

-floor()-, added in Stata 8, always rounds down, to the largest integer less than or equal to its argument. The name "floor" is due to Kenneth E.
Iverson (1962), the principal architect of APL, who also suggested an expressive notation I can't emulate here as I'm font-challenged. For further discussion, see Knuth (1997, p.39) or Graham, Knuth and Patashnik (1994, Ch.3).

As it happens,

    5 * int(mpg/5)

gives exactly the same result for -mpg- in the auto data, but in general, whenever variables may be negative as well as positive,

    interval * floor(expression / interval)

gives a more consistent classification.

Let us compare this briefly with other possible solutions.

-round(mpg, 5)- is different, as this rounds to the nearest multiple of 5, which could be either rounding up or rounding down. -round(mpg - 2.5, 5)- should be fine, but is also a little too much like a dodge.

With the function -recode()- you need two dodges, say -recode(-mpg,-40,-35,-30,-25,-20,-15,-10)-. Note all the negative signs: negating and then negating again to reverse it is necessary because -recode()- uses its numeric arguments as upper limits, i.e. it rounds up. Naturally, if you want rounding up, that is fine.

-egen, cut()- offers another solution with option call -at(10(5)45)-. Being able to specify a numlist is nice, as compared with spelling out a comma-separated list, but you must also add a limit, here 45, which will not be used; otherwise with -at(10(5)40)- your highest class will be missing.

Yutaka Aoki also suggested to me -mpg - mod(mpg,5)-, which follows immediately once you see that rounding down amounts to subtracting the appropriate remainder. -mod(,)-, however, does not offer a correspondingly neat way of rounding up.

The -floor()- solution grows on one, and it has the merit that you do not need to spell out all the possible end values, with the risk of forgetting or mistyping some. Conversely, -recode()- and -egen, cut()- are not restricted to rounding in equal intervals and remain useful for more complicated problems.
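
The comparison above can be checked directly. What follows is a minimal sketch, not part of the original thread: it assumes the auto data shipped with Stata (loaded here with -sysuse-), and the names mpg_floor, mpg_int, mpg_mod, mpg_rec and mpg_cut are invented purely for illustration.

```stata
* Sketch only: five ways of rounding -mpg- down to multiples of 5.
* All new variable names below are hypothetical.
sysuse auto, clear

gen mpg_floor = 5 * floor(mpg/5)
gen mpg_int   = 5 * int(mpg/5)
gen mpg_mod   = mpg - mod(mpg, 5)
* negate, recode (which rounds up), then negate back
gen mpg_rec   = -recode(-mpg,-40,-35,-30,-25,-20,-15,-10)
* the final limit 45 is needed so that the class starting at 40 exists
egen mpg_cut  = cut(mpg), at(10(5)45)

* -mpg- is never negative here, so all five should agree
assert mpg_floor == mpg_int
assert mpg_floor == mpg_mod
assert mpg_floor == mpg_rec
assert mpg_floor == mpg_cut

* for a negative argument -floor()- and -int()- differ,
* because -int()- truncates toward zero
display 5 * floor(-7/5)    // -10
display 5 * int(-7/5)      // -5
```

If the -assert- commands run silently, the five variables agree for -mpg-; the last two lines show where -floor()- and -int()- part company once negative values are possible.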
Without recapitulating the whole argument insofar as it applies to rounding up, -floor()-'s sibling -ceil()- (short for ceiling) gives a nice way of rounding up in equal intervals, and is easier to work with than expressions based on -int()-.

So the example given looks like

    gen roundedx = 0.5 * floor(x/0.5)
    gen roundedy = 0.1 * floor(y/0.1)

if you want rounding down, or the same with -ceil()- if you want rounding up, or something with the -recode()- function or -egen, cut()- if you want unequal intervals.

    tab roundedy roundedx

then gives the tabulation.

You probably want to keep variable labels etc. One way to do that is to use -copydesc- from SSC.

Graham, R. L., D. E. Knuth and O. Patashnik. 1994. Concrete mathematics: a foundation for computer science. Reading, MA: Addison-Wesley.

Iverson, K. E. 1962. A programming language. New York: John Wiley.

Knuth, D. E. 1997. The art of computer programming: Volume 1, Fundamental algorithms. Reading, MA: Addison-Wesley.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

