
From: "D.Christodoulou" <absc11@bangor.ac.uk>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: two-variable Frequency table

Date: Tue, 16 Dec 2003 14:18:06 +0000

I appreciate your thorough answer Nick, it just couldn't get better.
I'll get right on to it, thanks again,

Dimitris

Nick Cox wrote:
>
> absc11@bangor.ac.uk
>
> > I want to generate the following frequency table:
> > for example,
> >
> >                            VARIABLE X
> >                (0.5-1.0) (1.0-1.5) (1.5-2.0) (......)
> >     (0.1-0.2)
> >  V
> >  A  (0.2-0.3)
> >  R
> >     (0.3-0.4)
> >  Y
> >     (.......)
> >
> >
> > I have two continuous variables and I want to create a
> > table that is essentially
> > a scatterplot of number of observations, instead of points
> > in a graph. It's also
> > necessary that I must be able to determine the upper and
> > lower limits of the
> > intervals (as they don't necessarily have to be balanced).
> >
> > I don't know whether there is already a written command on
> > this, perhaps I was
> > not careful enough to find it.
>
> I don't think there is a command to do this in one, but
> no matter. As it happens, I'd argue that this is a problem
> for which there should not be a single command, as it splits
> quite naturally into two distinct problems.
>
> In essence, you want to set up a subdivision of variables
> into classes or bins and then get a cross-tabulation.
> Only the first requires any work.
>
> There was some discussion of similar issues
> in a thread on rounding down (and up) started
> on 22 June. This answer draws on a write-up
> of that thread, in press in the Stata Journal 3(4) 2003
> as a tip (see the end of
> http://www.stata-journal.com/sjfaq.html#types
> for an explanation of Stata tips).
>
> Suppose you want to round down, in multiples of some fixed number.
> For concreteness, say you want to round -mpg- in the auto data
> in multiples of 5, so that any values 10-14 get rounded to 10, any
> values 15-19 to 15, etc. -mpg- is simple in that only integer
> values occur; in many other cases we clearly have fractional parts
> to think about as well, although the solutions do not differ.
>
> Here is an easy solution: 5 * floor(mpg/5). -floor()-, added in
> Stata 8, always rounds down to the largest integer less than or
> equal to its argument. The name "floor" is due to Kenneth E. Iverson
> (1962), the principal architect of APL, who also suggested an
> expressive notation I can't emulate here as I'm font-challenged.
> For further discussion, see Knuth (1997, p.39) or Graham, Knuth and
> Patashnik (1994, Ch.3).
>
> As it happens, 5 * int(mpg/5) gives exactly the same result
> for -mpg- in the auto data, but in general whenever variables
> may be negative as well as positive,
>
>     interval * floor(expression / interval)
>
> gives a more consistent classification, because -int()- truncates
> towards zero and so rounds negative values up.
>
> Let us compare this briefly with other possible solutions.
> -round(mpg, 5)- is different, as this rounds to the nearest
> multiple of 5, which could be either rounding up or rounding down.
> -round(mpg - 2.5, 5)- should be fine, but is also a little too
> much like a dodge.
>
> With the function -recode()- you need two dodges, say
> -recode(-mpg,-40,-35,-30,-25,-20,-15,-10)-. Note all the negative
> signs: negating and then negating again to reverse it are necessary
> because -recode()- uses its numeric arguments as upper limits,
> i.e. it rounds up. Naturally, if you want rounding up, that
> is fine.
>
> -egen, cut()- offers another solution with an option call such as
> -at(10(5)45)-. Being able to specify a numlist is nice, as
> compared with spelling out a comma-separated list, but you
> must also add a limit, here 45, which will not be used; otherwise
> with -at(10(5)40)- your highest class will be missing.
>
> Yutaka Aoki also suggested to me -mpg - mod(mpg,5)-
> which follows immediately once you see that rounding down
> amounts to subtracting the appropriate remainder. -mod(,)-,
> however, does not offer a correspondingly neat way of rounding up.
>
> The -floor()- solution grows on one, and it has the merit that
> you do not need to spell out all the possible end values, with the
> risk of forgetting or mistyping some. Conversely, -recode()-
> and -egen, cut()- are not restricted to rounding in equal
> intervals and remain useful for more complicated problems.
>
> Without recapitulating the whole argument insofar as it applies to
> rounding up, -floor()-'s sibling -ceil()- (short for
> ceiling) gives a nice way of rounding up in equal intervals, and
> is easier to work with than expressions based on -int()-.
>
> So the example given looks like
>
>     gen roundedx = 0.5 * floor(x/0.5)
>     gen roundedy = 0.1 * floor(y/0.1)
>
> if you want rounding down, or the same with -ceil()-
> if you want rounding up, or something with the
> -recode()- function or -egen, cut()- if you want
> unequal intervals.
>
>     tab roundedy roundedx
>
> then gives the tabulations. You probably want to
> keep variable labels etc. One way to do that
> is to use -copydesc- from SSC.
>
> Graham, R. L., D. E. Knuth and O. Patashnik. 1994.
> Concrete mathematics: a foundation for computer science.
> Reading, MA: Addison-Wesley.
>
> Iverson, K. E. 1962. A programming language.
> New York: John Wiley.
>
> Knuth, D. E. 1997. The art of computer programming: Volume
> 1, Fundamental algorithms. Reading, MA: Addison-Wesley.
>
> Nick
> n.j.cox@durham.ac.uk
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
---------------------------------------------
Dimitris Christodoulou
Associate Researcher
School for Business and Regional Development
University of Wales, Bangor
Hen Coleg
LL57 2DG Bangor
UK
e-mail: absc11@bangor.ac.uk
---------------------------------------------
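The binning routes discussed in the quoted reply can be checked against one another on the auto data. What follows is only a minimal sketch, not part of the original exchange: the new variable names (mpg_floor, mpg_mod, mpg_recode, mpg_cut, mpg_ceil, wt_bin, wt_uneq) are hypothetical, and -weight- is used as a stand-in second variable because the original x and y are not available. The unequal cut points for -weight- are likewise chosen only for illustration.

```stata
sysuse auto, clear

* -int()- truncates towards zero; -floor()- always rounds down
display 5 * int(-12/5)      // shows -10
display 5 * floor(-12/5)    // shows -15

* Four routes to rounding -mpg- down in multiples of 5
gen  mpg_floor  = 5 * floor(mpg/5)
gen  mpg_mod    = mpg - mod(mpg, 5)
gen  mpg_recode = -recode(-mpg, -40, -35, -30, -25, -20, -15, -10)
egen mpg_cut    = cut(mpg), at(10(5)45)

* In the auto data all four should agree
assert mpg_floor == mpg_mod
assert mpg_floor == mpg_recode
assert mpg_floor == mpg_cut

* Rounding up in equal intervals with -ceil()-
gen mpg_ceil = 5 * ceil(mpg/5)
assert mpg_ceil >= mpg & mpg_ceil - mpg < 5

* Two binned variables, then the cross-tabulation
gen wt_bin = 500 * floor(weight/500)
tabulate mpg_floor wt_bin

* Unequal intervals (illustrative limits only) via -egen, cut()-
egen wt_uneq = cut(weight), at(1500 2500 3000 5000)
tabulate mpg_floor wt_uneq
```

With fractional interval widths such as 0.1 or 0.5 the same pattern applies, although the binned values are then subject to the usual floating-point representation caveats, so comparing them with -==- may need care.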

**References**: **Re: st: two-variable Frequency table** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>


© Copyright 1996–2014 StataCorp LP