
Re: st: two-variable Frequency table

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	Re: st: two-variable Frequency table
Date	Tue, 16 Dec 2003 10:16:53 -0000

[email protected]

> I want to generate the following frequency table:
> for example,
>
>                       VARIABLE X
>              (0.5-1.0)  (1.0-1.5)  (1.5-2.0)  (......)
>   (0.1-0.2)
> V
> A (0.2-0.3)
> R
>   (0.3-0.4)
> Y
>   (.......)
>
>
> I have two continuous variables and I want to create a
> table that is essentially
> a scatterplot of number of observations, instead of points
> in a graph. It's also
> necessary that I be able to determine the upper and
> lower limits of the
> intervals (as they don't have to be necessarily balanced).
>
> I don't know whether there is already a written command on
> this, perhaps I was
> not careful enough to find it.

I don't think there is a command to do this in one, but
no matter. As it happens, I'd argue that this is a problem
for which there should not be a single command, as it splits
quite naturally into two distinct problems.

In essence, you want to set up a subdivision of variables
into classes or bins and then get a cross-tabulation.
Only the first requires any work.

There was some discussion of similar issues
in a thread on rounding down (and up) started
on 22 June. This answer draws on a write-up
of that thread, in press in the Stata Journal 3(4) 2003
as a tip (see the end of
http://www.stata-journal.com/sjfaq.html#types
for an explanation of Stata tips).

Suppose you want to round down, in multiples of some fixed number.
For concreteness, say you want to round -mpg- in the auto data
in multiples of 5, so that any values 10-14 get rounded to 10, any
values 15-19 to 15, etc. -mpg- is simple in that only integer
values occur; in many other cases we clearly have fractional parts
to think about as well, although the solutions do not differ.

Here is an easy solution: 5 * floor(mpg/5).  -floor()-, added in
Stata 8, always rounds down to the largest integer less than or equal
to its argument. The name "floor" is due to Kenneth E. Iverson
(1962), the principal architect of APL, who also suggested an
expressive notation I can't emulate here as I'm font-challenged.
For further discussion, see Knuth (1997, p.39) or Graham, Knuth and
Patashnik (1994, Ch.3).
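
As a minimal sketch of the rounding-down step, assuming the auto data
shipped with Stata (the variable name mpg5 is made up for illustration):

```stata
sysuse auto, clear

* lower class limits 10, 15, 20, ...: round down in multiples of 5
gen mpg5 = 5 * floor(mpg/5)

* one row per class, with the number of cars in each
tabulate mpg5
```

Each value of mpg5 is the lower limit of the class containing mpg, so
the tabulation is already a binned frequency table.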

As it happens, 5 * int(mpg/5) gives exactly the same result
for -mpg- in the auto data, but -int()- truncates towards zero and so
rounds negative values up. In general, whenever variables
may be negative as well as positive,

interval * floor(expression / interval)

gives a more consistent classification.
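
To see the difference with negative values, here is a small sketch on
made-up data (the variable names x, by_floor and by_int are
illustrative only):

```stata
clear
input x
-12
-7
7
12
end

gen by_floor = 5 * floor(x/5)
gen by_int   = 5 * int(x/5)

* -12 goes to -15 with floor() but to -10 with int();
* -7 goes to -10 with floor() but to -5 with int();
* the positive values 7 and 12 go to 5 and 10 either way
list
```

With -int()- negative values are classed by their upper limit and
positive values by their lower limit; -floor()- uses the lower limit
throughout.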

Let us compare this briefly with other possible solutions.
-round(mpg, 5)- is different, as this rounds to the nearest
multiple of 5, which could be either rounding up or rounding down.
-round(mpg - 2.5, 5)- should be fine, but is also a little too
much like a dodge.
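
A quick way to see where rounding to the nearest multiple parts company
with rounding down, again sketched on the auto data (the names nearest
and down are made up):

```stata
sysuse auto, clear

gen nearest = round(mpg, 5)       // nearest multiple of 5
gen down    = 5 * floor(mpg/5)    // multiple of 5 at or below mpg

* show the cars for which the two classifications disagree
list make mpg nearest down if nearest != down
```

The cars listed are those whose -mpg- sits in the upper half of its
class, which -round()- pushes up to the next multiple.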

With the function -recode()- you need two dodges, say

-recode(-mpg,-40,-35,-30,-25,-20,-15,-10)

where the leading minus sign is part of the expression.  Note all the
negative signs: negating the argument and then negating the result to
reverse it are necessary
because -recode()- uses its numeric arguments as upper limits,
i.e. it rounds up.  Naturally, if you want rounding up, that
is fine.
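
Spelled out as commands, and checked against the -floor()- solution, a
sketch on the auto data might look like this (variable names are made
up; the check assumes -mpg- lies between 10 and 44, as I believe it
does in that dataset):

```stata
sysuse auto, clear

* negate the argument, recode against negated limits, negate the result
gen mpg_recode = -recode(-mpg, -40, -35, -30, -25, -20, -15, -10)

gen mpg_floor = 5 * floor(mpg/5)

* stops with an error if the two classifications ever differ
assert mpg_recode == mpg_floor
```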

-egen, cut()- offers another solution with the option call -at(10(5)45)-.
Being able to specify a numlist is nice, as
compared with spelling out a comma-separated list, but you
must also add an upper limit, here 45, which will not be used as a
class; otherwise with -at(10(5)40)- your highest class, 40 and above,
will be returned as missing.
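
As a sketch on the auto data (the new variable names are made up), the
same equal classes and a set of unequal ones:

```stata
sysuse auto, clear

* equal classes 10-, 15-, ..., 40-; the final 45 only closes the last class
egen mpg_cut = cut(mpg), at(10(5)45)
tabulate mpg_cut

* unequal classes 10-, 15-, 20-, 30-; again 45 only closes the last class
egen mpg_uneven = cut(mpg), at(10 15 20 30 45)
tabulate mpg_uneven
```

Each observation is coded with the lower limit of its class, so the
results line up with those from -floor()- where the classes are equal.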

Yutaka Aoki also suggested to me -mpg - mod(mpg,5)-
which follows immediately once you see that rounding down
amounts to subtracting the appropriate remainder. -mod(,)-,
however, does not offer a correspondingly neat way of rounding up.
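
The remainder approach can be checked against -floor()- in the same
way; a sketch on the auto data, with made-up variable names:

```stata
sysuse auto, clear

* subtract the remainder on division by 5
gen mpg_mod   = mpg - mod(mpg, 5)
gen mpg_floor = 5 * floor(mpg/5)

* stops with an error if the two ever differ
assert mpg_mod == mpg_floor
```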

The -floor- solution grows on one, and it has the merit that
you do not need to spell out all the possible end values, with the
risk of forgetting or mistyping some. Conversely, -recode()-
and -egen, cut()- are not restricted to rounding in equal
intervals and remain useful for more complicated problems.

Without recapitulating the whole argument insofar as it applies to
rounding up, -floor()-'s sibling -ceil()- (short for
ceiling) gives a nice way of rounding up in equal intervals, and
is easier to work with than expressions based on -int()-.
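
For rounding up, the corresponding sketch on the auto data is (mpg_up is
a made-up name):

```stata
sysuse auto, clear

* upper class limits: 11-15 go to 15, 16-20 go to 20, and so on
gen mpg_up = 5 * ceil(mpg/5)

list mpg mpg_up in 1/5
```

Note that a value already equal to a multiple of 5 stays where it is, so
with -ceil()- the classes include their upper limits rather than their
lower limits.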

So the example given looks like

gen roundedx = 0.5 * floor(x/0.5)
gen roundedy = 0.1 * floor(y/0.1)

if you want rounding down, or the same with -ceil()-
if you want rounding up, or something with the
-recode()- function or -egen, cut()- if you want
unequal intervals.

tab roundedy roundedx

then gives the tabulations. You probably want to
keep variable labels etc. One way to do that
is to use -copydesc- from SSC.
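
Putting the two steps together, here is a sketch with one set of equal
classes and one set of unequal classes. It assumes the auto data, with
-weight- and -mpg- standing in for the poster's two variables; the new
variable names and the labels are made up, and the labels are typed in
by hand rather than copied with -copydesc-.

```stata
sysuse auto, clear

* step 1a: equal classes of width 500 for weight, by rounding down
gen wt_bin = 500 * floor(weight/500)

* step 1b: unequal classes for mpg; 45 only closes the last class
egen mpg_bin = cut(mpg), at(10 15 20 30 45)

label variable wt_bin  "Weight (lbs.), lower class limit"
label variable mpg_bin "Mileage (mpg), lower class limit"

* step 2: the cross-tabulation of counts
tabulate mpg_bin wt_bin
```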

Graham, R. L., D. E. Knuth and O. Patashnik. 1994.
Concrete mathematics: a foundation for computer science.
Reading, MA: Addison-Wesley.

Iverson, K. E. 1962. A programming language.
New York: John Wiley.

Knuth, D. E. 1997. The art of computer programming: Volume
1, Fundamental algorithms. Reading, MA: Addison-Wesley.

Nick
[email protected]
