Re: st: two-variable Frequency table

From	"D.Christodoulou" <[email protected]>
To	[email protected]
Subject	Re: st: two-variable Frequency table
Date	Tue, 16 Dec 2003 14:18:06 +0000
I appreciate your thorough answer, Nick; it just couldn't get better.
I'll get right on to it. Thanks again,
Dimitris


Nick Cox wrote:
> 
> [email protected]
> 
> > I want to generate the following frequency table:
> > for example,
> >
> >                       VARIABLE X
> >              (0.5-1.0)  (1.0-1.5)  (1.5-2.0)  (......)
> >   (0.1-0.2)
> > V
> > A (0.2-0.3)
> > R
> >   (0.3-0.4)
> > Y
> >   (.......)
> >
> >
> > I have two continuous variables and I want to create a
> > table that is essentially a scatterplot of numbers of
> > observations, instead of points in a graph. It's also
> > necessary that I be able to determine the upper and
> > lower limits of the intervals (as they don't necessarily
> > have to be balanced).
> >
> > I don't know whether there is already a written command
> > for this; perhaps I was not careful enough to find it.
> 
> I don't think there is a command to do this in one step, but
> no matter. As it happens, I'd argue that this is a problem
> for which there should not be a single command, as it splits
> quite naturally into two distinct problems.
> 
> In essence, you want to set up a subdivision of variables
> into classes or bins and then get a cross-tabulation.
> Only the first requires any work.
> 
> There was some discussion of similar issues
> in a thread on rounding down (and up) started
> on 22 June. This answer draws on a write-up
> of that thread, in press in the Stata Journal 3(4) 2003
> as a tip (see the end of
> http://www.stata-journal.com/sjfaq.html#types
> for an explanation of Stata tips).
> 
> Suppose you want to round down, in multiples of some fixed number.
> For concreteness, say you want to round -mpg- in the auto data
> in multiples of 5, so that any values 10-14 get rounded to 10, any
> values 15-19 to 15, etc. -mpg- is simple in that only integer
> values occur; in many other cases we clearly have fractional parts
> to think about as well, although the solutions do not differ.
> 
> Here is an easy solution: 5 * floor(mpg/5). -floor()-, added in
> Stata 8, always rounds down to the greatest integer less than or
> equal to its argument. The name "floor" is due to Kenneth E. Iverson
> (1962), the principal architect of APL, who also suggested an
> expressive notation I can't emulate here as I'm font-challenged.
> For further discussion, see Knuth (1997, p.39) or Graham, Knuth and
> Patashnik (1994, Ch.3).
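The same arithmetic can be tried outside Stata. What follows is a minimal Python sketch, assuming that Python's math.floor() behaves like Stata's floor() for these values. The function name round_down and the sample values are illustrative only.

```python
import math


def round_down(x, interval):
    """Round x down to a multiple of interval: interval * floor(x / interval)."""
    return interval * math.floor(x / interval)


# Integer values of the kind -mpg- takes in the auto data
for mpg in [10, 12, 14, 15, 19, 41]:
    print(mpg, "->", round_down(mpg, 5))
```

Here 10, 12 and 14 all map to 10, 15 and 19 map to 15, and 41 maps to 40. The same function works unchanged for fractional values and widths, e.g. round_down(1.7, 0.5).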
> 
> As it happens, 5 * int(mpg/5) gives exactly the same result
> for -mpg- in the auto data. But -int()- truncates toward zero,
> so negative values get rounded up; in general, whenever variables
> may be negative as well as positive,
> 
> interval * floor(expression / interval)
> 
> gives a more consistent classification.
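The difference between the two rules for negative values can be seen in a small Python sketch. It uses math.trunc() as a stand-in for Stata's int(), since both truncate toward zero. The helper names are hypothetical.

```python
import math


def by_floor(x, interval):
    """Round down: interval * floor(x / interval)."""
    return interval * math.floor(x / interval)


def by_int(x, interval):
    """Truncate toward zero, like Stata's int(): interval * int(x / interval)."""
    return interval * math.trunc(x / interval)


# Values either side of zero, in classes of width 5
for x in [-7, -3, 3, 7]:
    print(x, by_floor(x, 5), by_int(x, 5))
```

With truncation, both -3 and 3 land in class 0. That class therefore covers everything strictly between -5 and 5, which is twice the width of the others. With floor(), every class has the same width.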
> 
> Let us compare this briefly with other possible solutions.
> -round(mpg, 5)- is different, as this rounds to the nearest
> multiple of 5, which could be either rounding up or rounding down.
> -round(mpg - 2.5, 5)- should be fine, but is also a little too
> much like a dodge.
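The sketch below contrasts rounding to the nearest multiple with rounding down, and shows why the -2.5 shift works. It assumes that exact halves are rounded up, which is what the shift relies on when -mpg- is a multiple of 5. That assumption is built into the hypothetical round_nearest(). Python's built-in round() is deliberately not used, because it sends halves to the even neighbour.

```python
import math


def round_nearest(x, interval):
    """Round x to the nearest multiple of interval, with exact halves going up
    (assumed here to match Stata's round(x, interval) for these values)."""
    return interval * math.floor(x / interval + 0.5)


def round_down(x, interval):
    """Round x down to a multiple of interval."""
    return interval * math.floor(x / interval)


for mpg in [12, 13, 15]:
    print(mpg, round_nearest(mpg, 5), round_nearest(mpg - 2.5, 5), round_down(mpg, 5))
```

round_nearest(13, 5) gives 15, which is rounding up. Once the argument is shifted by half the interval, the result agrees with round_down() for every integer -mpg-. This is the dodge the text refers to.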
> 
> With the function -recode()- you need two dodges, say the negation
> of -recode(-mpg,-40,-35,-30,-25,-20,-15,-10)-. Note all the negative
> signs: negating, and then negating again to reverse it, is necessary
> because -recode()- uses its numeric arguments as upper limits,
> i.e. it rounds up. Naturally, if you want rounding up, that
> is fine.
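The double negation can be checked with a hypothetical Python stand-in for -recode()-. It assumes ascending cut points and returns the first cut that is at least x, or the last cut if x exceeds them all. It does not reproduce Stata's handling of missing values.

```python
import bisect


def recode(x, *cuts):
    """Hypothetical stand-in for Stata's recode(x, x1, ..., xn) with ascending cuts:
    return the first cut >= x, or the last cut if x is larger than all of them."""
    i = bisect.bisect_left(cuts, x)
    return cuts[min(i, len(cuts) - 1)]


cuts_down = (-40, -35, -30, -25, -20, -15, -10)
for mpg in [10, 12, 14, 15, 41]:
    # Plain recode() rounds up; negating the argument and the result rounds down
    print(mpg, recode(mpg, 10, 15, 20, 25, 30, 35, 40, 45), -recode(-mpg, *cuts_down))
```

For mpg = 12, the plain call gives 15 (rounding up). The doubly negated call gives 10 (rounding down).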
> 
> -egen, cut()- offers another solution with the option -at(10(5)45)-.
> Being able to specify a numlist is nice, as
> compared with spelling out a comma-separated list, but you
> must also add a limit, here 45, which will not be used; otherwise
> with -at(10(5)40)- your highest class will be missing.
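The point about the unused upper limit follows if each class is the half-open interval from one break up to, but not including, the next. Values at or above the last break then come out missing. The Python sketch below assumes exactly that behaviour; cut_at() is a hypothetical helper, not Stata's code. It also handles unequal intervals, which the original question asked for.

```python
import bisect


def cut_at(x, breaks):
    """Return the lower break of the interval [b_i, b_{i+1}) holding x,
    or None (missing) if x is below the first break or at/above the last."""
    if x < breaks[0] or x >= breaks[-1]:
        return None
    return breaks[bisect.bisect_right(breaks, x) - 1]


with_45 = list(range(10, 46, 5))     # like at(10(5)45)
without_45 = list(range(10, 41, 5))  # like at(10(5)40)
for mpg in [12, 15, 40, 41]:
    print(mpg, cut_at(mpg, with_45), cut_at(mpg, without_45))

# Unequal intervals work the same way
print(cut_at(0.25, [0.1, 0.2, 0.3, 0.5, 1.0]))
```

With the shorter list of breaks, 40 and 41 both come back as None, i.e. the highest class is lost.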
> 
> Yutaka Aoki also suggested to me -mpg - mod(mpg,5)-
> which follows immediately once you see that rounding down
> amounts to subtracting the appropriate remainder. -mod(,)-,
> however, does not offer a correspondingly neat way of rounding up.
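Rounding down by subtracting the remainder can be sketched in Python as follows. Python's % operator takes the sign of the divisor, so x - x % 5 also rounds negative x down. That is a property of Python's operator; this sketch makes no claim about Stata's mod() for negative arguments. The helper name is hypothetical.

```python
import math


def round_down_mod(x, interval):
    """Round down by subtracting the remainder: x - (x mod interval)."""
    return x - x % interval


for mpg in [12, 15, 19, 41]:
    print(mpg, round_down_mod(mpg, 5))

# Same answers as the floor() version for integers, negative ones included
assert all(round_down_mod(x, 5) == 5 * math.floor(x / 5) for x in range(-20, 46))
```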
> 
> The -floor- solution grows on one, and it has the merit that
> you do not need to spell out all the possible end values, with the
> risk of forgetting or mistyping some. Conversely, -recode()-
> and -egen, cut()- are not restricted to rounding in equal
> intervals and remain useful for more complicated problems.
> 
> Without recapitulating the whole argument insofar as it applies to
> rounding up, -floor()-'s sibling -ceil()- (short for
> ceiling) gives a nice way of rounding up in equal intervals, and
> is easier to work with than expressions based on -int()-.
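For comparison, here is a Python sketch of rounding up with a ceiling, next to one possible truncation-based expression. math.ceil() is assumed to behave like Stata's ceil() here. round_up_int() is a hypothetical example of an -int()-style expression, not one taken from the text, and it is only correct for non-negative x.

```python
import math


def round_up(x, interval):
    """Round x up to a multiple of interval: interval * ceil(x / interval)."""
    return interval * math.ceil(x / interval)


def round_up_int(x, interval):
    """A truncation-based alternative, valid only for x >= 0: bump the
    truncated quotient by one unless x is already an exact multiple."""
    q = x / interval
    return interval * (math.trunc(q) + (q != math.trunc(q)))


for mpg in [10, 12, 15, 41]:
    print(mpg, round_up(mpg, 5), round_up_int(mpg, 5))
```

The two agree for non-negative values. For x = -3, however, round_up() gives 0 while round_up_int() gives 5, which is one reason the ceiling version is easier to trust.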
> 
> So the example given looks like
> 
> gen roundedx = 0.5 * floor(x/0.5)
> gen roundedy = 0.1 * floor(y/0.1)
> 
> if you want rounding down, or the same with -ceil()-
> if you want rounding up, or something with the
> -recode()- function or -egen, cut()- if you want
> unequal intervals.
> 
> tab roundedy roundedx
> 
> then gives the tabulations. You probably want to
> keep variable labels etc. One way to do that
> is to use -copydesc- from SSC.
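The two-step recipe (classify each variable, then cross-tabulate) can be sketched in Python. It uses made-up data, with collections.Counter standing in for -tab-. Classes are keyed by integer indices, and lower limits are computed only for display. All names and values are illustrative.

One caveat applies to any binary floating-point arithmetic: 0.3 / 0.1 evaluates to 2.9999999999999996. A value of exactly 0.3 therefore falls in the 0.2 class, so it is worth checking values that sit on a boundary.

```python
import math
from collections import Counter


def class_index(v, width):
    """Index k of the class [k*width, (k+1)*width) holding v, via floor()."""
    return math.floor(v / width)


def crosstab(xs, ys, xwidth, ywidth):
    """Count observations in each (y class, x class) cell, like -tab roundedy roundedx-."""
    return Counter((class_index(y, ywidth), class_index(x, xwidth)) for x, y in zip(xs, ys))


def show(counts, xwidth, ywidth):
    """Print the table with lower class limits as row and column labels."""
    xks = sorted({kx for _, kx in counts})
    yks = sorted({ky for ky, _ in counts})
    print("y \\ x".ljust(8) + "".join(f"{kx * xwidth:>6g}" for kx in xks))
    for ky in yks:
        cells = "".join(f"{counts[(ky, kx)]:>6d}" for kx in xks)
        print(f"{ky * ywidth:<8g}" + cells)


# Made-up data, for illustration only
xs = [0.6, 0.7, 1.2, 1.9]
ys = [0.15, 0.15, 0.25, 0.35]
table = crosstab(xs, ys, 0.5, 0.1)
show(table, 0.5, 0.1)

# Boundary check: prints 2, i.e. 0.3 lands in the class starting at 0.2
print(class_index(0.3, 0.1))
```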
> 
> Graham, R. L., D. E. Knuth and O. Patashnik. 1994.
> Concrete mathematics: a foundation for computer science.
> Reading, MA: Addison-Wesley.
> 
> Iverson, K. E. 1962. A programming language.
> New York: John Wiley.
> 
> Knuth, D. E. 1997. The art of computer programming: Volume
> 1, Fundamental algorithms. Reading, MA: Addison-Wesley.
> 
> Nick
> [email protected]
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
---------------------------------------------
Dimitris Christodoulou
Associate Researcher
School for Business and Regional Development
University of Wales, Bangor
Hen Coleg
LL57 2DG Bangor
UK
e-mail: [email protected]
---------------------------------------------