
From: [email protected]
To: [email protected]
Subject: Re: st: Tabling: an agenda
Date: Wed, 08 Oct 2003 09:50:50 -0500

The venerable PROC TABULATE in SAS is a good model (at least in terms of functionality). It allows complete specification of the table layout including lines and text, independent format specifications for each stat in the table, and the ability to specify the exact denominator for each percent, including various row and column subtotals. But it's a bear to learn.

--On Wednesday, October 08, 2003 12:33 PM +0100 Nick Cox <[email protected]> wrote:

Phil Ryan mused generally in the light of a question from Daniel Sabath:

As I think Nick Cox has pointed out recently, Stata's tabulation facilities are somewhat scattered and it can be difficult to find exactly what you want among the myriad of official and unofficial commands. My own opinion is that, usually, user-written add-ons are a *very* good thing and add immeasurably to Stata's functionality. But tabulation is such a basic and important tool that a more unified system is needed.

Many of us have written front-ends to _tabdisp for particular functions that -table- does not support, but (i) _tabdisp itself is limited and (ii) there is no unified <command/subcommand/option> construct to allow a reasonable choice of presentations of tabular material. (One has in mind the v8 graphics subsystem - complex, admittedly, but now allows a deal of control over the end-product).

In Dan's example below, what we have is essentially a collection of Rx2 subtables appended, that is, we have a sex X smoker table then an age group X smoker table and then perhaps other subtables. This is often the format given as "Table 1" of a published paper wherein the baseline characteristics of two or more groups are displayed. Stata can produce the subtables, but (I think) not the end-product, because Stata's tables are all about complete cross-classifications, whereas the display we want here has cross-classifications within a subtable but not between subtables.

In summary I can imagine a tabulation subsystem in Stata that supports a user-defined output - contents and layout - for presentation. Imagination is, of course, cheap.

Imagination is where ideas come from! I agree, as would be expected, with the general diagnosis here. I also agree that at least for certain tabulation tasks the needs go beyond what amateurs can do with Stata's own language, so that we need a major input from Stata Corp. However, in the spirit of Phil's later comments, let's talk specifics.
Here is a first PARtial list of a miserable seven Problems, what can be done with Available material and what seems Required. Join in with your own additions (or subtractions).

Problem 1: awareness
====================

I think one of the major problems users face is just to be aware of what is possible, given the multiplicity of commands.

Available solutions: At some point, there is no substitute for reading the manual and playing with the existing commands, e.g. so that you know the strengths and weaknesses of -tabulate-, -table-, -tabstat-, -tabdisp- etc. (and -list- etc.). Some articles in the Stata Journal aim to provide comparative material.

Required solutions: More documentation of various kinds! More FAQs please. Anyone who was willing to write a book on Stata tabulation tasks and tricks would not make the conceptual breakthrough which Deans and Chairs expect, but they would be able to start financing their retirement home.

Problem 2: combining tables
===========================

As Phil has clearly highlighted, one common need is to put together what are in effect sub-tables into combined tables. It could be argued that Stata should not interfere between you and your word or text processor; anyway, at first sight it offers next to no tools for doing this.

Available solutions: ... except that, in a sense, there is a bunch of commands for joining tables so long as they are (expressible as) Stata matrices. This line of attack is probably under-appreciated; at the same time, it falls short of what I guess people often need here.

Required solutions: a whole mini-language for combining tables. In effect tables could be seen as objects and there would be a set of operations for combining them, with tunable control of output form: e.g. join along rows; join along columns; layer. Each combining operation would align its results, and so do more than anybody could manage as a cut/copy/paste exercise. I guess that this would be a substantial project for Stata Corp.
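A minimal sketch of the matrix route mentioned under "Available solutions" for Problem 2, assuming the auto dataset shipped with Stata. The variable heavy, the matrix names A, B and T, and the row names are all made up for this illustration. It stacks two R x 2 frequency tables along rows, roughly the "Table 1" layout Phil describes, though with frequencies only.

```stata
* Sketch only: stack two R x 2 frequency tables as Stata matrices.
sysuse auto, clear

* heavy is a hypothetical grouping variable invented for this example
generate byte heavy = weight > 3000

* matcell() saves the cell frequencies of each two-way table.
* Note: rep78 has missing values, so the two subtables have different totals.
quietly tabulate rep78 foreign, matcell(A)
quietly tabulate heavy foreign, matcell(B)

* The \ operator joins along rows; a comma would join along columns
matrix T = A \ B
matrix rownames T = rep1 rep2 rep3 rep4 rep5 light heavy
matrix colnames T = Domestic Foreign
matrix list T
```

Labels, percents and formats still have to be attached by hand, which is probably where this approach falls short of what people need.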
-graph combine- is a partial analogue. (But there's more, such as elementwise addition, subtraction, multiplication, division of tables...)

Problem 3: multiple variables
=============================

Stata does not offer much support for tabulating frequency / proportion / percent results from several variables simultaneously. Suppose (e.g.) I have variables on trips to theatre, cinema, opera house, funfair, etc. and I want a single table for all variables so I can compare frequency distributions.

Available solutions: Some user efforts. Much can be done once you see that a different data structure is often the key (-stack-, -reshape- etc.), but most users understandably prefer getting results on the fly to mapping to a different data structure. (Even seeing that you need a different structure can depend on a lot of experience. Doing the restructuring can be tricky too.)

Required solutions: Stata Corp to take this seriously!

Problem 4: sorting
==================

Sorting on the margins is often of limited analytical use. To see patterns, rather than to provide easy look-up (what is the population of Texas? Look under "Texas"...), you often need to sort tables on their contents (i.e. cell entries).

Available solutions: -tabulate, sort-. Some user efforts. In general, this is not provided very widely.

Required solutions: Stata Corp to take this seriously!

Problem 5: cell composites
==========================

What I call cell composites are cells containing values from two or more variables, whether variables in your dataset or temporary variables constructed by the command running. In Daniel Sabath's example which started this thread, he wanted cells with concatenated strings

    <cell freq> (<row percent>)

This is quite distinct cosmetically from what might be called cell stacks

    <cell freq>
    <row percent>

In general, Stata directly supports cell stacks, but not output like the first form.
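One way to get the first form can be sketched as follows, assuming the auto dataset and treating the composite cell as a string variable handed to -tabdisp-. The variable names n, rowtotal and cell are hypothetical, and this is an illustration rather than a canned solution.

```stata
* Sketch only: cells of the form "<cell freq> (<row percent>)" shown with -tabdisp-.
sysuse auto, clear
drop if missing(rep78)

* One observation per rep78 x foreign cell; n holds the cell frequency
contract rep78 foreign, freq(n)

* Row totals, built with a running sum so that no -egen- function is needed
bysort rep78 (foreign): generate rowtotal = sum(n)
by rep78: replace rowtotal = rowtotal[_N]

* cell is a string variable holding the composite text for each cell
generate str20 cell = string(n) + " (" + string(100 * n / rowtotal, "%4.1f") + ")"

tabdisp rep78 foreign, cellvar(cell)
```

Empty cells are simply absent after -contract-, so they show as blanks; the zero option of -contract- would keep them if wanted.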
Cell stacks can be more space-consuming and difficult to read in some circumstances, although it is also easy to run out of space with the first form.

Available solutions: Much is possible once you see that setting tabulation up as a display of string variables is the key. However, this requires some prior manipulations and indeed moderate fluency with some Stata basics. Canned solutions, whether official commands or user-written programs, appear lacking.

Required solutions: Support for output specifications, i.e. if I want a table to show

    <cell freq> (<row percent>)

something like "#1 (#2)" would specify "the first number followed by a space followed by a parenthesis followed by the second number followed by a parenthesis". (Naturally there is a danger of reinventing e.g. TeX's tabulation syntax.)

Problem 6: cell text
====================

Think of the number of ways in which you might specify substantive missings as one example. Depending on the boss's whims, the house rules, the journal's prescribed style, your own tastes, you could want NA or -- or (no data) etc., etc. This is an example of how, even in a numeric table, you often want extra text. Or think of cell entries which are footnoted.

Available solutions: As with Problem 5, much is possible once you see that setting tabulation up as a display of string variables is the key. However, this requires some prior manipulations and indeed moderate fluency with some Stata basics. Canned solutions, whether official commands or user-written programs, appear lacking.

Required solutions: Stata Corp to take this seriously!

Problem 7: table design
=======================

In fact, we can easily extend this. This last problem is really a rag-bag of all sorts of small and large design issues, such as

    support for different fonts and bold, italic, etc.
    different kinds of divider and separator
    control of titles, subtitles, notes, etc.
    control of margin layout
    multiple formats

A very simple example of the last is with -tabstat-. If I go

. tabstat mpg, by(rep78) s(n mean sd)

Summary for variables: mpg
     by categories of: rep78 (Repair Record 1978)

   rep78 |         N      mean        sd
---------+------------------------------
       1 |         2        21  4.242641
       2 |         8    19.125  3.758324
       3 |        30  19.43333  4.141325
       4 |        18  21.66667   4.93487
       5 |        11  27.36364  8.732385
---------+------------------------------
   Total |        69  21.28986  5.866408
----------------------------------------

then it's clear that the number of decimal places is silly for mean and sd. Specifying one d.p. is easy

. tabstat mpg, by(rep78) s(n mean sd) format(%2.1f)

Summary for variables: mpg
     by categories of: rep78 (Repair Record 1978)

   rep78 |         N      mean        sd
---------+------------------------------
       1 |       2.0      21.0       4.2
       2 |       8.0      19.1       3.8
       3 |      30.0      19.4       4.1
       4 |      18.0      21.7       4.9
       5 |      11.0      27.4       8.7
---------+------------------------------
   Total |      69.0      21.3       5.9
----------------------------------------

but now the format of N is ill-chosen. And it is common to want yet other formats for other cells:

. tabstat mpg, by(rep78) s(n mean sd skew kurt) format(%2.1f)

Summary for variables: mpg
     by categories of: rep78 (Repair Record 1978)

   rep78 |         N      mean        sd  skewness  kurtosis
---------+--------------------------------------------------
       1 |       2.0      21.0       4.2       0.0       1.0
       2 |       8.0      19.1       3.8       0.2       1.6
       3 |      30.0      19.4       4.1       0.4       3.1
       4 |      18.0      21.7       4.9      -0.1       2.0
       5 |      11.0      27.4       8.7      -0.0       1.6
---------+--------------------------------------------------
   Total |      69.0      21.3       5.9       1.0       4.0
------------------------------------------------------------

Here one might want 2 d.p. for skew and kurt, at least cosmetically.

Available solutions: There is a territorial issue here, as with Problem 2, on how far Stata should get into terrain which normally you would negotiate with (or in some cases without) the assistance of your word or text processing software.
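Returning to the -tabstat- example above, here is a minimal sketch of one format per statistic, assuming the auto dataset. It sidesteps -tabstat- by collapsing to one observation per group and giving each variable its own display format; the names n_mpg, mean_mpg and sd_mpg are hypothetical, and no Total row is produced.

```stata
* Sketch only: one display format per statistic, using -collapse- and -list-.
sysuse auto, clear
drop if missing(rep78)

* n_mpg, mean_mpg and sd_mpg are hypothetical names chosen for this example
collapse (count) n_mpg = mpg (mean) mean_mpg = mpg (sd) sd_mpg = mpg, by(rep78)

* Integer format for the count, one decimal place for mean and sd
format n_mpg %3.0f
format mean_mpg sd_mpg %5.1f

list rep78 n_mpg mean_mpg sd_mpg, noobs
```

The price is the one noted under Problem 3: the results now live in a different data structure, and the original data are gone unless preserved first.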
A lot can be done with SMCL, but whether for one-off tasks or for repetitive tasks that often requires Stata programming or at least considerable Stata expertise. Multiple formats are fairly easy to implement; one example can be seen in -makematrix- from SSC.

Required solutions: Mostly, the finger points at Stata Corp, again. But user-programmers can do more here than is sometimes appreciated.

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

=========================================================================
Paul A. Jargowsky, Ph.D., Assoc. Prof. of Political Economy
Director, The Bruton Center, School of Social Sciences (GR 31)
University of Texas at Dallas, 2601 North Floyd Road, Richardson TX 75080
=========================================================================
email: [email protected] or [email protected]
Home page: http://www.utdallas.edu/~jargo
Voice: 972-883-2992; FAX: 972-883-2735
=========================================================================

