I am a beginner in Stata, and this posting about an improved table command
prompts me to raise what appear to be several shortcomings in Stata's
table-producing abilities. So far -table- comes closest to what I want,
but it does have problems.
I'd like to see at least one Stata tabulation command combine the
following abilities:
0) cell sums
1) probability weights
2) the ability to format output in "thousands" or "millions"
3) a count of the "weighted number of non-zero entries"
4) ratios of cell contents
These are characteristic of tables in, for example, the Internal Revenue
Service's annual "Statistics of Income". In that volume, the aggregate
amount for each income or deduction item is given in millions of dollars,
together with a count (in thousands) of the taxpayers with a non-zero
amount for that item. All of this is tabulated by income class.
Public use files for these data are available from the SOI division of the
IRS. Since the IRS supplies weights to two decimal places, one workaround
for (1) in procedures that do not support pweights is to multiply all the
weights by 100 and use iweights instead. This then requires dividing all
the amount variables by 100 and keeping mental tabs on the count
variables, which are also inflated by a factor of 100 (or see below).
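A minimal sketch of the weight-scaling workaround for (1), on a few made-up observations. The variable names (incclass, wages, wt) are hypothetical, not the layout of the SOI public use file, and -collapse- is used here only as a convenient way to display the weighted class totals; in practice the scaled variables would go to whichever tabulation procedure lacks pweight support.

```stata
* Made-up data with hypothetical names: income class, one amount item,
* and an IRS-style weight carried to two decimal places.
clear
input byte incclass double wages double wt
1 10000 1.25
1     0 2.50
2 50000 0.75
2 20000 1.00
end

* Multiply the weights by 100 so that they become whole numbers.
generate double wt100 = round(wt*100)

* Divide the amount by 100 to offset the inflated weights.
generate double wages_adj = wages/100

* Weighted class totals: (sum) with iweights is the sum of w*x.
collapse (sum) wages_adj [iweight=wt100], by(incclass)
list, noobs
```

Each class total equals the sum of wt*wages computed with the original weights, which is why the amounts are divided by 100.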
A workaround for (2) is to create an additional variable storing the value
divided by 1,000 or by 1,000,000 for each column in the table, but it
would be nice to be able to specify
cell(sum*millions)
or something similar to get the same effect without creating new
variables. Of course, one wouldn't usually want the same scaling factor
for all variables, which would be a problem for the syntax of many Stata
table programs (other than -table-).
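The extra-variable workaround for (2) might look like this, again with hypothetical variable names and made-up numbers. A separate variable, with its own scale factor, is needed for every column that should be reported in thousands or millions.

```stata
* Made-up data with hypothetical names; wages are in dollars.
clear
input byte incclass double wages double wt
1 2000000 1.5
1 1000000 2.0
2 4000000 0.5
end

* Extra variable holding the amount in millions of dollars.
generate double wages_m = wages/1e6

* Weighted aggregate by income class, now expressed in millions.
collapse (sum) wages_m [pweight=wt], by(incclass)
format wages_m %9.1fc
list, noobs
```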
Since zero is a perfectly valid amount for any item, counting non-missing
values means first creating another variable with the zeros recoded to
missing, and is therefore a less-than-ideal workaround for (3). Another
possibility is to create, for each income item, a dummy equal to 1 (or .01
if the weights have been multiplied by 100) for non-zero entries and 0
otherwise, and take the weighted sum of the dummies.
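A sketch of the dummy-variable approach to (3), assuming a hypothetical amount item named interest and a pweight variable wt on made-up data. The dummy is left missing when the amount itself is missing, so only genuine zeros count as "no entry".

```stata
* Made-up data with hypothetical names.
clear
input byte incclass double interest double wt
1   0 1.5
1 200 2.0
1  50 0.5
2   0 1.0
2 900 3.0
end

* 1 for a non-zero amount, 0 for zero, missing if the amount is missing.
generate byte has_interest = (interest != 0) if !missing(interest)

* Weighted sum of the dummy = weighted number of taxpayers with the item.
collapse (sum) interest has_interest [pweight=wt], by(incclass)
list, noobs
```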
I am guessing that probability weights are omitted due to the difficulty
of handling covariances within a complex sampling scheme; however,
excluding them must be a problem for many users.
An example of (4) would be to present the average tax rate by income
class, which would be the ratio of the tax aggregate to the income
aggregate. The point is that the ratio of sums (or of means) is not the
same as the mean of ratios. The ratio of sums is easily calculated with
-egen- (by income class), but at the cost of several lines of code, and
once the result is incorporated into a table, the "total" row at the
bottom of the table (calculated by the tab* command) will not be the
properly weighted mean.
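A sketch of the -egen- route to (4), with hypothetical variable names and made-up numbers. Within income class k the average rate is sum(w*tax)/sum(w*income) over that class, and the properly weighted total row is the same ratio taken over all observations; both generally differ from the weighted mean of the individual ratios tax/income. The -egen- function is written here as -total()-; older releases used the name -sum()- for it.

```stata
* Made-up data with hypothetical names: income, tax and a weight.
clear
input byte incclass double income double tax double wt
1  20000  1000 2
1  40000  4000 1
2 100000 20000 1
2 200000 50000 1
end

* Class-level average tax rate as a ratio of weighted sums.
bysort incclass: egen double taxsum = total(tax*wt)
bysort incclass: egen double incsum = total(income*wt)
generate double avgrate = taxsum/incsum

* Properly weighted "total" row: ratio of the grand totals.
egen double taxall = total(tax*wt)
egen double incall = total(income*wt)
generate double avgrate_all = taxall/incall

* For contrast: the weighted mean of individual tax rates.
generate double indrate = tax/income
summarize indrate [aweight=wt]
```

With these numbers the class rates are 0.075 and about 0.233, the correct overall rate is 0.2, and the weighted mean of individual rates is 0.13. Averaging the two class rates (about 0.154) would not give the correct total either.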
I am prepared to be told that I have missed something and that my
concerns are unwarranted; as I said, I am new to Stata. The -table-
command does have (0) and (1), so I have stuck with it.
Daniel Feenberg
On Mon, 29 Nov 2004, Ian Watson wrote:
> Dear statalist,
>
> Thanks to Kit Baum, a new version of -tabout- is now available from
> the SSC archives. From inside Stata just type: ssc install tabout
> (or: ssc install tabout, replace).
>
> -tabout- produces publication-quality tables from Stata, with the
> output exported to a text file. It can be exported as tab-delimited,
> HTML code or LaTeX/TeX code. -tabout- provides extensive user
> control over formatting of data and labels and generates table
> headers automatically.
>
> This version includes:
>
> - frequency tables
> - tables of summary statistics (similar to -table-)
> - tables of percentile ratios
> - chi2 statistics
>
> The earlier version provided the usual cross-tabulations (similar to
> -tabulate-). One feature of -tabout- is the ability to produce
> stacked tables, for example, multiple vertical variables
> cross-tabulated against a horizontal variable.
>
> For complex formatting, user "style" files can be specified which
> insert code above and below the tables for greater control over the
> appearance. Arguments can be passed to these files at run-time,
> allowing users to specify titles and so forth for their tables.
>
> This version is a complete rebuild of the earlier -tabout-, with
> considerable changes to the syntax. Users of the former version
> should consult the help file to ensure they now make use of the new
> syntax. A few nasty bugs have been fixed. Sorry to any users who
> have been bitten. Bug reports are, of course, very welcome.
>
> To make learning the syntax easy, some example files are available
> at:
>
> www.acirrt.com/watson/tabout
>
> These illustrate the appearance of the tables as well as the details
> of the syntax used to produce them.
>
> -----------------------------------------------------------------------
>
> -tabout- should meet the needs of two Statalist members who recently
> posted queries about producing tables.
>
> Friedrich Huebler asked how he might
> produce a table in tab-delimited form that looked like this:
>
> ===================
> Group Median
> income
> ===================
> Male $XXX
> Female $XXX
> -------------------
> Primary ed. $XXX
> Secondary ed. $XXX
> Higher ed. $XXX
> -------------------
> Urban $XXX
> Rural $XXX
> -------------------
> New Jersey $XXX
> New Mexico $XXX
> New York $XXX
> ===================
> Total $XXX
> ===================
>
> Using -tabout- for this task, the syntax could be as simple as:
>
> tabout sex education geog state using myfile.txt, replace ///
> cells(median income) oneway format(%9.0f)
>
>
> Michael and Rita C. Carlberg also recently asked how
> they might get a table with the following numerical formatting:
>
> Fraga16 | Freq. Percent Cum.
> ------------+-----------------------------------
> 1 | 137 20 20
> 2 | 353 51 71
> 3 | 92 13 85
> 4 | 71 10 95
> 5 | 20 3 98
> 9 | 10 1 100
> 999 | 3 0 100
> ------------+-----------------------------------
> Total | 686 100
>
>
> Using -tabout- they need only type:
>
> tabout Fraga16 using myfile.txt, replace ///
> cells(fcount fper fcum) format(%9.0f %9.0f %9.0f)
>
> And, if they wanted different formatting for each column, they could
> change the format option as required, for example:
> format(%9.0fc %9.1f %9.2f).