
st: -tabcount- updated on SSC


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: -tabcount- updated on SSC
Date   Sun, 22 Jun 2003 17:32:37 +0100

Thanks to Kit Baum, the -tabcount- program
on SSC has been (substantially) updated.
-tabcount- now requires Stata 8. The previous
version, which required Stata 7, remains
in the same -tabcount- package, as files
-tabcount7.ado- and -tabcount7.hlp-.

The original rationale for -tabcount- was
tackling a problem also tackled by various other
user-written programs, each of which is somewhat
partial. -tabcount- not only supersedes its
previous version: it also supersedes a previous program
of mine, -tabzero-, posted on Statalist
(but never posted to SSC); and does
some of what is done by another program of
mine, -tabcond- (SSC), and some of what
is done by yet another program, -tabvalues-
by Shannon Driver and friends (Statalist only).
And that's unlikely to be an exhaustive list.

There is a little pattern here of people
posting queries on Statalist, and of programmers
writing something which usually fixed _their_
particular version of the problem, but often
not much more (hence some diffidence on
occasion over archiving the program).

The problem is, as someone waggishly remarked
on Statalist, that metaphysics is not
Stata's strong suit. It shows a marked
disinclination to tabulate values which
_might_ exist, but which don't happen
to exist in your data. To put it another
way, there are occasions when you want
to add extra rows, columns, etc. to
a table of frequencies even if they contain only zeros.
Showing them explicitly may be, to you,
part of showing the structure of the data. This is
what -tabcount- offers.

(If this already seems neither interesting
nor useful, know that that was the main headline,
so you can bail out now.)

To rehearse the elementaries,

. sysuse auto, clear
(1978 Automobile Data)

. tab for rep78

           |                   Repair Record 1978
  Car type |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
  Domestic |         2          8         27          9          2 |        48
   Foreign |         0          0          3          9          9 |        21
-----------+-------------------------------------------------------+----------
     Total |         2          8         30         18         11 |        69

-tabulate- (and indeed also -table-) will put zeros in holes
so long as there are non-zeros in the same row, column, etc. (There
is a difference here in that -tabulate- shows a 0, -table- shows a
blank.)

However, consider this:

. bysort for : tab rep78

_______________________________________________________________________________
-> foreign = Domestic

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        4.17        4.17
          2 |          8       16.67       20.83
          3 |         27       56.25       77.08
          4 |          9       18.75       95.83
          5 |          2        4.17      100.00
------------+-----------------------------------
      Total |         48      100.00

_______________________________________________________________________________
-> foreign = Foreign

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |          3       14.29       14.29
          4 |          9       42.86       57.14
          5 |          9       42.86      100.00
------------+-----------------------------------
      Total |         21      100.00

The second table has no rows for 1 and 2: Stata
is not reporting what doesn't exist. Some users,
at least some of the time, would prefer to see lines for
Foreign cars and Repair Record of 1 and 2 with explicit
zeros here. (It is a presentational detail whether
they are literal zeros or blanks, and, to
anticipate, -tabcount- lets you choose.)

Here is the -tabcount- way. You can spell out
the values 1/5 as what you want shown:

. bysort for : tabcount rep78, v(1/5)

_______________________________________________________________________________
-> foreign = Domestic

----------------------
Repair    |
Record    |
1978      |      Freq.
----------+-----------
        1 |          2
        2 |          8
        3 |         27
        4 |          9
        5 |          2
----------------------

_______________________________________________________________________________
-> foreign = Foreign

----------------------
Repair    |
Record    |
1978      |      Freq.
----------+-----------
        1 |
        2 |
        3 |          3
        4 |          9
        5 |          9
----------------------

and as mentioned the zeros displayed can be literal:

. bysort for : tabcount rep78, v(1/5) zero

<snip>

_______________________________________________________________________________
-> foreign = Foreign

----------------------
Repair    |
Record    |
1978      |      Freq.
----------+-----------
        1 |          0
        2 |          0
        3 |          3
        4 |          9
        5 |          9
----------------------

That's not a very exciting example. But if this
desire has bitten you, you will be able to
provide your own. Let's imagine a dataset in
which number of children per mother is a variable,
so that even in a very large dataset the tail
may be rather straggly. With -tabulate- or
-table- our output might end something like
this:

       11 |          9
       13 |          3
       15 |          1
       16 |          1
----------------------

If we want the complete set of rows,
-tabcount- with the option -v(1/16)- will suffice,
and the tail of the table then looks like this:

       11 |          9
       12 |
       13 |          3
       14 |
       15 |          1
       16 |          1
----------------------

You can also specify sets of _c_onditions: a condition is
an inequality or a value (and if a value, an
equality):

. bysort for : tabcount rep78, c(<=2 3 4 5) zero

_______________________________________________________________________________
-> foreign = Domestic

----------------------
Repair    |
Record    |
1978      |      Freq.
----------+-----------
      <=2 |         10
        3 |         27
        4 |          9
        5 |          2
----------------------

_______________________________________________________________________________
-> foreign = Foreign

----------------------
Repair    |
Record    |
1978      |      Freq.
----------+-----------
      <=2 |          0
        3 |          3
        4 |          9
        5 |          9
----------------------

That is, -c(<=2 3 4 5)- defines categories
<=2, (equal to) 3, (equal to) 4, (equal to) 5.
Incidentally, there is no rule that
conditions must be mutually exclusive.

However, there is no syntax for specifying
intervals with two limits. You must create
the coarsened variable(s) yourself beforehand.
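
A minimal sketch of that preliminary step, assuming
the auto data and three bands of rep78 (1-2, 3-4 and 5).
The variable name rep3 is made up for illustration, and
the -tabcount- call simply follows the syntax shown above:

```stata
* Sketch only: rep3 is a hypothetical variable name
sysuse auto, clear

* Coarsen rep78 into bands: 1 = (1 or 2), 2 = (3 or 4), 3 = (5)
* Values not matched by any rule, including missing, are left unchanged
recode rep78 (1/2 = 1) (3/4 = 2) (5 = 3), generate(rep3)

* Tabulate the bands for each car type, showing explicit zeros
bysort foreign : tabcount rep3, v(1/3) zero
```

The banding is done once, in the data, so the same
variable can be reused in any later table.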

With two or more variables, you must specify
either a values option or a condition option for
each variable, and they are tagged with
1, 2, etc. according to which variable
is being referred to:

. tabcount foreign rep78, v1(0 1) c2(<=2 3 4 5)

----------------------------------
          |   Repair Record 1978
 Car type |  <=2     3     4     5
----------+-----------------------
 Domestic |   10    27     9     2
  Foreign |          3     9     9
----------------------------------

Seven variables is the limit.

-tabcount- on the other hand is limited:
it won't show you percents, cumulative
percents, cumulative frequencies or indeed
anything else apart from the frequencies.
(As analytic weights are allowed, it is
a little more general than just a counting
program.) There is less of a limitation
than appears at first sight, because
you may -replace- the dataset in memory
with a reduced dataset:

. tabcount foreign rep78, v1(0 1) v2(1/5) replace

----------------------------------------
          |      Repair Record 1978
 Car type |    1     2     3     4     5
----------+-----------------------------
 Domestic |    2     8    27     9     2
  Foreign |                3     9     9
----------------------------------------

. l

     +--------------------------+
     | _freq    foreign   rep78 |
     |--------------------------|
  1. |     2   Domestic       1 |
  2. |     0    Foreign       1 |
  3. |     8   Domestic       2 |
  4. |     0    Foreign       2 |
  5. |    27   Domestic       3 |
     |--------------------------|
  6. |     3    Foreign       3 |
  7. |     9   Domestic       4 |
  8. |     9    Foreign       4 |
  9. |     2   Domestic       5 |
 10. |     9    Foreign       5 |
     +--------------------------+

Actually, the existing official command -contract-
could do that for you in this case, but -tabcount- is a bit
more general than that, in its support for
analytic weights, for values which might not
exist in the data, and for conditions (in the
sense above).
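
For comparison, a sketch of the -contract- route for
this case. It assumes that the -zero- and -nomiss- options
of -contract- are available in your version of Stata; check
-help contract- before relying on it. Note that -zero- can
only fill in combinations of values that each occur somewhere
in the data.

```stata
* Sketch only: assumes -contract- supports the zero and nomiss options
sysuse auto, clear

* nomiss drops observations with missing rep78;
* zero adds combinations of foreign and rep78 that have frequency 0
contract foreign rep78, zero nomiss

list
```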

Some might prefer to export this to some
other application, or in Stata the new reduced
dataset can then be the basis for all sorts of
customised tables. Let's suppose we want cumulative frequencies
and cumulative percents in our table:

. bysort foreign (rep78) : gen cufreq = sum(_freq)

. by foreign: gen cupc = 100 * cufreq / cufreq[_N]

. tabdisp foreign rep78, c(cufreq cupc)

------------------------------------------------------------
          |                Repair Record 1978
 Car type |        1         2         3         4         5
----------+-------------------------------------------------
 Domestic |        2        10        37        46        48
          | 4.166667  20.83333  77.08334  95.83334       100
          |
  Foreign |        0         0         3        12        21
          |        0         0  14.28571  57.14286       100
------------------------------------------------------------

and you can control several details of presentation:

. tabdisp foreign rep78, c(cufreq cupc) format(%2.1f)

---------------------------------------------
          |        Repair Record 1978
 Car type |     1      2      3      4      5
----------+----------------------------------
 Domestic |   2.0   10.0   37.0   46.0   48.0
          |   4.2   20.8   77.1   95.8  100.0
          |
  Foreign |   0.0    0.0    3.0   12.0   21.0
          |   0.0    0.0   14.3   57.1  100.0
---------------------------------------------

Again, you could do something like this
already with -table-, but -tabcount- is
here in a sense a bit more general.

There is more explanation of some other features,
including saving one- and two-way tables of frequencies
to matrices, and more examples in the help file.

All of this is based on the official command -tabdisp-
in a double sense:

* -tabcount- calculates the frequencies before handing
them to -tabdisp- for display.

* -tabcount, replace- provides a starting point
for subsequent customised tabulations, again typically
with -tabdisp-.

Nick
n.j.cox@durham.ac.uk
