# st: tabulate question

 From jwegelin <[email protected]> To [email protected] Subject st: tabulate question Date Fri, 04 Jan 2008 13:00:15 -0500

```There is probably an obvious answer to this.

Suppose you have a dataset in long form with repeated (say, yearly)
measures on a large number of individuals. The data are *not* balanced:
Some individuals have 10 or more observations (rows); some have only one
observation. You'd like to find out how many individuals are in your
dataset.

In the current example, mytmp.dta contains NAME, the id variable
(labeled numeric), as well as Sex01 (labeled numeric), Age, and Year.

Below are two solutions, neither particularly elegant. Either way, a
couple steps are required to find that there are 3969 individuals in the
dataset. Both ways replace the long dataset with a dataset which is only
useful for counting the number of individuals. Is there a more elegant
solution?

One way *not* to do it is

tabulate NAME

since this causes an inconveniently long table to scroll past which has
a row for each NAME.

One solution is to pick any variable other than NAME (provided the
variable has no missing values) and collapse as follows:

. use mytmp, clear

. summarize NAME

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
NAME |      8282    1995.673    1154.142          1       3969

. collapse (count) nYears=Sex01, by(NAME)

. summarize NAME

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
NAME |      3969        1985    1145.896          1       3969

Another way is to use reshape:

. use mytmp, clear

. keep NAME Age Year Sex01

. reshape wide Age, i(NAME) j(Year)
(note: j = 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997
1998 1999 2000 2001 2002 2003 2004 2005 2006 2007)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                     8282   ->    3969
Number of variables                   4   ->      24
j variable (22 values)             Year   ->   (dropped)
xij variables:
Age   ->   Age1986 Age1987 ... Age2007
-----------------------------------------------------------------------------

. summarize NAME

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
NAME |      3969        1985    1145.896          1       3969

Thanks for any suggestions.

Jacob A. Wegelin
Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A.
http://www.people.vcu.edu/~jwegelin

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```