[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
jwegelin <[email protected]> |

To |
[email protected] |

Subject |
st: tabulate question |

Date |
Fri, 04 Jan 2008 13:00:15 -0500 |

There is probably an obvious answer to this. Suppose you have a dataset in long form with repeated (say, yearly) measures on a large number of individuals. The data are *not* balanced: Some individuals have 10 or more observations (rows); some have only one observation. You'd like to find out how many individuals are in your dataset. In the current example, mytmp.dta contains NAME, the id variable (labeled numeric), as well as Sex01 (labeled numeric), Age, and Year. Below are two solutions, neither particularly elegant. Either way, a couple steps are required to find that there are 3969 individuals in the dataset. Both ways replace the long dataset with a dataset which is only useful for counting the number of individuals. Is there a more elegant solution? One way *not* to do it is tabulate NAME since this causes an inconveniently long table to scroll past which has a row for each NAME. One solution is to pick any variable other than NAME (provided the variable has no missing values) and collapse as follows: . use mytmp, clear . summarize NAME Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- NAME | 8282 1995.673 1154.142 1 3969 . collapse (count) nYears=Sex01, by(NAME) . summarize NAME Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- NAME | 3969 1985 1145.896 1 3969 Another way is to use reshape: . use mytmp, clear . keep NAME Age Year Sex01 . reshape wide Age, i(NAME) j(Year) (note: j = 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 8282 -> 3969 Number of variables 4 -> 24 j variable (22 values) Year -> (dropped) xij variables: Age -> Age1986 Age1987 ... Age2007 ----------------------------------------------------------------------------- . summarize NAME Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- NAME | 3969 1985 1145.896 1 3969 Thanks for any suggestions. Jacob A. Wegelin Assistant Professor Department of Biostatistics Virginia Commonwealth University 730 East Broad Street Room 3006 P. O. Box 980032 Richmond VA 23298-0032 U.S.A. http://www.people.vcu.edu/~jwegelin * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**st: RE: tabulate question***From:*"Mostafa Baladi" <[email protected]>

- Prev by Date:
**Re: st: clinical trials** - Next by Date:
**st: Re:groups of pairs of variables** - Previous by thread:
**st: Dynamic Tobit with endogenous regressors?** - Next by thread:
**st: RE: tabulate question** - Index(es):

© Copyright 1996–2024 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |