
# Re: st: how to summarize while accounting for duplicate values

| From | To | Subject | Date |
|---|---|---|---|
| Eric Booth | statalist@hsphsun2.harvard.edu | Re: st: how to summarize while accounting for duplicate values | Fri, 19 Feb 2010 15:21:42 -0600 |

```

One way to get started is to use -collapse-. Alternatively, use -egen- to compute the statistic you want for the time-varying variables, and then look only at the first observation for each individual:

*-------------------------------------BEGIN EXAMPLE
clear
inp year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2003 1 2 22 55 23000
2004 1 2 22 55 19000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 1 18 63 80000
2003 2 2 18 63 60000
2004 2 2 18 63 50000
2005 2 2 18 63 40000
end

**Create a variable for change in offspring, when it does occur
sort id year
by id: g offspring_change = offspring_born[_N] - offspring_born[1]
bys id: sum offspring_born if offspring_change>0

**summary table using collapse**
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan, by(id)
**save or outsheet this table**
restore

**or

preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan
**save or outsheet this table**
restore

******************

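**you could also work with only the first observation for each individual
**(for time-varying variables such as offspring_born, that row holds only the first year's value):

bys id (year): gen i = _n==1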
egen indiv_income = mean(income), by(id)

sum if i==1
tabstat offspring_born offspring_change age_first lifespan indiv_income if i==1, ///
stat(mean sd min max n) save
*-------------------------------------END EXAMPLE

~ Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu

On Feb 19, 2010, at 2:15 PM, dstahler@ucla.edu wrote:

> I am trying to summarize a dataset in Stata 10.1 on individuals' reproductive success throughout their life. The database is set up with each individual having an annual entry with multiple variables. Some variables' values change annually (e.g. # offspring born), while other variables have duplicated values for each year as they do not change over the course of their life (e.g. age of first reproduction, lifespan). How do I get Stata to provide summary statistics for chosen variables of interest in the population that accounts for individuals' duplicated values for some variables that don't change annually?

```
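A further option, not part of the reply above, is to tag one observation per individual with -egen, tag()- after checking that the supposedly constant variables really are constant within each id. The following is only a sketch under those assumptions: it reuses the example data from the post, and the names first_obs, max_offspring and mean_income are hypothetical, introduced here for illustration. It also follows the reply in treating the maximum of offspring_born as the per-individual figure, which may not suit every dataset.

```stata
* Sketch only: data copied from the example in the post above.
* first_obs, max_offspring and mean_income are hypothetical names.
clear
input year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2003 1 2 22 55 23000
2004 1 2 22 55 19000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 1 18 63 80000
2003 2 2 18 63 60000
2004 2 2 18 63 50000
2005 2 2 18 63 40000
end

* Check the assumption that these variables never change within an individual.
* -assert- stops with an error if any id has more than one distinct value.
foreach v of varlist age_first lifespan {
    bysort id (`v'): assert `v'[1] == `v'[_N]
}

* Tag exactly one observation per individual (1 on the tagged row, 0 elsewhere).
egen byte first_obs = tag(id)

* Per-individual summaries of the time-varying variables, repeated on every row.
egen max_offspring = max(offspring_born), by(id)
egen mean_income = mean(income), by(id)

* Each individual now contributes a single row to the summary statistics.
tabstat age_first lifespan max_offspring mean_income if first_obs, ///
    statistics(mean sd min max n)
```

Because the summaries are stored on every row and only the tagged row is used, the original annual data stay in memory, so there is no need for -preserve- and -restore- as with -collapse-.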