From
Eric Booth <ebooth@ppri.tamu.edu>

To
statalist@hsphsun2.harvard.edu

Subject
Re: st: how to summarize while accounting for duplicate values

Date
Fri, 19 Feb 2010 15:21:42 -0600

A snippet of your data & some more info about what kinds of summary statistics you want to create would help. One way to get started is to use the -collapse- command, or you could -egen- the statistic you want for the non-constant variables and then just take a look at the first observation for each individual:

*-------------------------------------BEGIN EXAMPLE
clear
inp year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2003 1 2 22 55 23000
2004 1 2 22 55 19000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 1 18 63 80000
2003 2 2 18 63 60000
2004 2 2 18 63 50000
2005 2 2 18 63 40000
end

**Create a variable for change in offspring, when it does occur
sort id year
by id: g offspring_change = offspring_born[_N] - offspring_born[1]
bys id: sum offspring_born if offspring_change>0

**summary table using collapse**
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan, by(id)
**save or outsheet this table**
restore
**or
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan
**save or outsheet this table**
restore

******************
**you could also work with only the first observation for each individual:
bys id: gen i = _n==1
egen indiv_income = mean(income), by(id)
sum if i==1
tabstat offspring_born offspring_change age_first lifespan indiv_income if i==1, ///
    stat(mean sd min max n) save
*-------------------------------------END EXAMPLE

~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu

On Feb 19, 2010, at 2:15 PM, dstahler@ucla.edu wrote:

> I am trying to summarize a dataset in Stata 10.1 on individuals'
> reproductive success throughout their life. The database is set up
> with each individual having an annual entry with multiple variables.
> Some variables' values change annually (e.g. # offspring born), while
> other variables have duplicated values for each year as they do not
> change over the course of their life (e.g. age of first reproduction,
> lifespan). How do I get Stata to provide summary statistics for
> chosen variables of interest in the population that accounts for
> individuals' duplicated values for some variables that don't change
> annually?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
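
The "first observation for each individual" idea in the reply can also be sketched with -egen-'s tag() function, which flags exactly one observation per group. This is only a minimal illustration and not part of the original post: the toy data below are a hypothetical subset of the example data, and the variable names first and mean_income are assumptions introduced here.

```stata
* Hypothetical toy data with the same layout as the example in the post
clear
input year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 2 18 63 80000
end

* tag() sets first = 1 for exactly one observation per id and 0 otherwise
egen byte first = tag(id)

* Time-invariant variables: summarize over one row per individual,
* so duplicated yearly values are not counted more than once
summarize age_first lifespan if first

* Time-varying variable: build a per-individual mean first,
* then summarize it once per individual
egen mean_income = mean(income), by(id)
summarize mean_income if first
```

Restricting with "if first" keeps every individual's weight equal regardless of how many yearly rows they contribute, which may or may not be what the analysis calls for.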

