Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Eric Booth <ebooth@ppri.tamu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to summarize while accounting for duplicate values
Date: Fri, 19 Feb 2010 15:21:42 -0600

A snippet of your data and some more information about which summary statistics you want would help. One way to get started is the -collapse- command; alternatively, you could -egen- the statistic you want for the non-constant variables and then look at only the first observation for each individual:

*-------------------------------------BEGIN EXAMPLE
clear
inp year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2003 1 2 22 55 23000
2004 1 2 22 55 19000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 1 18 63 80000
2003 2 2 18 63 60000
2004 2 2 18 63 50000
2005 2 2 18 63 40000
end

**Create a variable for the change in offspring, when it does occur
sort id year
by id: g offspring_change = offspring_born[_N] - offspring_born[1]
bys id: sum offspring_born if offspring_change>0

**summary table using collapse (one row per individual)**
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan, by(id)
**save or outsheet this table**
restore
**or (one row for the whole dataset)
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan
**save or outsheet this table**
restore
******************

**you could also work with only the first observation for each individual:
bys id (year): gen i = _n==1
egen indiv_income = mean(income), by(id)
sum if i==1
tabstat offspring_born offspring_change age_first lifespan indiv_income if i==1, ///
    stat(mean sd min max n) save
*-------------------------------------END EXAMPLE

~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu

On Feb 19, 2010, at 2:15 PM, dstahler@ucla.edu wrote:

> I am trying to summarize a dataset in Stata 10.1 on individuals' reproductive success throughout their life. The database is set up with each individual having an annual entry with multiple variables. Some variables' values change annually (e.g. # offspring born), while other variables have duplicated values for each year as they do not change over the course of their life (e.g. age of first reproduction, lifespan). How do I get Stata to provide summary statistics for chosen variables of interest in the population that accounts for individuals' duplicated values for some variables that don't change annually?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
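
As a variation on the first-observation approach in the reply above, here is a minimal sketch that flags one row per individual with -egen-'s tag() function and first checks that the supposedly constant variables really are constant within id. The data are a shortened copy of the example above, and the variable name one_per_id is a hypothetical name chosen for illustration; adapt the variable list to your own dataset.

```stata
* Sketch only: shortened copy of the example data from the reply above.
clear
input year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 2 18 63 80000
end

* Check that each "constant" variable has a single value within id.
* Sorting by the variable within id makes the first and last values
* the minimum and maximum, so they are equal only if all values match.
foreach v of varlist age_first lifespan {
    bysort id (`v'): assert `v'[1] == `v'[_N]
}

* tag() marks exactly one observation per id with 1 and the rest with 0.
* one_per_id is a hypothetical variable name.
egen byte one_per_id = tag(id)

* Each individual now contributes once to these statistics.
summarize age_first lifespan if one_per_id
```

If the -assert- stops with an error, the variable is not actually constant within individuals, and summarizing a single row per id would silently discard information.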

