



Re: st: how to summarize while accounting for duplicate values


From   Eric Booth <ebooth@ppri.tamu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to summarize while accounting for duplicate values
Date   Fri, 19 Feb 2010 15:21:42 -0600

A snippet of your data and some more detail about which summary statistics you want would help.

One way to get started is the -collapse- command. Alternatively, you could -egen- the statistic you want for the time-varying variables and then look only at the first observation for each individual:


*-------------------------------------BEGIN EXAMPLE
clear
inp year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2002 1 2 22 55 22000
2003 1 2 22 55 23000
2004 1 2 22 55 19000
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 1 18 63 80000
2003 2 2 18 63 60000
2004 2 2 18 63 50000
2005 2 2 18 63 40000
end

**Create a variable for the change in offspring between each individual's first and last year
sort id year
by id: gen offspring_change = offspring_born[_N] - offspring_born[1]
**summarize offspring only for individuals whose count increased
bys id: sum offspring_born if offspring_change>0

**summary table using collapse**
preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan, by(id)
**save or outsheet this table**
restore

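**or, without by(), a single row for the whole sample
**(mean is taken over all person-years, max over all observations)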

preserve
collapse (mean) income (max) offspring_born offspring_change age_first lifespan
**save or outsheet this table**
restore

******************

**you could also work with only the first observation for each individual:

**flag the first (earliest-year) observation within each id
bys id (year): gen i = _n==1
egen indiv_income = mean(income), by(id)

sum if i==1
tabstat offspring_born offspring_change age_first lifespan indiv_income if i==1, ///
 stat(mean sd min max n) save 
*-------------------------------------END EXAMPLE
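
A related sketch, in case it is useful: -egen-'s tag() function can flag one observation per individual without depending on sort order, and an -assert- can confirm that the supposedly constant variables really are constant within id before you summarize them. The variable name one_per_id and the shortened toy dataset below are assumptions for illustration only, not something from your data.

```stata
*-------------------------------------BEGIN EXAMPLE
clear
input year id offspring_born age_first lifespan income
2000 1 0 22 55 20000
2001 1 2 22 55 21005
2000 2 1 18 63 80105
2001 2 1 18 63 90000
2002 2 2 18 63 60000
end

**check that the "constant" variables do not vary within id
**(sorting on the variable within id puts its smallest value first and largest last)
foreach v of varlist age_first lifespan {
    bysort id (`v'): assert `v'[1] == `v'[_N]
}

**tag() sets exactly one observation per id to 1 and the rest to 0
egen byte one_per_id = tag(id)

**summarize the constant variables across individuals, not person-years
tabstat age_first lifespan if one_per_id, stat(mean sd min max n)
*-------------------------------------END EXAMPLE
```

If the -assert- fails for a variable, that variable is not actually constant within individuals, and you would need to decide which value to use (first, last, mean, etc.) before summarizing.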

~ Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu


On Feb 19, 2010, at 2:15 PM, dstahler@ucla.edu wrote:

> I am trying to summarize a dataset in Stata 10.1 on individuals' reproductive success throughout their life. The database is set up with each individual having an annual entry with multiple variables. Some variables' values change annually (e.g. # offspring born), while other variables have duplicated values for each year as they do not change over the course of their life (e.g. age of first reproduction, lifespan). How do I get Stata to provide summary statistics for chosen variables of interest in the population that accounts for individuals' duplicated values for some variables that don't change annually?




