
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



st: RE: Cumulative Frequencies within Groups


From   "Ben Hoen" <bhoen@lbl.gov>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Cumulative Frequencies within Groups
Date   Fri, 3 Aug 2012 17:20:26 -0400

As often happens, preparing the email prompted me to think differently.

Here is the solution:

*=========== Begin ==================

sysuse auto, clear
g levels=(int(4*runiform()))+1 // to create a categorical variable representing 4 groups
bys levels rep78 : gen freq = _N
bys levels rep78 : gen cumfreq = _N if _n == 1
bys levels (rep78): replace cumfreq = sum(cumfreq) // (rep78) keeps rep78 in order within each level
bys levels: tabdisp rep78, cell(freq cumfreq)

*============ End ====================
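
A slightly more defensive version of the same recipe might look like the sketch below. It is a variant written for illustration, not Ben's posted code: `set seed 2012` is an assumption added only so the random groups are reproducible, and `by levels (rep78):` states explicitly that `rep78` must stay in order within each level while the running sum is taken.

*=========== Begin ==================

sysuse auto, clear
set seed 2012                            // assumption: any seed works; only for reproducibility
generate levels = int(4*runiform()) + 1  // random grouping variable with values 1 to 4

* number of observations in each (levels, rep78) cell
bysort levels rep78 : generate freq = _N

* count each cell once, on its first observation
by levels rep78 : generate cumfreq = _N if _n == 1

* running total within each level; (rep78) requires rep78 to be in order.
* sum() treats missing as 0, so every observation in a cell ends up
* carrying the same cumulative total.
by levels (rep78) : replace cumfreq = sum(cumfreq)

by levels : tabdisp rep78, cell(freq cumfreq)

*============ End ====================

Within each level the last cumulative frequency equals the number of observations in that level, including the cell for missing `rep78`, which sorts last.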

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
Sent: Friday, August 03, 2012 4:49 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Cumulative Frequencies within Groups

Hi all,

I have been unable to figure out how to create cumulative frequencies within
groups.

This Nick Cox entry provides a great description of how to create
cumulative frequencies for a whole dataset:
http://www.stata.com/support/faqs/data-management/tabulating-cumulative-frequencies/

But I have not been able to apply the same logic to groups within my
dataset.

For the full dataset, this code (as Nick provided it) works perfectly:

*======= Begin ===============

sysuse auto, clear
by rep78, sort: gen freq = _N
by rep78: gen cumfreq = _N if _n == 1
replace cumfreq = sum(cumfreq)
tabdisp rep78, cell(freq cumfreq)

* ========== End ==================
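
To see why the FAQ recipe works, it can help to run it on a tiny made-up dataset (the six values below are purely illustrative). The key step is that `sum()` returns a running sum and treats missing values as 0, so the total assigned at the first observation of each value is carried unchanged through the rest of that group.

*=========== Begin ==================

clear
input x
1
1
2
3
3
3
end

by x, sort: generate freq = _N          // 2 2 1 3 3 3
by x: generate cumfreq = _N if _n == 1   // 2 . 1 3 . .
replace cumfreq = sum(cumfreq)           // 2 2 3 6 6 6
tabdisp x, cell(freq cumfreq)

*============ End ====================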

I would like to apply the same logic to groups of cases.  Here is my best
attempt.

* =========== Begin ===============

g levels=(int(4*runiform()))+1 // to create a categorical variable representing 4 groups

su levels, meanonly

forvalues i = 1/`r(max)' { // max is used because often I will not know the number of groups
 bys rep78: gen freqtemp=_N if levels==`i'
 by rep78: gen cumfreqtemp=_N if _n==1 & levels==`i'
 replace cumfreqtemp=sum(cumfreqtemp) if levels==`i'
 replace freq=freqtemp if levels==`i'
 replace cumfreq=cumfreqtemp if levels==`i'
 drop freqtemp cumfreqtemp

}
*

bys levels: tabdisp rep78, cell(freq cumfreq)

* =========== End =====================

As one can see, my code has a problem that I cannot discern.

Any ideas?

Thanks, in advance,

Ben


Ben Hoen
Principal Research Associate
Lawrence Berkeley National Laboratory
Office: 845-758-1896
Cell: 718-812-7589
bhoen@lbl.gov
http://eetd.lbl.gov/ea/emp/staff/hoen.html

Visit our publications at: 
http://eetd.lbl.gov/ea/ems/emp-pubs.html

Sign up for our email list to receive publication notifications:
http://eetd.lbl.gov/ea/emp/list/emp_pubs_signup.php
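
For readers following the thread, the loop in the attempt above appears to go wrong at `bys rep78: gen freqtemp=_N if levels==`i'`. Under `by`, `_N` is the number of observations in the whole `rep78` group; the `if` qualifier only decides which observations receive a value, it does not shrink the group. The same applies to `_n==1 & levels==`i'`, which is true only when the first observation of a `rep78` group happens to belong to level `i`. The hypothetical four-observation dataset below contrasts grouping by `rep78` alone with grouping by `rep78 levels`, which is essentially what the posted solution at the top of the thread does.

*=========== Begin ==================

clear
input rep78 levels
3 1
3 2
3 2
4 1
end

* by rep78 alone: _N counts every rep78 == 3 observation, whatever levels is
bysort rep78 : generate wrong = _N if levels == 1

* by rep78 levels: _N counts only the observations in that (rep78, levels) cell
bysort rep78 levels : generate right = _N if levels == 1

list rep78 levels wrong right

*============ End ====================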




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP