
From: Nicole Johnson <njohnson@researchforaction.org>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: statalist-digest V4 #4323
Date: Tue, 1 Nov 2011 22:48:18 +0000

Hi Phil - Thanks so much for the reply - this definitely seems to have done it! I am still getting used to STATA dates so this was very helpful.

N

Date: Tue, 1 Nov 2011 14:32:10 +1100
From: Phil Clayton <philclayton@internode.on.net>
Subject: Re: st: Counting Number of Program Days

Hi Nikki,

I would do this by reshaping the data to long format.

Phil

. * enter data
. clear
. input str10 First str10 Last A10_01_10 A10_05_10 A10_010_10 A10_11_10

          First        Last  A10_01_10  A10_05_10  A10_010~0  A10_11_10
  1. "Jane" "Doe" 1 1 . .
  2. "John" "Doe" . 1 0 1
  3. end

.
. * clean up variable name/s
. * (alternatively you could clean these up after reshaping)
. rename A10_010_10 A10_10_10
. list, clean noobs

    First   Last   A10_01~0   A10_05~0   A10_10~0   A10_11~0
     Jane    Doe          1          1          .          .
     John    Doe          .          1          0          1

.
. * reshape to long format
. reshape long A, i(First Last) j(datestr) string
(note: j = 10_01_10 10_05_10 10_10_10 10_11_10)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       8
Number of variables                   6   ->       4
j variable (4 values)                     ->   datestr
xij variables:
          A10_01_10 A10_05_10 ... A10_11_10   ->   A
-----------------------------------------------------------------------------

. rename A attended
. replace attended=0 if missing(attended)
(3 real changes made)

. list, clean noobs

    First   Last    datestr   attended
     Jane    Doe   10_01_10          1
     Jane    Doe   10_05_10          1
     Jane    Doe   10_10_10          0
     Jane    Doe   10_11_10          0
     John    Doe   10_01_10          0
     John    Doe   10_05_10          1
     John    Doe   10_10_10          0
     John    Doe   10_11_10          1

.
. * calculate first and final attendance dates for each person
. gen date=date(datestr, "MD20Y")
. egen startdate=min(date) if attended, by(First Last)
(4 missing values generated)

. egen enddate=max(date) if attended, by(First Last)
(4 missing values generated)

. bysort First Last (startdate): replace startdate=startdate[1]
(4 real changes made)

. bysort First Last (enddate): replace enddate=enddate[1]
(4 real changes made)

. format %td date startdate enddate
. list, clean noobs

    First   Last    datestr   attended        date   startdate     enddate
     Jane    Doe   10_01_10          1   01oct2010   01oct2010   05oct2010
     Jane    Doe   10_05_10          1   05oct2010   01oct2010   05oct2010
     Jane    Doe   10_11_10          0   11oct2010   01oct2010   05oct2010
     Jane    Doe   10_10_10          0   10oct2010   01oct2010   05oct2010
     John    Doe   10_05_10          1   05oct2010   05oct2010   11oct2010
     John    Doe   10_11_10          1   11oct2010   05oct2010   11oct2010
     John    Doe   10_10_10          0   10oct2010   05oct2010   11oct2010
     John    Doe   10_01_10          0   01oct2010   05oct2010   11oct2010

.
. * for each date, could that person have attended?
. gen byte couldattend=date>=startdate & date<=enddate
.
. * sum up the possible attendances per person
. egen maxpossible=sum(couldattend), by(First Last)
.
. list, clean noobs

    First   Last    datestr   attended        date   startdate     enddate   coulda~d   maxpos~e
     Jane    Doe   10_01_10          1   01oct2010   01oct2010   05oct2010          1          2
     Jane    Doe   10_05_10          1   05oct2010   01oct2010   05oct2010          1          2
     Jane    Doe   10_11_10          0   11oct2010   01oct2010   05oct2010          0          2
     Jane    Doe   10_10_10          0   10oct2010   01oct2010   05oct2010          0          2
     John    Doe   10_05_10          1   05oct2010   05oct2010   11oct2010          1          3
     John    Doe   10_11_10          1   11oct2010   05oct2010   11oct2010          1          3
     John    Doe   10_10_10          0   10oct2010   05oct2010   11oct2010          1          3
     John    Doe   10_01_10          0   01oct2010   05oct2010   11oct2010          0          3

.
. * or instead of the last egen you could just collapse the dataset
. collapse (sum) couldattend, by(First Last)
. list, clean noobs

    First   Last   coulda~d
     Jane    Doe          2
     John    Doe          3

.

On 01/11/2011, at 1:45 PM, Nicole Johnson wrote:

> Hi all,
>
> I have a dataset that is basically set up like an attendance roll book. It has the person's name and then each variable is a date that the program was held. The person has a 1 if they attended that day. It looks like this:
>
> First   Last   A10_01_10   A10_05_10   A10_010_10   A10_11_10
> Jane    Doe            1           1            .           .
> John    Doe            .           1            0           1
>
> The records go from October through June, but the program did not meet every day. As noted above, the variable names indicate the date. I was able to use a loop to extract the date of first attendance and last attendance, but I need to now calculate the total number of days the person 'could' have attended the program between their date of first attendance and date of last attendance. SO in the above example I would be able to say that John Doe attended 2 out of 3 possible program days. Of course since the data in my dataset has many more dates, this is much harder! Any help is appreciated.
>
> I guess I should mention I used the following to calculate some additional variables that may be of use which include string values for date first attended that match the variable names and date values, also the total number of program days.
>
> Any help is much appreciated - thank you!
> Nikki
>
> ***Macro to find first date of attendance and create string variable 'firstfound'
> local first 1
> gen firstfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
>     replace firstfound = "`v'" if `v' == `first' & missing(firstfound)
> }
>
> ***Macro to find last date of attendance and create string variable 'lastfound'
> local last 1
> gen lastfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
>     replace lastfound = "`v'" if `v' == `last'
> }
>
> ***Transforming string 'firstfound' into date value first_attend_0809
> . gen firstfound1=substr(firstfound, 2, 10)
> . generate first_attend_0809=date(firstfound1,"MDY")
> . format first_attend_0809 %td
>
> ***Transforming string 'lastfound' into date value last_attend_0809
> . gen lastfound1=substr(lastfound, 2, 10)
> . generate last_attend_0809=date(lastfound1,"MDY")
> . format last_attend_0809 %td
>
> local start firstfound
> gen days_possible = 0
> foreach v of varlist A10_01_2008-A06_20_2009 {
>     replace days_possible = days_possible+1
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
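For readers who would rather stay in wide format, the counting step that the quoted `days_possible` loop leaves unfinished could be sketched roughly as follows. This is only an illustration, not something posted in the thread. It assumes the toy data from Phil's reply (with `A10_010_10` already renamed to `A10_10_10`), two-digit years parsed with the same `"MD20Y"` mask, and hypothetical variable names `first_attend` and `last_attend`.

```stata
* Sketch only: toy data as in Phil's reply, variable name already cleaned up
clear
input str10 First str10 Last A10_01_10 A10_05_10 A10_10_10 A10_11_10
"Jane" "Doe" 1 1 . .
"John" "Doe" . 1 0 1
end

* first and last attendance dates, taken from the variable names
gen first_attend = .
gen last_attend  = .
foreach v of varlist A10_01_10-A10_11_10 {
    * drop the leading "A" and parse the rest as a daily date
    local d = date(substr("`v'", 2, .), "MD20Y")
    * min() and max() ignore missing arguments, so variable order does not matter
    replace first_attend = min(first_attend, `d') if `v' == 1
    replace last_attend  = max(last_attend,  `d') if `v' == 1
}

* count program days falling between first and last attendance (inclusive)
gen days_possible = 0
foreach v of varlist A10_01_10-A10_11_10 {
    local d = date(substr("`v'", 2, .), "MD20Y")
    * written as two comparisons so that someone who never attended stays at 0
    replace days_possible = days_possible + 1 if `d' >= first_attend & `d' <= last_attend
}

format %td first_attend last_attend
list, clean noobs
```

On the toy data this should give the same counts as the reshape approach (2 for Jane, 3 for John). With four-digit years in the variable names, as in the full dataset, the mask would presumably be `"MDY"` instead.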

**Follow-Ups**: **Re: st: Counting Number of Program Days [was: RE: RE: statalist-digest V4 #4323]** *From:* Nick Cox <n.j.cox@durham.ac.uk>
