



st: RE: statalist-digest V4 #4323


From   Nicole Johnson <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: statalist-digest V4 #4323
Date   Tue, 1 Nov 2011 22:48:18 +0000

Hi Phil - 

Thanks so much for the reply - this definitely seems to have done it! I am still getting used to Stata dates, so this was very helpful.

N

Date: Tue, 1 Nov 2011 14:32:10 +1100
From: Phil Clayton <[email protected]>
Subject: Re: st: Counting Number of Program Days

Hi Nikki,

I would do this by reshaping the data to long format.

Phil


. * enter data
. clear

. input str10 First str10 Last A10_01_10 A10_05_10 A10_010_10 A10_11_10

          First        Last  A10_01_10  A10_05_10  A10_010~0  A10_11_10
  1. "Jane" "Doe" 1 1 . .
  2. "John" "Doe" . 1 0 1
  3. end

. 
. * clean up variable name/s
. * (alternatively you could clean these up after reshaping)
. rename A10_010_10 A10_10_10

. list, clean noobs

    First   Last   A10_01~0   A10_05~0   A10_10~0   A10_11~0  
     Jane    Doe          1          1          .          .  
     John    Doe          .          1          0          1  

. 
. * reshape to long format
. reshape long A, i(First Last) j(datestr) string
(note: j = 10_01_10 10_05_10 10_10_10 10_11_10)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       8
Number of variables                   6   ->       4
j variable (4 values)                     ->   datestr
xij variables:
      A10_01_10 A10_05_10 ... A10_11_10   ->   A
-----------------------------------------------------------------------------

. rename A attended

. replace attended=0 if missing(attended)
(3 real changes made)

. list, clean noobs

    First   Last    datestr   attended  
     Jane    Doe   10_01_10          1  
     Jane    Doe   10_05_10          1  
     Jane    Doe   10_10_10          0  
     Jane    Doe   10_11_10          0  
     John    Doe   10_01_10          0  
     John    Doe   10_05_10          1  
     John    Doe   10_10_10          0  
     John    Doe   10_11_10          1  

. 
. * calculate first and final attendance dates for each person
. gen date=date(datestr, "MD20Y")

. egen startdate=min(date) if attended, by(First Last)
(4 missing values generated)

. egen enddate=max(date) if attended, by(First Last)
(4 missing values generated)

. bysort First Last (startdate): replace startdate=startdate[1]
(4 real changes made)

. bysort First Last (enddate): replace enddate=enddate[1]
(4 real changes made)

. format %td date startdate enddate 

. list, clean noobs

    First   Last    datestr   attended        date   startdate     enddate  
     Jane    Doe   10_01_10          1   01oct2010   01oct2010   05oct2010  
     Jane    Doe   10_05_10          1   05oct2010   01oct2010   05oct2010  
     Jane    Doe   10_11_10          0   11oct2010   01oct2010   05oct2010  
     Jane    Doe   10_10_10          0   10oct2010   01oct2010   05oct2010  
     John    Doe   10_05_10          1   05oct2010   05oct2010   11oct2010  
     John    Doe   10_11_10          1   11oct2010   05oct2010   11oct2010  
     John    Doe   10_10_10          0   10oct2010   05oct2010   11oct2010  
     John    Doe   10_01_10          0   01oct2010   05oct2010   11oct2010  

. 
. * for each date, could that person have attended?
. gen byte couldattend=date>=startdate & date<=enddate

. 
. * sum up the possible attendances per person
. egen maxpossible=sum(couldattend), by(First Last)

. 
. list, clean noobs

    First   Last    datestr   attended        date   startdate     enddate   coulda~d   maxpos~e  
     Jane    Doe   10_01_10          1   01oct2010   01oct2010   05oct2010          1          2  
     Jane    Doe   10_05_10          1   05oct2010   01oct2010   05oct2010          1          2  
     Jane    Doe   10_11_10          0   11oct2010   01oct2010   05oct2010          0          2  
     Jane    Doe   10_10_10          0   10oct2010   01oct2010   05oct2010          0          2  
     John    Doe   10_05_10          1   05oct2010   05oct2010   11oct2010          1          3  
     John    Doe   10_11_10          1   11oct2010   05oct2010   11oct2010          1          3  
     John    Doe   10_10_10          0   10oct2010   05oct2010   11oct2010          1          3  
     John    Doe   10_01_10          0   01oct2010   05oct2010   11oct2010          0          3  

. 
. * or instead of the last egen you could just collapse the dataset
. collapse (sum) couldattend, by(First Last)

. list, clean noobs

    First   Last   coulda~d  
     Jane    Doe          2  
     John    Doe          3  

. 
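
If you want the number of days attended alongside the number of possible days (the "2 out of 3" in the original question), one option is to collapse both variables at once. This is a minimal sketch on the same toy data, not a tested solution for the full dataset: the variable name share is made up for illustration, and the egen calls use cond() inside min() and max() as an alternative to the "if attended" plus bysort steps above.

```stata
* Sketch: attended vs. possible program days per person (toy data from this thread)
clear
input str10 First str10 Last A10_01_10 A10_05_10 A10_10_10 A10_11_10
"Jane" "Doe" 1 1 . .
"John" "Doe" . 1 0 1
end

* reshape to long format and treat missing as not attended
reshape long A, i(First Last) j(datestr) string
rename A attended
replace attended = 0 if missing(attended)

* the names carry two-digit years, so the mask supplies the century
gen date = date(datestr, "MD20Y")
format %td date

* first and last attended dates, filled in on every row of the person
egen startdate = min(cond(attended == 1, date, .)), by(First Last)
egen enddate   = max(cond(attended == 1, date, .)), by(First Last)

* a program day counts as possible if it lies between those two dates
* (this is 0 on every row for someone who never attended)
gen byte couldattend = date >= startdate & date <= enddate

* one row per person: days attended, days possible, and their ratio
collapse (sum) attended couldattend, by(First Last)
gen share = attended / couldattend   // hypothetical name; missing if couldattend is 0
list, clean noobs
```

On the toy data this should reproduce the counts above (2 possible days for Jane, 3 for John) with the attended counts next to them. If the real variable names carry four-digit years (for example A10_01_2008), the mask would presumably be "MDY" rather than "MD20Y", as in the date() calls in the original post.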



On 01/11/2011, at 1:45 PM, Nicole Johnson wrote:

> Hi all,
> 
> I have a dataset that is basically set up like an attendance roll book. It has the person's name and then each variable is a date that the program was held. The person has a 1 if they attended that day. It looks like this:
> 
> First   Last   A10_01_10   A10_05_10   A10_010_10   A10_11_10
> Jane    Doe            1           1            .           .
> John    Doe            .           1            0           1
> 
> The records go from October through June, but the program did not meet every day. As noted above, the variable names indicate the date. I was able to use a loop to extract the dates of first and last attendance, but I now need to calculate the total number of days the person 'could' have attended the program between their date of first attendance and date of last attendance. So in the above example I would be able to say that John Doe attended 2 out of 3 possible program days. Of course, since my dataset has many more dates, this is much harder! Any help is appreciated.
> 
> I should mention that I used the following code to calculate some additional variables that may be of use: string values for the dates first and last attended (which match the variable names), the corresponding date values, and the total number of program days.
> 
> Any help is much appreciated - thank you!
> Nikki
> 
> ***Macro to find first date of attendance and create string variable 'firstfound'
> local first 1
> gen firstfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
>                replace firstfound = "`v'" if `v' == `first' & missing(firstfound)
> }
> 
> ***Macro to find last date of attendance and create string variable 'lastfound'
> local last 1
> gen lastfound = ""
> foreach v of varlist A10_01_2008-A06_20_2009 {
>                replace lastfound = "`v'" if `v' == `last'
> }
> 
> ***Transforming string 'firstfound' into date value first_attend_0809
> . gen firstfound1=substr(firstfound, 2, 10)
> . generate first_attend_0809=date(firstfound1,"MDY")
> . format first_attend_0809 %td
> 
> ***Transforming string 'lastfound' into date value last_attend_0809
> . gen lastfound1=substr(lastfound, 2, 10)
> . generate last_attend_0809=date(lastfound1,"MDY")
> . format last_attend_0809 %td
> 
> local start firstfound
> gen days_possible = 0
> foreach v of varlist A10_01_2008-A06_20_2009 {
>                replace days_possible = days_possible+1
>



