Statalist


st: Split by calendar year (stset & stsplit)


From   Hind Sbihi <sbihi@interchange.ubc.ca>
To   statalist@hsphsun2.harvard.edu
Subject   st: Split by calendar year (stset & stsplit)
Date   Tue, 18 Sep 2007 14:01:28 -0700 (PDT)

Dear Statalisters

I previously posted a thread on expanding observations when the total number of observations (n) varies.
Thanks to Joseph Coveney and Maarten Buis, I think I'm closer to the data manipulation that I need to do before further statistical analyses.
In summary: after generating a dummy variable (fail = 1), I -stset- my data (multiple records per subject, where subjects are followed throughout their work history) as follows:

stset enddate, id(studyno1) failure(fail) enter(time startdat) exit(time .) scale(365.25)


Then I -stsplit- as follows:

stsplit year, at(0(1)max)

The result is not quite what I need Stata to do. Here is what happened to the records of one of the 14,300 subjects.
Original data (declared as survival-time):

+--------------------------------------------------------------------------------+
| studyno1    startdat     enddate   jobdur~n   _st   _d          _t         _t0 |
|--------------------------------------------------------------------------------|
|   100091   29 Mar 77   20 Jul 80       1209     1    1   3.3100616           0 |
|   100091   21 Jul 80   09 Jan 81        172     1    1   3.7837098   3.3100616 |
|   100091           .           .          .     0    .           .           . |
+--------------------------------------------------------------------------------+

Transformed data (after -stsplit-):

+--------------------------------------------------------------------------------+
| studyno1    startdat     enddate   jobdur~n   _st   _d          _t         _t0 |
|--------------------------------------------------------------------------------|
|   100091   29 Mar 77   31 Dec 77       1209     1    0          18   17.240246 |
|   100091   29 Mar 77   31 Dec 78       1209     1    0          19          18 |
|   100091   29 Mar 77   01 Jan 80       1209     1    0          20          19 |
|   100091   29 Mar 77   20 Jul 80       1209     1    1   20.550308          20 |
|   100091   21 Jul 80   31 Dec 80        172     1    0          21   20.550308 |
|   100091   21 Jul 80   09 Jan 81        172     1    1   21.023956          21 |
|   100091           .           .          .     0    .           .           . |
+--------------------------------------------------------------------------------+

As one can see, only the job end dates are truncated, not the job start dates.
Also, the splitting does not always fall at the end of the calendar year, as can be seen in the 3rd record (where enddate = 01 Jan 80).
I fixed the start dates with:
by studyno1: replace startdat=enddate[_n-1] if jobduration==jobduration[_n-1]
However, I don't know how to tackle the scale issue that, I believe, causes the splitting to sometimes end on January 1st of year+1 rather than on December 31st of the year.
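
The arithmetic behind this suspicion can be checked directly. With no origin() specified, analysis time is counted from Stata's date zero (01jan1960), so scale(365.25) places each integer cutpoint k at day 365.25*k. A minimal sketch, assuming that the fractional day values produced by -stsplit- are truncated when shown in a date format:

```stata
* Each integer analysis time k maps back to day number 365.25*k since 01jan1960.
* floor() mimics the (assumed) truncation of fractional dates on display.
forvalues k = 18/21 {
    display "_t = `k'  ->  day " `k'*365.25 "  ->  " %td floor(`k'*365.25)
}
```

The day numbers are 6574.5, 6939.75, 7305 and 7670.25. Only 7305 is a whole number, and it is exactly 01jan1980, while the other three truncate to 31 December of 1977, 1978 and 1980, which matches the listing above. A possible workaround, untested on these data, is to -stset- without scale() so that analysis time stays in days, and to pass the 31 December dates to at() explicitly. The year range 1960/2010 and the variable name cutday are placeholders, not taken from the data:

```stata
* Analysis time in days since 01jan1960 (no scale)
stset enddate, id(studyno1) failure(fail) enter(time startdat) exit(time .)

* Collect the day numbers of every 31 December in the assumed range
local cuts
forvalues y = 1960/2010 {
    local cuts `cuts' `=mdy(12,31,`y')'
}

* Split episodes at those calendar dates
stsplit cutday, at(`cuts')
```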

Any help would be greatly appreciated.
Thank you for your time 

Hind Sbihi
School of Occupational and Environmental Hygiene
University of British Columbia
