
From: Thomas Speidel <thomas@tmbx.com>
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: RE: Computing and allocating time intervals in a widedataset
Date: Tue, 16 Jun 2009 10:51:54 -0600

Nick,

Thanks.

Thomas Speidel

Quoting Nick Cox <n.j.cox@durham.ac.uk> Tue 16 Jun 10:32:12 2009:

I don't see why your time points will vary by id. Isn't the point to apply a categorisation consistently to all panels?

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Thomas Speidel
Sent: 16 June 2009 16:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Computing and allocating time intervals in a widedataset

Building on Nick's suggestion, I am trying to modify the code to solve a slightly different problem. Suppose again my data is as follows:

    id  activity  start  stop  event1  event2  event3  event4
     1         1     11    18      10       .      38      44
     1         2     21    25      10       .      38      44
     1         3     25    28      10       .      38      44
     1         4     28    32      10       .      38      44
     1         5     32    40      10       .      38      44
     1         6     40    44      10       .      38      44
     2         1      8    18      13      23       .      30
     2         2     23    24      13      23       .      30

Except this time, instead of having fixed timepoints (i.e. 0.5 17.5 24.5 44.5 64.5 81), I have to use the "event" variables, so that I can compute the interval between start and stop and allocate it to its corresponding event:

    id  activity  yr_1  yr_2  yr_3  yr_4
     1         1     .     .     .     .
     1         2     .     .     .     .
     1         3     .     .     .     .
     1         4     .     .     .     .
     1         5     .     .     .     2
     1         6     .     .     .     4
     2         1     5     5     .     .
     2         2     .     1     .     .

So, for example, for (id==2 & activity==1):

    yr_1 = min(stop, event1) - max(start, 0.5) = 13 - 8 = 5

and for (id==2 & activity==2):

    yr_2 = min(stop, event2) - max(start, event1) = 23 - 23 = 0
    (for consistency 0 years become 1)

Nick suggested using tokenize for the fixed timepoints in my previous problem. However, since my timepoints now vary by id, I suspect tokenize will not be of use here.

Thanks
Thomas Speidel

Quoting Nick Cox <n.j.cox@durham.ac.uk> Tue 9 Jun 10:40:40 2009:

I don't understand the reluctance to -reshape-. I am going to assume that you do that. Your example suggests as code

    tokenize 0.5 17.5 24.5 44.5 64.5 81
    qui forval i = 1/5 {
        local j = `i' + 1
        gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
            if start < . & stop < .
    }
    l

Here are the results:

    . l

         +------------------------------+
         | id   activity   start   stop |
         |------------------------------|
      1. |  1          1       6     15 |
      2. |  1          2      22     25 |
      3. |  1          3      15     16 |
      4. |  1          4      22     28 |
      5. |  1          5      30      . |
         |------------------------------|
      6. |  1          6       .      . |
      7. |  2          1      53     69 |
      8. |  2          2      69     79 |
         +------------------------------+

    . tokenize 0.5 17.5 24.5 44.5 64.5 81

    . qui forval i = 1/5 {
      2.     local j = `i' + 1
      3.     gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
                 if start < . & stop < .
      4. }

    . l

         +----------------------------------------------------------------------+
         | id   activity   start   stop   grp_1   grp_2   grp_3   grp_4   grp_5 |
         |----------------------------------------------------------------------|
      1. |  1          1       6     15       9       0       0       0       0 |
      2. |  1          2      22     25       0     2.5      .5       0       0 |
      3. |  1          3      15     16       1       0       0       0       0 |
      4. |  1          4      22     28       0     2.5     3.5       0       0 |
      5. |  1          5      30      .       .       .       .       .       . |
         |----------------------------------------------------------------------|
      6. |  1          6       .      .       .       .       .       .       . |
      7. |  2          1      53     69       0       0       0    11.5     4.5 |
      8. |  2          2      69     79       0       0       0       0      10 |
         +----------------------------------------------------------------------+

Nick
n.j.cox@durham.ac.uk

Thomas Speidel

I am attempting to compute several time points to calculate the interval (years) between the start and the end of an activity and to assign that interval to its relevant age group. For example, given the following dataset:

    id  activity  start  stop
     1         1      6    15
     1         2     22    25
     1         3     15    16
     1         4     22    28
     1         5     30     .
     1         6      .     .
     2         1     53    69
     2         2     69    79

I am trying to derive the following:

    id  activity  start  stop  grp_0_17  grp_1~24  grp_2~44  grp_4~64  grp_6~81
     1         1      6    15         9         0         0         0         0
     1         2     22    25         0       2.5        .5         0         0
     1         3     15    16         1         0         0         0         0
     1         4     22    28         0       2.5       3.5         0         0
     1         5     30     .         0         0         1         0         0
     1         6      .     .         .         .         .         .         .
     2         1     53    69         0         0         0      11.5       4.5
     2         2     69    79         0         0         0         0        10

The age groups are:

    [0.5, 17.5]  [17.6, 24.5]  [24.6, 44.5]  [44.6, 64.5]  [64.6, 81]

If the dataset were in long format as above, it would not be terribly hard. What slightly complicates things is that the interval may need to be correctly allocated when it falls between two or more age groups.
However, my data is in wide format (a single observation per id), which makes it a nightmare to even check or troubleshoot my code (I have 40 activities per id), and the data is so large that I am reluctant to reshape it. This is what the dataset above would look like:

    id  start1  stop1  start2  stop2  start3  stop3  start4  stop4  start5  stop5  start6  stop6
     1       6     15      22     25      15     16      22     28      30      .       .      .
     2      53     69      69     79       .      .       .      .       .      .       .      .

- The activities do not necessarily follow a temporal sequence (e.g. the 3rd observation above).
- While the example does not show it, every id has exactly 40 activities, even though many of them may be completely missing.
- Whenever a start is present but its corresponding stop is missing (as in the 5th obs. above), it means that at the time of the study the person was still performing that activity, hence stop would be a variable called ageref. If start==ageref, then the interval would be approximated as 1 year.

I would appreciate any feedback on how to best tackle this problem.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
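For the wide layout described in the original post, one possible route is the -reshape- that Nick assumes, followed by his tokenize loop. What follows is only a minimal sketch under stated assumptions, not a solution tested on the real data: it uses the 6-activity example rather than 40 activities, the ageref values (30 and 80) are invented purely for illustration, and setting stop = start + 1 when start == ageref is one reading of the "approximated as 1 year" rule.

```stata
* Sketch only. ageref values below are hypothetical, not from the thread.
clear
input id start1 stop1 start2 stop2 start3 stop3 start4 stop4 start5 stop5 start6 stop6 ageref
1  6 15 22 25 15 16 22 28 30  .  .  . 30
2 53 69 69 79  .  .  .  .  .  .  .  . 80
end

* wide -> long: one observation per id and activity
reshape long start stop, i(id) j(activity)

* start present, stop missing: activity still ongoing at ageref
generate byte still_active = start < . & missing(stop)
replace stop = ageref if still_active
* assumption: "approximated as 1 year" is implemented as stop = start + 1
replace stop = start + 1 if still_active & start == ageref

* Nick's loop over the fixed cutpoints, unchanged in substance
tokenize 0.5 17.5 24.5 44.5 64.5 81
forvalues i = 1/5 {
    local j = `i' + 1
    quietly generate grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
        if start < . & stop < .
}

list id activity start stop grp_*, sepby(id) noobs
```

The -if- qualifier matters because Stata's min() and max() ignore missing arguments, so without it rows with a missing start or stop would get spurious nonmissing values. With 40 activities the same -reshape long start stop- call should apply as long as the variables follow the start#/stop# naming pattern.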

-- Thomas Speidel
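On the event-based variant discussed in this message: when the cutpoints are themselves variables, the same min()/max() overlap calculation can be written with the variable names in place of the constants, so the cutpoints may differ by id. The sketch below is one hedged reading of the request, not a confirmed solution. The helper variable event0 (holding the constant 0.5) is introduced here for illustration, and three rules are assumptions inferred from the desired yr_ table: a period is left missing when either bounding event is missing, a negative overlap becomes missing, and a zero overlap becomes 1.

```stata
* Sketch only. event0 and the missing/negative/zero rules are assumptions.
clear
input id activity start stop event1 event2 event3 event4
1 1 11 18 10  . 38 44
1 2 21 25 10  . 38 44
1 3 25 28 10  . 38 44
1 4 28 32 10  . 38 44
1 5 32 40 10  . 38 44
1 6 40 44 10  . 38 44
2 1  8 18 13 23  . 30
2 2 23 24 13 23  . 30
end

* lower bound of the first period is the constant 0.5, as in the example
generate double event0 = 0.5

forvalues k = 1/4 {
    local j = `k' - 1
    * overlap of [start, stop] with [event`j', event`k']
    generate yr_`k' = min(stop, event`k') - max(start, event`j') ///
        if !missing(start, stop, event`j', event`k')
    replace yr_`k' = . if yr_`k' < 0     // no overlap with this period
    replace yr_`k' = 1 if yr_`k' == 0    // "0 years become 1"
}
drop event0

list id activity yr_*, sepby(id) noobs
```

On the example data these rules give yr_1 = 5 and yr_2 = 5 for id 2, activity 1, and yr_2 = 1 for id 2, activity 2, in line with the table in the quoted message. Whether a missing event should instead fall back to the previous nonmissing event is a substantive choice that the thread does not settle.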

