# RE: st: RE: Computing and allocating time intervals in a widedataset

From: Thomas Speidel <[email protected]>
To: [email protected]
Subject: RE: st: RE: Computing and allocating time intervals in a widedataset
Date: Tue, 16 Jun 2009 10:51:54 -0600

Nick,

My time points are constant within id but change from one id to another (see event1-event4 in the example). The objective is the same: compute the number of years and allocate them to the corresponding category. In this case, however, the time points are life events that each person (id) experiences at a different time. So you are right about the categorisation, but the cut points should not be the same for all panels.
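
A minimal sketch of how per-id cut points might be handled, using the example data quoted further down. It is not a tested solution from the thread. The helper variable `event0` (holding the 0.5 lower bound) and the temporary variable `raw` are hypothetical names introduced here. The sketch assumes that an interval bounded by a missing event is skipped, that a zero-length overlap counts as 1 year, and that a negative overlap is left missing. Stata's `min()` and `max()` ignore missing arguments, so missing events are excluded explicitly.

```stata
* Sketch only: rebuild the example data, then apply the min/max rule
* with cut points taken from each row's own event variables.
clear
input id activity start stop event1 event2 event3 event4
1 1 11 18 10  . 38 44
1 2 21 25 10  . 38 44
1 3 25 28 10  . 38 44
1 4 28 32 10  . 38 44
1 5 32 40 10  . 38 44
1 6 40 44 10  . 38 44
2 1  8 18 13 23  . 30
2 2 23 24 13 23  . 30
end

* Hypothetical helper: lower bound of the first interval (0.5)
gen double event0 = 0.5

forvalues i = 1/4 {
    local lo = `i' - 1
    * Raw overlap of [start, stop] with the interval between the
    * previous and the current event. min() and max() ignore missing
    * arguments, so rows with a missing bound are excluded here.
    gen double raw = min(stop, event`i') - max(start, event`lo') ///
        if !missing(start, stop, event`lo', event`i')
    * Assumption: positive overlap is kept, zero overlap becomes 1 year,
    * negative overlap (no contact with the interval) stays missing.
    gen double yr_`i' = cond(raw > 0, raw, cond(raw == 0, 1, .)) ///
        if !missing(raw)
    drop raw
}
drop event0
list id activity yr_1-yr_4, sepby(id) noobs
```

Because the cut points are ordinary variables rather than constants, no `tokenize` step is needed; the loop index only selects which pair of event variables bounds each interval. Under the stated assumptions this should match the desired `yr_1`-`yr_4` table in the quoted message, though whether skipping missing events is the right rule depends on the study design.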

Thanks.
Thomas Speidel

Quoting Nick Cox <[email protected]> Tue 16 Jun 10:32:12 2009:

```
I don't see why your time points will vary by id. Isn't the point to
apply a categorisation consistently to all panels?

Nick
[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Thomas Speidel
Sent: 16 June 2009 16:29
To: [email protected]
Subject: Re: st: RE: Computing and allocating time intervals in a widedataset

Building on Nick's suggestion, I am trying to modify the code to solve
a slightly different problem.  Suppose again my data are as follows:

id   activity   start   stop   event1   event2   event3   event4
1          1      11     18       10        .       38       44
1          2      21     25       10        .       38       44
1          3      25     28       10        .       38       44
1          4      28     32       10        .       38       44
1          5      32     40       10        .       38       44
1          6      40     44       10        .       38       44
2          1       8     18       13       23        .       30
2          2      23     24       13       23        .       30

Except this time instead of having fixed timepoints (i.e. 0.5 17.5
24.5 44.5 64.5 81), I have to use the "event" variables, so that I can
compute the interval between start and stop and allocate that to its
corresponding event:

id   activity   yr_1   yr_2   yr_3   yr_4
1          1      .      .      .      .
1          2      .      .      .      .
1          3      .      .      .      .
1          4      .      .      .      .
1          5      .      .      .      2
1          6      .      .      .      4
2          1      5      5      .      .
2          2      .      1      .      .

So for example, for (id==2 & activity==1):
yr_1 = min(stop, event1) - max(start, 0.5) = 13 - 8 = 5
yr_2 = min(stop, event2) - max(start, event1) = 18 - 13 = 5
and for (id==2 & activity==2):
yr_2 = min(stop, event2) - max(start, event1) = 23 - 23 = 0 (for
consistency 0 years become 1)

Nick suggested using -tokenize- for the fixed timepoints in my previous
problem.  However, since my timepoints now vary by id, I suspect
tokenize will not be of use here.

Thanks
Thomas Speidel

Quoting Nick Cox <[email protected]> Tue  9 Jun 10:40:40 2009:

I don't understand the reluctance to -reshape-. I am going to assume
that you do that.

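* cut points go into the numbered local macros 1 to 6
tokenize 0.5 17.5 24.5 44.5 64.5 81
qui forval i = 1/5 {
    local j = `i' + 1
    * overlap of [start, stop] with the interval between cut points i and j
    gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
        if start < . & stop < .
}
l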

Here are the results:

. l

     +------------------------------+
     | id   activity   start   stop |
     |------------------------------|
  1. |  1          1       6     15 |
  2. |  1          2      22     25 |
  3. |  1          3      15     16 |
  4. |  1          4      22     28 |
  5. |  1          5      30      . |
     |------------------------------|
  6. |  1          6       .      . |
  7. |  2          1      53     69 |
  8. |  2          2      69     79 |
     +------------------------------+

. tokenize 0.5 17.5 24.5 44.5 64.5 81

. qui forval i = 1/5 {
  2.         local j = `i' + 1
  3.         gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
>                 if start < . & stop < .
  4. }

. l

     +----------------------------------------------------------------------+
     | id   activity   start   stop   grp_1   grp_2   grp_3   grp_4   grp_5 |
     |----------------------------------------------------------------------|
  1. |  1          1       6     15       9       0       0       0       0 |
  2. |  1          2      22     25       0     2.5      .5       0       0 |
  3. |  1          3      15     16       1       0       0       0       0 |
  4. |  1          4      22     28       0     2.5     3.5       0       0 |
  5. |  1          5      30      .       .       .       .       .       . |
     |----------------------------------------------------------------------|
  6. |  1          6       .      .       .       .       .       .       . |
  7. |  2          1      53     69       0       0       0    11.5     4.5 |
  8. |  2          2      69     79       0       0       0       0      10 |
     +----------------------------------------------------------------------+

Nick
[email protected]

Thomas Speidel

I am attempting to compute several time points to calculate the
interval (years) between the start and the end of an activity and to
assign that interval to its relevant age group.  For example, given
the following dataset:

id   activity   start   stop
1          1       6     15
1          2      22     25
1          3      15     16
1          4      22     28
1          5      30      .
1          6       .      .
2          1      53     69
2          2      69     79

I am trying to derive the following:

id   activity   start   stop   grp_0_17   grp_1~24   grp_2~44   grp_4~64   grp_6~81
1          1       6     15          9          0          0          0          0
1          2      22     25          0        2.5         .5          0          0
1          3      15     16          1          0          0          0          0
1          4      22     28          0        2.5        3.5          0          0
1          5      30      .          0          0          1          0          0
1          6       .      .          .          .          .          .          .
2          1      53     69          0          0          0       11.5        4.5
2          2      69     79          0          0          0          0         10

The age groups are:
[0.5, 17.5]
[17.6, 24.5]
[24.6, 44.5]
[44.6, 64.5]
[64.6, 81]

If the dataset were in long format as above, it would not be terribly
hard. A slight complication is that an interval may span two or more
age groups, in which case it has to be split correctly among them.
However, my data are in wide format (a single observation per id),
which makes it a nightmare even to check or troubleshoot my code (I
have 40 activities per id), and the dataset is so large that I am
reluctant to reshape it.
This is what the dataset above would look like:

id   start1   stop1   start2   stop2   start3   stop3   start4   stop4   start5   stop5   start6   stop6
1         6      15       22      25       15      16       22      28       30       .        .       .
2        53      69       69      79        .       .        .       .        .       .        .       .

-The activities do not necessarily follow a temporal sequence (e.g.
the 3rd observation above).
-While the example does not show it, every id has exactly 40
activities, even though many of them may be completely missing.
-Whenever a start is present but its corresponding stop is missing (as
in the 5th observation above), it means that at the time of the study
the person was still performing that activity, hence stop would be a
variable called ageref. If start==ageref, then the interval would be
approximated as 1 year.

I would appreciate any feedback on how to best tackle this problem.
```
```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
```

--
Thomas Speidel

```
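
The original question asked for the allocation in wide format, with a missing stop replaced by ageref. The following is a rough sketch of one way that might look without -reshape-, and it is not code from the thread. The ageref values (30 and 80), the temporary variables `end1`-`end6`, and the output names `grp1_1` ... `grp5_6` are all assumptions made for illustration. The value 30 for id 1 is chosen only so that the start==ageref case can be demonstrated.

```stata
* Sketch only: wide-format variant of the tokenize loop, no reshape.
* The ageref values below are hypothetical.
clear
input id start1 stop1 start2 stop2 start3 stop3 start4 stop4 start5 stop5 start6 stop6 ageref
1  6 15 22 25 15 16 22 28 30 . . . 30
2 53 69 69 79  .  .  .  .  . . . . 80
end

* Cut points from the thread, stored in the numbered local macros 1 to 6
tokenize 0.5 17.5 24.5 44.5 64.5 81

quietly forvalues a = 1/6 {
    * Open-ended activity: a start with no stop is taken to end at ageref
    gen double end`a' = cond(missing(stop`a') & !missing(start`a'), ageref, stop`a')
    forvalues i = 1/5 {
        local j = `i' + 1
        * Years of activity a that fall into age group i
        gen double grp`i'_`a' = max(min(end`a', ``j'') - max(start`a', ``i''), 0) ///
            if !missing(start`a', end`a')
        * Stated rule: start == ageref is approximated as 1 year,
        * credited here (an assumption) to the group containing start
        replace grp`i'_`a' = 1 if missing(stop`a') & !missing(start`a') ///
            & start`a' == ageref & start`a' >= ``i'' & start`a' < ``j''
    }
    drop end`a'
}
list id grp*_5
```

With 40 activities and 5 age groups this creates 200 new variables, so summing across activities with something like `egen ... = rowtotal()` per age group may be a sensible next step. Reshaping to long, as suggested in the thread, remains the simpler route when memory allows.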