Statalist



RE: st: RE: Computing and allocating time intervals in a wide dataset


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Computing and allocating time intervals in a wide dataset
Date   Tue, 16 Jun 2009 17:32:12 +0100

I don't see why your time points will vary by id. Isn't the point to
apply a categorisation consistently to all panels? 

Nick 
n.j.cox@durham.ac.uk 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Thomas Speidel
Sent: 16 June 2009 16:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Computing and allocating time intervals in a wide dataset

Building on Nick's suggestion, I am trying to modify the code to solve a
slightly different problem. Suppose again that my data are as follows:

     id   activity   start   stop   event1   event2   event3   event4
      1          1      11     18       10        .       38       44
      1          2      21     25       10        .       38       44
      1          3      25     28       10        .       38       44
      1          4      28     32       10        .       38       44
      1          5      32     40       10        .       38       44
      1          6      40     44       10        .       38       44
      2          1       8     18       13       23        .       30
      2          2      23     24       13       23        .       30

This time, instead of fixed time points (0.5 17.5 24.5 44.5 64.5 81), I
have to use the "event" variables as cut points, so that I can compute
the interval between start and stop and allocate it to the corresponding
event period:

     id   activity   yr_1   yr_2   yr_3   yr_4
      1          1      .      .      .      .
      1          2      .      .      .      .
      1          3      .      .      .      .
      1          4      .      .      .      .
      1          5      .      .      .      2
      1          6      .      .      .      4
      2          1      5      5      .      .
      2          2      .      1      .      .

So, for example, for (id==2 & activity==1):
yr_1 = min(stop, event1) - max(start, 0.5) = 13 - 8 = 5
yr_2 = min(stop, event2) - max(start, event1) = 18 - 13 = 5
and for (id==2 & activity==2):
yr_2 = min(stop, event2) - max(start, event1) = 23 - 23 = 0 (for
consistency, 0 years becomes 1)

In my previous problem, Nick suggested using -tokenize- on the fixed time
points. However, since my time points now vary by id, I suspect that
-tokenize- will not help here.
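Because event1-event4 are variables stored in every observation, the cut points can be referenced directly inside the loop, so -tokenize- is not needed. The following is a minimal sketch, assuming the long layout shown above (one row per id and activity) and reading the desired table as follows: a period is missing when either of its bounds is missing or when the activity does not overlap it at all, and a zero-length overlap counts as 1 year. The helper variable event0 (holding the lower bound 0.5) is hypothetical. Run it from a do-file, since /// continuation lines are not accepted interactively.

```stata
* Example data from the post (long layout: one row per id and activity)
clear
input id activity start stop event1 event2 event3 event4
1 1 11 18 10  . 38 44
1 2 21 25 10  . 38 44
1 3 25 28 10  . 38 44
1 4 28 32 10  . 38 44
1 5 32 40 10  . 38 44
1 6 40 44 10  . 38 44
2 1  8 18 13 23  . 30
2 2 23 24 13 23  . 30
end

* Hypothetical helper: lower bound (0.5) of the first period
gen event0 = 0.5

forval k = 1/4 {
    local j = `k' - 1
    * Overlap of [start, stop] with [event(k-1), event(k)];
    * stays missing if any of the four bounds is missing
    gen yr_`k' = min(stop, event`k') - max(start, event`j') ///
        if !missing(start, stop, event`j', event`k')
    * No overlap at all: missing, as in the desired table
    replace yr_`k' = . if yr_`k' < 0
    * Zero-length overlap counts as 1 year, per the post
    replace yr_`k' = 1 if yr_`k' == 0
}
drop event0
list id activity yr_*, noobs
```

On the example data this reproduces the yr_1-yr_4 table above. Whether a missing event should instead be skipped (so that, say, event1 and event3 bound a single period) is a substantive choice the post does not settle.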

Thanks
Thomas Speidel


Quoting Nick Cox <n.j.cox@durham.ac.uk> Tue  9 Jun 10:40:40 2009:

> I don't understand the reluctance to -reshape-. I am going to assume
> that you do that.
>
> Your example suggests code along these lines:
>
> tokenize 0.5 17.5 24.5 44.5 64.5 81
> qui forval i = 1/5 {
> 	local j = `i' + 1
> 	gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
>       if start < . & stop < .
> }
> l
>
> Here are the results:
>
> . l
>
>      +------------------------------+
>      | id   activity   start   stop |
>      |------------------------------|
>   1. |  1          1       6     15 |
>   2. |  1          2      22     25 |
>   3. |  1          3      15     16 |
>   4. |  1          4      22     28 |
>   5. |  1          5      30      . |
>      |------------------------------|
>   6. |  1          6       .      . |
>   7. |  2          1      53     69 |
>   8. |  2          2      69     79 |
>      +------------------------------+
>
> . tokenize 0.5 17.5 24.5 44.5 64.5 81
>
> . qui forval i = 1/5 {
>   2.         local j = `i' + 1
>   3.         gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
>>       if start < . & stop < .
>   4. }
>
> . l
>
>      +----------------------------------------------------------------------+
>      | id   activity   start   stop   grp_1   grp_2   grp_3   grp_4   grp_5 |
>      |----------------------------------------------------------------------|
>   1. |  1          1       6     15       9       0       0       0       0 |
>   2. |  1          2      22     25       0     2.5      .5       0       0 |
>   3. |  1          3      15     16       1       0       0       0       0 |
>   4. |  1          4      22     28       0     2.5     3.5       0       0 |
>   5. |  1          5      30      .       .       .       .       .       . |
>      |----------------------------------------------------------------------|
>   6. |  1          6       .      .       .       .       .       .       . |
>   7. |  2          1      53     69       0       0       0    11.5     4.5 |
>   8. |  2          2      69     79       0       0       0       0      10 |
>      +----------------------------------------------------------------------+
>
> Nick
> n.j.cox@durham.ac.uk
>
> Thomas Speidel
>
> I am attempting to compute several time points to calculate the
> interval (years) between the start and the end of an activity and to
> assign that interval to its relevant age group.  For example, given
> the following dataset:
>
>      id   activity   start   stop
>       1          1       6     15
>       1          2      22     25
>       1          3      15     16
>       1          4      22     28
>       1          5      30      .
>       1          6       .      .
>       2          1      53     69
>       2          2      69     79
>
> I am trying to derive the following:
>
>      id   activity   start   stop   grp_0_17   grp_1~24   grp_2~44   grp_4~64   grp_6~81
>       1          1       6     15          9          0          0          0          0
>       1          2      22     25          0        2.5         .5          0          0
>       1          3      15     16          1          0          0          0          0
>       1          4      22     28          0        2.5        3.5          0          0
>       1          5      30      .          0          0          1          0          0
>       1          6       .      .          .          .          .          .          .
>       2          1      53     69          0          0          0       11.5        4.5
>       2          2      69     79          0          0          0          0         10
>
> The age groups are:
> [0.5, 17.5]
> [17.6, 24.5]
> [24.6, 44.5]
> [44.6, 64.5]
> [64.6, 81]
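The allocation rule behind Nick's -tokenize- code in this thread can be written as an overlap formula. It treats the groups as contiguous intervals with cut points c_1, ..., c_6 = 0.5, 17.5, 24.5, 44.5, 64.5, 81 (rather than the 17.6, 24.6, ... lower bounds listed above). A worked check against the 2nd observation (start 22, stop 25):

```latex
\mathrm{grp}_i = \max\bigl(\min(\mathrm{stop},\, c_{i+1}) - \max(\mathrm{start},\, c_i),\ 0\bigr),
  \qquad i = 1, \dots, 5
\\[4pt]
\mathrm{grp}_2 = \max\bigl(\min(25,\, 24.5) - \max(22,\, 17.5),\ 0\bigr) = 24.5 - 22 = 2.5
\\
\mathrm{grp}_3 = \max\bigl(\min(25,\, 44.5) - \max(22,\, 24.5),\ 0\bigr) = 25 - 24.5 = 0.5
```

Both values match the 2.5 and .5 in the desired output above, and an interval spanning several groups is split automatically because each group only receives its own overlap.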
>
> If the dataset were in long format as above, it would not be terribly
> hard. A slight complication is that an interval that spans two or more
> age groups must be split correctly among them. However, my data are in
> wide format (a single observation per id), which makes it a nightmare
> even to check or troubleshoot my code (I have 40 activities per id), and
> the data are so large that I am reluctant to reshape them.
> In wide format, the dataset above looks like this:
>
>      id   start1   stop1   start2   stop2   start3   stop3   start4   stop4   start5   stop5   start6   stop6
>       1        6      15       22      25       15      16       22      28       30       .        .       .
>       2       53      69       69      79        .       .        .       .        .       .        .       .
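If reshaping really is impractical, the fixed-cut-point allocation from Nick's reply can also be run directly on the wide layout by looping over the activity suffixes. This is a minimal sketch, assuming variables named start1-start40 and stop1-stop40 as above (only 6 activities are typed in here). The result names grp<group>_<activity> are hypothetical. It creates 5 new variables per activity, so 200 for 40 activities.

```stata
* Example data in the wide layout from the post (6 activities shown;
* the real data have 40, so the outer loop would run over 1/40)
clear
input id start1 stop1 start2 stop2 start3 stop3 start4 stop4 start5 stop5 start6 stop6
1  6 15 22 25 15 16 22 28 30 . . .
2 53 69 69 79  .  .  .  .  . . . .
end

* Age-group cut points from the post, as in Nick's reply
tokenize 0.5 17.5 24.5 44.5 64.5 81

forval a = 1/6 {
    forval i = 1/5 {
        local j = `i' + 1
        * Years of this activity falling in this age group;
        * missing when start or stop is missing
        gen grp`i'_`a' = max(min(stop`a', ``j'') - max(start`a', ``i''), 0) ///
            if !missing(start`a', stop`a')
    }
}
list id grp*_1, noobs
```

The arithmetic is identical to the long-format code; reshaping to long, as Nick recommends, mainly keeps the variable count down and makes individual activities easier to inspect.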
>
> - The activities do not necessarily follow a temporal sequence (e.g. the
> 3rd observation above).
> - Although the example does not show it, every id has exactly 40
> activities, even though many of them may be completely missing.
> - Whenever a start is present but its corresponding stop is missing (as
> in the 5th observation above), the person was still performing that
> activity at the time of the study, so stop should be taken from a
> variable called ageref. If start==ageref, the interval is approximated
> as 1 year.
>
> I would appreciate any feedback on how to best tackle this problem.
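What follows is a minimal sketch of the -reshape- route suggested in Nick's reply, including the ageref rule described above. Assumptions: the wide variables are named start1-start40 and stop1-stop40 (only 6 are typed in here), and the ageref values are hypothetical (30 for id 1 was chosen so that activity 5 exercises the start==ageref rule). The 1-year approximation is represented by setting stop = start + 1, which is only one possible reading of the post.

```stata
* Example data in the wide layout (6 activities shown). The ageref values
* are hypothetical: 30 for id 1 makes activity 5 hit the start == ageref rule.
clear
input id start1 stop1 start2 stop2 start3 stop3 start4 stop4 start5 stop5 start6 stop6 ageref
1  6 15 22 25 15 16 22 28 30 . . . 30
2 53 69 69 79  .  .  .  .  . . . . 80
end

* One row per id and activity; ageref is carried along unchanged
reshape long start stop, i(id) j(activity)

* Ongoing activity with start == ageref: approximate as 1 year
* (assumption: represented as the span from start to start + 1)
replace stop = start + 1 if !missing(start) & missing(stop) & start == ageref
* Other ongoing activities: stop at ageref
replace stop = ageref if !missing(start) & missing(stop)

* Allocate each interval to the age groups, as in Nick's reply
tokenize 0.5 17.5 24.5 44.5 64.5 81
forval i = 1/5 {
    local j = `i' + 1
    gen grp_`i' = max(min(stop, ``j'') - max(start, ``i''), 0) ///
        if !missing(start, stop)
}
list, noobs sepby(id)
```

With these hypothetical ageref values, activity 5 of id 1 gets 1 year in the third group, matching the desired output. If per-person totals by age group are wanted, -collapse (sum) grp_*, by(id)- is one option; note that it sums missing values as zero.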



