
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Using a while loop to compare rows and delete them?


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Using a while loop to compare rows and delete them?
Date   Wed, 20 Jun 2012 18:55:17 +0100

I have not tried to understand your details, but my experience is that
neither -while- nor -forvalues- is needed for spell problems.
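
A minimal sketch of what a loop-free approach can look like, using -by:- on groups of parallel episodes. The toy values are hypothetical, the variable names follow the question quoted below, and the rule shown is only an analogue of what SITUATION 2 appears to intend (drop an unemployed-without-benefit episode when an unemployed-with-benefit episode is parallel to it). It is not a translation of the full rule set.

```stata
* Toy data: hypothetical values, variable names as in the quoted question
clear
input id begepi status benefit
1 100 1 .
1 100 5 1
1 100 5 0
1 200 5 0
2 100 5 1
end

* Tag parallel episodes: same id and same begepi
bysort id begepi: gen byte parallel = _N > 1

* Within each group, is there any unemployed-with-benefit episode?
by id begepi: egen byte anyben = max(status == 5 & benefit == 1)

* Assumed analogue of SITUATION 2: drop unemployed-without-benefit
* episodes that are parallel to an unemployed-with-benefit episode
drop if status == 5 & benefit == 0 & anyben

list, sepby(id begepi)
```

Each statement here works on the whole data set at once, so the cost does not grow with the number of pairwise comparisons, and no -drop in- is issued row by row.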

I'd just like to draw your attention to previous work:

SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification

-tsspell- from SSC:

tsspell from http://fmwww.bc.edu/RePEc/bocode/t
    'TSSPELL': module for identification of spells or runs in time series /
    tsspell examines the data, which must be tsset time series, to / identify
    spells or runs, which are contiguous sequences defined / by some
    condition. tsspell generates new variables indicating / distinct spells,
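
A short sketch of typical -tsspell- use, on hypothetical toy data with one observation per id and time point. This assumes data that can be -tsset-; the quoted data, with several parallel episodes per person and date, would need restructuring first. With the default settings -tsspell- creates the variables _spell, _seq and _end.

```stata
* Install from SSC if not already present
capture which tsspell
if _rc ssc install tsspell

* Toy panel: hypothetical values
clear
input id t status
1 1 5
1 2 5
1 3 1
1 4 5
2 1 1
2 2 5
2 3 5
end

tsset id t

* A spell is a run of consecutive periods with status == 5
tsspell, cond(status == 5)

* _spell numbers the spells within id (0 = not in a spell),
* _seq counts observations within each spell,
* _end marks the last observation of each spell
list, sepby(id)
```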

Nick

On Wed, Jun 20, 2012 at 4:39 PM, KLOSS <[email protected]> wrote:

> Working with some spell data on employment histories (1 row = 1 episode = 1 observation; 1 spell is subdivided into several "episodes"), my task is to identify rows that refer to the same person and the same period (say May 5, 2002 to May 21, 2002). Let's call such observations "parallel". I then have to check the employment status given in these parallel observations and compare them to each other. Given some pre-defined rules, one or the other of the parallel observations should be dropped.
>
> This has to be done for all rows in the data set and for all possible combinations of parallel observations.
>
> Using Stata/SE 12.0, I start with 20,014,607 rows. I then employ a while loop to check all these rules for all observations (see the code below). I know that a while loop is not the fastest way to get results, but I failed to get a forvalues loop to do the same. With this while loop, the program has been running far too long. When I interrupted the procedure, exactly 19,999,999 rows remained in the data set.
>
> So, these are my questions:
> (1) Are the 19,999,999 rows I got just pure luck, or are they the result of some limit of the while loop?
> (2) Is there any fast lane procedure available for my issue?
>
>
> My code is as follows:
>
> --- CODE START ---
>
> /*
> Data structure: A running spell is subdivided into 2 episodes at the date another spell of the same person (identified via variable "id") begins or ends. The begin and end dates of the original spell are called "begorig" and "endorig" and are written in every episode of this spell. The begin and end dates of an episode are called "begepi" and "endepi". Hence, two episodes are parallel if they show the same id-value and the same begepi-value.
> Within parallel episodes, observations are sorted as: employment (status==1) - training (status==4) - unemployed with benefit (status==5 & benefit==1) - unemployed without benefit (status==5 & benefit==0).
> */
>
> sort id begepi status benefit
>
> local i = 1 // counter
> local N = _N // number of observations
>
> while `i' <`N' {
>        local j = `i'+1
>        while `j' <=`N' {
>                if begepi[`i']!=begepi[`j'] | id[`i']!=id[`j'] { /* consider only parallel episodes */
>                        local i = `i'+1
>                        continue, break
>                }
>                if status[`i']==1 & status[`j']==4 { /* SITUATION 1 */
>                        drop in `i'
>                        local N = `N'-1
>                        continue, break
>                }
>                if status[`i']==5 & status[`j']==5 & /*
>                */ benefit[`i']==1 & benefit[`j']==0 { /* SITUATION 2 */
>                        drop in `j'
>                        local N = `N'-1
>                        continue
>                }
>                if status[`i']<=4 & status[`j']==5 & /*
>                */ begorig[`i']>=begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
>                */ endorig[`i']-begorig[`i']<=14 { /* SITUATION 3 */
>                        drop in `i'
>                        local N = `N'-1
>                        continue, break
>                }
>                if status[`i']<=4 & status[`j']==5 & /*
>                */ begorig[`i']<begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
>                */ endorig[`i']-begorig[`j']<=14 { /* SITUATION 4 */
>                        drop in `i'
>                        local N = `N'-1
>                        continue, break
>                }
>                if status[`i']<=4 & status[`j']==5 & /*
>                */ begorig[`i']>begorig[`j'] & endorig[`i']>=endorig[`j'] & /*
>                */ endorig[`j']-begorig[`i']<=30 { /* SITUATION 5 */
>                        drop in `j'
>                        local N = `N'-1
>                        continue
>                }
>                local j = `j'+1
>        }
> }
>
>
> --- CODE END ---
>


