
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Using a while loop to compare rows and delete them?


From   KLOSS <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Using a while loop to compare rows and delete them?
Date   Wed, 20 Jun 2012 17:39:24 +0200

Dear Stata Listers,

I am working with spell data on employment histories (1 row = 1 episode = 1 observation; a spell is subdivided into several "episodes"). My task is to identify rows that refer to the same person and the same period (say, May 5, 2002 to May 21, 2002). Let's call such observations "parallel". I then have to compare the employment status recorded in these parallel observations. Depending on some predefined rules, one or the other of the parallel observations should be dropped.

This has to be done for all rows in the data set and for all possible combinations of parallel observations.
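To make the notion of "parallel" concrete, here is a minimal sketch on made-up toy data. It assumes the variable names id and begepi from my code further down; the variables npar and parallel and all data values are hypothetical and only serve as illustration.

```stata
* Toy data (hypothetical values), not the real spell data
clear
input id begepi endepi status benefit
1 100 120 1 .
1 100 120 4 .
1 121 140 5 1
2 100 110 5 1
2 100 110 5 0
end

* Episodes are parallel when they share the same id and the same begepi
bysort id begepi (status benefit): gen long npar = _N
gen byte parallel = npar > 1

list, sepby(id begepi)
```

In this toy example, the two episodes of person 1 starting at 100 and the two episodes of person 2 are flagged as parallel, while the episode of person 1 starting at 121 is not.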

Using Stata/SE 12.0, I start with 20,014,607 rows. I then use a while loop to check all these rules for all observations (see the code below). I know that a while loop is not the fastest way to get results, but I failed to get a forvalues loop to do the same. With this while loop, the program ran far too long. When I interrupted it, exactly 19,999,999 rows remained in the data set.

So, these are my questions:
(1) Are the 19,999,999 remaining rows pure coincidence, or are they the result of some limit of the while loop?
(2) Is there a faster procedure available for this problem?


My code is as follows:

--- CODE START ---

/*
Data structure: A running spell is split into 2 episodes at the date another spell of the same person (identified via variable "id") begins or ends. The begin and end dates of the original spell are called "begorig" and "endorig" and are recorded in every episode of this spell. The begin and end dates of an episode are called "begepi" and "endepi". Hence, two episodes are parallel if they have the same id value and the same begepi value.
Within parallel episodes, observations are sorted as: employment (status==1) - training (status==4) - unemployed with benefit (status==5 & benefit==1) - unemployed without benefit (status==5 & benefit==0).
*/

gsort id begepi status -benefit // benefit==1 has to precede benefit==0, as described above

local i = 1 // counter
local N = _N // number of observations

while `i' <`N' {
        local j = `i'+1
        while `j' <=`N' {
                if begepi[`i']!=begepi[`j'] | id[`i']!=id[`j'] { /* consider only parallel episodes */
                        local i = `i'+1
                        continue, break
                }
                if status[`i']==1 & status[`j']==4 { /* SITUATION 1 */
                        drop in `i'
                        local N = `N'-1
                        continue, break
                }
                if status[`i']==5 & status[`j']==5 & /*
                */ benefit[`i']==1 & benefit[`j']==0 { /* SITUATION 2 */
                        drop in `j'
                        local N = `N'-1
                        continue
                }
                if status[`i']<=4 & status[`j']==5 & /*
                */ begorig[`i']>=begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
                */ endorig[`i']-begorig[`i']<=14 { /* SITUATION 3 */
                        drop in `i'
                        local N = `N'-1
                        continue, break
                }
                if status[`i']<=4 & status[`j']==5 & /*
                */ begorig[`i']<begorig[`j'] & endorig[`i']<=endorig[`j'] & /*
                */ endorig[`i']-begorig[`j']<=14 { /* SITUATION 4 */
                        drop in `i'
                        local N = `N'-1
                        continue, break
                }
                if status[`i']<=4 & status[`j']==5 & /*
                */ begorig[`i']>begorig[`j'] & endorig[`i']>=endorig[`j'] & /*
                */ endorig[`j']-begorig[`i']<=30 { /* SITUATION 5 */
                        drop in `j'
                        local N = `N'-1
                        continue
                }
                local j = `j'+1
        }
}


--- CODE END ---
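For comparison, a loop-free variant of Situations 1 and 2 might look like the sketch below. It is only a sketch on hypothetical toy data, under the assumption that each rule can be applied to a whole group of parallel episodes at once; it is not a verified drop-in replacement for the loop above, and Situations 3 to 5 are left out. The helper variables has_train and has_ben are made-up names.

```stata
* Toy data (hypothetical values), not the real spell data
clear
input id begepi status benefit
1 100 1 .
1 100 4 .
2 100 5 1
2 100 5 0
3 100 1 .
end

* Per group of parallel episodes, record which kinds of episodes occur
bysort id begepi: egen byte has_train = max(status == 4)
bysort id begepi: egen byte has_ben   = max(status == 5 & benefit == 1)

* SITUATION 1: employment parallel to training -> drop the employment episode
drop if status == 1 & has_train

* SITUATION 2: unemployment without benefit parallel to unemployment with benefit
drop if status == 5 & benefit == 0 & has_ben

drop has_train has_ben
list, sepby(id begepi)
```

Because by-group operations work on the whole data set in one pass, an approach of this kind would avoid dropping observations one at a time inside a loop.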

Thank you in advance for any hints and comments!

Kind Regards,
Michael Kloss


