Thanks for this, which is good news for me because it explains why the code I was seeing looked as it did.

In terms of moving forward, I have a few vague suggestions.

0. Spells. See the suggestions on reading and software in the thread started by Jakob Petersen yesterday.

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0902/date/article-1122.html>

1. One is more of style or taste than technique. I prefer to think in terms of tagging observations I want to keep or work on with 1 and those I don't with 0. Then you can do almost anything later

... if tag

or

... if !tag

as the case may be. An advantage of that style: it is reversible, both within an algorithm and generally. (If you really want to -drop- observations, -drop- them in one go when the selection is final.)

2. One strategy might be

loop over individuals {
    -expand- each individual to a block of observations
        with one observation per day
    <magic bit>
    reduce each individual back again
}

3. This problem reminds me loosely of one tackled with -panelthin- on SSC. The code for that may suggest some technique.

Nick
n.j.cox@durham.ac.uk

Ilona Carneiro

Many thanks to Nick & Martin for pointing out my error using "if" - you are correct and that's why it wasn't working. However, I'm still unable to do what I wanted to. Apologies for posting code which I tried to simplify, but just made incomprehensible! The snippet was part of a much larger programme in which the other local macros are all defined. I'll try to clarify.

Here is an example of the problem I have. These are consecutive periods of observations for an individual - the end denoted by a clinic visit which may or may not be defined as a case (depending on diagnostic result), or by exit from the study.

id  start  end  case  tx
 1     10   20     1   1
 1     20   35     1   0
 1     35   50     1   0
 1     50  100     .   .

I need to exclude 19 days at risk if the patient received treatment (tx==1), as this is considered to be prophylaxis, and to avoid counting the same episode (case==1) twice I also exclude 19 days at risk after a case is diagnosed. However, as the latter is only to prevent double-counting, it is not necessary if the case has already been disqualified. What I need to get is the following:

id  start  end  case  tx
 1     10   20     1   1
 1     40   50     1   0
 1     50  100     .   .

I originally coded the following VERY crudely:

/* To calculate the gaps */
sort id start
by id: gen lagend = end + lag if (tx > 0 & tx < .) | (case > 0 & case < .) & _n!=_N

/* To drop periods of time that are disqualified - repeated 3 times as
   there may be up to 3 consecutively - to be generalisable, it could
   be more */
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagstart > end & lagstart < . & _n!=1

/* To update the start date */
sort id start
by id: replace start = lagend[_n-1] if lagend[_n-1] < . & _n!=1
sort id start
by id: drop if (end < start | start[_n-1] > end) & end < . & start < . & _n!=1

This works fine for adding a gap after each treatment, as I need to do this even if the observation period is dropped from the time at risk. The code gave the following result, as both the 2nd & 3rd episodes were disqualified, instead of just the 2nd:

id  start  end  case  tx
 1     10   20     1   1
 1     55  100     .   .

I realise that I need to evaluate the generation of the gap after cases separately for each observation period, in case the observation is dropped. But I can't seem to find a way to do this.

I hope this is a clearer explanation of the problem.

On another point, I subsequently use

stgen gap = gaplen()

to calculate how much time to exclude from the time at risk. Stata appears to count one more than just the actual gap, i.e. it will give me a gap of 20 days between an observation ending on day 20 and a subsequent observation starting on day 40, when the actual time excluded in between is 19 days. I'm just subtracting 1 from the calculation at present, but is there a reason for this?

Ilona

On 25 Feb 2009, at 18:27, Martin Weiss wrote:

> <>
>
> I was desperate to find an SJ tip for Ilona on the difference between
> "if" and "if"; turns out it is an FAQ:
>
> http://www.stata.com/support/faqs/lang/ifqualifier.html
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Wednesday, 25 February 2009 18:22
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Problem looping over spells for an individual
>
> Unless you are working under the aegis of -by:- _N will always be
> interpreted as the total number of observations. This code doesn't
> satisfy that.
>
> I echo Martin Weiss in suspecting that your -if `touse'- is a bug. You
> are almost certainly confusing the two flavours of -if-.
>
> Otherwise, your code still looks very confused and based on a variety
> of misunderstandings. Apart from `touse', which is defined by
> -marksample-, all of the local macros you refer to will be treated as
> empty strings, as none has been defined earlier in the program. I am
> surprised to hear that it is running at all.
>
> It does not look as if you need a program anyway. My impression is
> that all you need is to use -by:- but I don't understand your problem
> well enough to suggest better code. Someone else may be able to give
> better help. If not, rather than a lengthy word description, you
> should perhaps give an example of your data with the intended result.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ilona Carneiro
>
> I am trying to write a programme that will run a command sequentially
> for observations of an individual.
> For each individual I have multiple spells and multiple failures.
> However, the twist is that I also need to exclude a period of time at
> risk after each treatment (prophylaxis) and after each failure (to
> prevent double-counting of failures that may actually be the same
> episode). I managed to do this without any problem for the treatment,
> but if an episode is disqualified (by a prior treatment or episode) I
> don't want it to disqualify a subsequent episode. Therefore I need to
> run the code sequentially for each spell of an individual, but using
> the marksample touse code to run it "by" individual doesn't seem to be
> working - the "forvalues" seems to always interpret _N as the last
> observation in the whole dataset, not the last observation for each
> individual.
>
> I have the following code:
>
> program define byid, byable(recall, noheader)
>     marksample touse
>     sort `id' `start'
>     if `touse' {
>         forvalues i = 1(1)`=_N' {
>             replace lagend = (`end' + `lag') if ((`tx' > 0 & `tx' < .) | (`case' > 0 & `case' < .))
>             drop if lagend[`i'-1]>`end' & `id'[`i'-1]==`id'
>         }
>     }
> end
>
> gen lagend=.
> qui by id: byid
>
> but I get the error:
> 2nd by group not found
> r(111);
>
> And the programme isn't doing what I need it to.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
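A small illustration of the two points raised in the thread, namely that _N is only group-specific under -by:- and that the -if- command differs from the -if- qualifier. This is a minimal sketch on toy data: the spells for id 1 mirror the example posted in the thread, while id 2 and the variables last and nspells are invented purely for illustration.

```stata
* Toy data: id 1 mirrors the spells posted in the thread; id 2 is invented
clear
input id start end
1 10  20
1 20  35
1 35  50
1 50 100
2  5  30
2 30  60
end
sort id start

* Outside -by:-, _N is the number of observations in the whole dataset
display _N

* Under -by:-, _n and _N are evaluated separately within each id
by id: gen byte last = (_n == _N)
by id: gen nspells = _N
list, sepby(id)

* -if- command: the expression is evaluated once; a bare variable name
* means its value in the first observation (start[1], which is 10 here)
if start > 15 {
    display "not shown for these data"
}

* -if- qualifier: the expression is evaluated for every observation
count if start > 15
```

The same reasoning suggests why -forvalues i = 1(1)`=_N'- loops over the whole dataset: the macro expression `=_N' is evaluated once, when the line is read, and not separately for each by-group.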

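One possible way to handle the sequential dependence described in the thread (a case only opens an exclusion window if its own spell has not already been disqualified) is to carry the end of the current exclusion window forward with -replace- under -by:-, and to tag spells with 1 or 0 instead of dropping them. What follows is a sketch under stated assumptions, not a tested solution to the original problem: the lag of 20 is inferred from the posted tables (20 + 20 = 40), and the variable names until, tag and newstart are hypothetical.

```stata
* Spells for one individual, as posted in the thread
clear
input id start end case tx
1 10  20 1 1
1 20  35 1 0
1 35  50 1 0
1 50 100 . .
end

local lag 20    // assumption: inferred from the posted tables
sort id start

* until = end of the exclusion window in force once this spell is processed.
* -replace- works through the observations in order, so until[_n-1] already
* holds the updated value for the previous spell of the same id.
* A treatment always opens a window; a case opens one only if its spell
* is not wholly inside the window carried forward from earlier spells.
gen until = .
by id: replace until = max(until[_n-1],                                   ///
    cond(tx == 1 | (case == 1 & !(until[_n-1] > end & until[_n-1] < .)), ///
         end + `lag', .))

* Tagging style: 1 = spell keeps some time at risk, 0 = wholly excluded
by id: gen byte tag = !(until[_n-1] > end & until[_n-1] < .)

* Start of time at risk after applying any window still in force
by id: gen newstart = max(start, until[_n-1])

list id start end case tx until tag newstart if tag, sepby(id)
```

Traced by hand on these four spells, the second spell is tagged 0 and the third starts at 40, as in the intended result. The fourth spell would start at 70 and not at 50, because the sketch also applies the window after the case that ends the third spell; whether that is wanted depends on the intended rule, which the posted table leaves open.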

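On the gaplen() question raised in the thread, a likely explanation (an interpretation, not something confirmed in the thread) is that st data measure time continuously, with each record covering the interval from its start time up to its end time, so a gap is the next start minus the previous end. Counting whole calendar days strictly between the two dates gives one fewer. The arithmetic for the example in the thread:

```stata
* Gap as analysis time: next start minus previous end
display 40 - 20

* Whole days strictly between day 20 and day 40, i.e. days 21 through 39
display 40 - 20 - 1
```

Which of the two is right for the analysis depends on whether the excluded period is defined in continuous time or in whole days.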