RE: st: AW: RE: Problem looping over spells for an individual

From: "Nick Cox"
Subject: RE: st: AW: RE: Problem looping over spells for an individual
Date: Sun, 1 Mar 2009 18:08:00 -0000

Of my suggestions,

3. -panelthin- and 0. -tsspell- do assume -tsset- data; that's implied by their purpose and in each case documented in their help files. But I didn't suggest that either would necessarily solve your problem, just that they might give you some ideas.

1. and 2. don't presuppose -tsset- data.
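
Suggestion 1 can be tried on toy data without -tsset-. The sketch below is only an illustration of the tagging style, not a solution to the spell problem: the data values, the variable name -tag- and the rule used to set it are all made up for the example.

```stata
* Toy data loosely modelled on the example in this thread (values made up)
clear
input id start end
1 10  20
1 20  35
1 35  50
1 50 100
end

* Tag observations to work on with 1 and the rest with 0.
* Hypothetical rule: the first spell, plus any spell ending on or after day 40.
bysort id (start): gen byte tag = (_n == 1) | (end >= 40)

* Work on either subset without changing the data
count if tag
count if !tag

* The tag is reversible: revise it as the selection evolves
replace tag = 0 if start == 50

* -drop- in one go, only once the selection is final
drop if !tag
```

Until the final -drop-, every observation is still in memory, so a wrong rule can be corrected by changing -tag- rather than by reloading the data.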

In your code, you combine two quite different and contradictory strategies, (1) writing a -byable- program and (2) building in the identifier and time structure of your data. You are also writing a -recall- program when I suspect that -onecall- is closer to your problem. Regardless of that detail I'd go for (2).

-pin- appears here and is not explained. I guess that is equivalent to the -id- of earlier postings. A more general point is that others have little hope of understanding clearly anything that you do not explain. In particular, other variables -anmal- and -mal0- appear here which do not seem to have been mentioned in your earlier postings.

With no variables specified and no scope for -if- and -in- conditions, your variable -touse- will always be 1. Your code can, I think, be simplified without loss to

program ilona, sortpreserve
    quietly {
        tempvar T t
        bysort pin (start): gen `t' = _n
        by pin: gen `T' = _N
        sum `T', meanonly
        local tmax = r(max)
        drop `T'
        replace lagend = (end + 19 + 1) if (anmal > 0 & anmal < .)
        forvalues i = 1(1)`tmax' {
            drop if end < lagend[`i'-1] & lagend[`i'-1] < . & `t'==`i' & `i'!=1
            replace lagend = (end + 21 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & `i'<`tmax'
        }
    }
end

-- but I have no idea whether this is progress or not.
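
One point that may be worth checking in the program above: inside the -forvalues- loop, lagend[`i'-1] is not evaluated under -by:-, so it refers to observation `i'-1 of the whole dataset, not of each individual. A minimal sketch on made-up data shows the difference between an absolute subscript and one evaluated under -by:-. The variable names t, T, prev_abs and prev_by are hypothetical.

```stata
* Two individuals with two spells each (values made up)
clear
input id start end
1 10  20
1 20  35
2  5  15
2 15  60
end

* Within-individual counters, as in the program above
bysort id (start): gen t = _n
by id: gen T = _N

* Absolute subscript: observation 2 of the whole dataset, whatever the id
gen prev_abs = end[2]

* Subscript under -by:-: the previous observation of the same id
by id: gen prev_by = end[_n-1]

list, sepby(id)
```

Here prev_abs is 35 in every observation, while prev_by is missing for the first spell of each individual and otherwise holds that individual's previous end.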

Nick
n.j.cox@durham.ac.uk

Ilona Carneiro

Sent: 27 February 2009 18:09
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: RE: Problem looping over spells for an individual

Thanks for these suggestions, Nick. Some of them seem to require
"tsset"ing the data. However, I am trying to adapt the code from
panelthin.ado, but can't quite get it to work for me.  I've been
battling with this for a couple of days now and I can't seem to find a
way to loop over consecutive observations for an individual. I see
that in panelthin, it automatically works separately on each panel
because it is tsset, but how can I do this for survival time "stset"
data?

I've written a little sub-programme to see if I can get this to work.
Generation of a local for _N now works and does give the correct count
of observations per individual. However, my generation of the tempvar
`t' to define the sequential observations (_n) for each individual
doesn't work. Any suggestions?

capture program drop temp
program define temp, byable(recall, noheader) sortpreserve

qui{
marksample touse
count if `touse'
if r(N) == 0 error 2000

tempvar T t
sort `_byvars' start
by `_byvars': gen `t' = _n * `touse'
sort `_byvars' start
by `_byvars': gen `T' = _N * `touse'
sort `_byvars' start
sum `T', meanonly
local tmax = r(max)
drop `T'

replace lagend = (end + 19 + 1) if (anmal > 0 & anmal < .)
sort pin start
forvalues i = 1(1)`tmax' {
drop if end < lagend[`i'-1] & lagend[`i'-1] < . & `t'==`i' & `i'!=1
replace lagend = (end + 21 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & `i'<`tmax'
}
}
end

bysort id: temp

regards

Ilona

On 26 Feb 2009, at 13:20, Nick Cox wrote:

> Thanks for this, which is good news for me because it explains why
> the code I was seeing looked as it did.
>
> In terms of moving forward, I have a few vague suggestions.
>
> 0. Spells. See the suggestions on reading and software in the thread
> started by Jakob Petersen yesterday.
>
> <http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0902/date/article-1122.html>
>
> 1. One is more of style or taste than technique. I prefer to think
> in terms of tagging observations I want to keep or work on with 1
> and those I don't with 0. Then you can do almost anything later
>
> 	... if tag
>
> or
>
> 	... if !tag
>
> as the case may be.
>
> An advantage of that style: it is reversible, both within an
> algorithm and generally.
> (If you really want to -drop- observations, -drop- them in one go
> when the selection is final.)
>
> 2. One strategy might be
>
> loop over individuals {
> 	-expand- each individual to a block of observations with one observation per day
> 	<magic bit>
> 	reduce each individual back again
> }
>
> 3. This problem reminds me loosely of one tackled with -panelthin-
> on SSC. The code for that may suggest some technique.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ilona Carneiro
>
> Many thanks to Nick & Martin for pointing out my error using "if" -
> you are correct and that's why it wasn't working. However, I'm still
> unable to do what I wanted to. Apologies for posting code which I
> tried to simplify, but just made incomprehensible! The snippet was
> part of a much larger programme in which the other local macros are
> all defined.
>
> I'll try to clarify. Here is an example of the problem I have. These
> are consecutive periods of observations for an individual - the end
> denoted by a clinic visit which may or may not be defined as a case
> (depending on diagnostic result), or by exit from the study.
>
> id   start   end   case   tx
> 1    10      20    1      1
> 1    20      35    1      0
> 1    35      50    1      0
> 1    50      100   .      .
>
> I need to exclude 19 days at risk if the patient received treatment
> (tx==1) as this is considered to be prophylaxis, and to avoid counting
> the same episode (case==1) twice I also exclude 19 days at risk after
> a case is diagnosed. However, as the latter is only to prevent
> double-counting, it is not necessary if the case has already been
> disqualified.
>
> What I need to get is the following:
>
> id   start   end   case   tx
> 1    10      20    1      1
> 1    40      50    1      0
> 1    50      100   .      .
>
> I originally coded the following VERY crudely:
>
> /* To calculate the gaps  */
> sort id start
> by id: gen lagend = end + lag if (tx > 0 & tx < .) | (case > 0 & case < .) & _n!=_N
>
>
> /* To drop periods of time that are disqualified - repeated 3 times as
> there may be up to 3 consecutively - to be generalisable, it could be
> more */
> sort id start
> by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
> sort id start
> by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
> sort id start
> by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
> sort id start
> by id: drop if lagstart > end & lagstart < . & _n!=1
>
> /* To update the start date */
> sort id start
> by id: replace start = lagend[_n-1] if lagend[_n-1] < . & _n!=1
> sort id start
> by id: drop if (end < start | start[_n-1] > end) & end < . & start < . & _n!=1
>
> This works fine for adding a gap after each treatment, as I need to do
> this even if the observation period is dropped from the time at risk.
> The code gave the following result, as both the 2nd & 3rd episodes
> were disqualified, instead of just the 2nd:
>
> id   start   end   case   tx
> 1    10      20    1      1
> 1    55      100   .      .
>
> I realise that I need to evaluate the generation of the gap after
> cases separately for each observation period, in case the observation
> is dropped. But can't seem to find a way to do this. I hope this is a
> clearer explanation of the problem.
>
> On another point, I subsequently use stgen gap = gaplen() to
> calculate how much time to exclude from the time at risk. Stata
> appears to count one more than just the actual gap, i.e. it will give
> me a gap of 20 days between an observation ending with day 20, and a
> subsequent observation starting at day 40, when the actual time
> excluded in-between is 19 days. I'm just subtracting 1 from the
> calculation at present, but is there a reason for this?
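
The arithmetic behind the off-by-one can be laid out directly. This sketch does not call -stgen-. It assumes, as the description above suggests, that the reported gap is the difference between the new start and the previous end; that assumption should be checked against the help for -stgen-. The variable names gap_diff and days_between are made up.

```stata
* One individual, with a gap between day 20 and day 40 (values from the example)
clear
input id start end
1 10 20
1 40 50
end

* Difference between this spell's start and the previous spell's end
bysort id (start): gen gap_diff = start - end[_n-1]

* Whole days strictly between day 20 and day 40, i.e. days 21 to 39
by id: gen days_between = start - end[_n-1] - 1

list
```

On this reading the two numbers differ only in how the endpoints are counted: 40 - 20 = 20, but only days 21 to 39, that is 19 days, lie strictly between the two spells.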
>
> Ilona
>
>
>
> On 25 Feb 2009, at 18:27, Martin Weiss wrote:
>
>>
>>
>> I was desperate to find an SJ tip for Ilona on the difference between
>> the "if" qualifier and the "if" command; turns out it is an FAQ:
>> http://www.stata.com/support/faqs/lang/ifqualifier.html
>>
>>
>>
>>
>> HTH
>> Martin
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Wednesday, 25 February 2009 18:22
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: RE: Problem looping over spells for an individual
>>
>> Unless you are working under the aegis of -by:-, _N will always be
>> interpreted as the total number of observations. This code doesn't
>> satisfy that.
>>
>> I echo Martin Weiss in suspecting that your -if `touse'- is a bug. You
>> are almost certainly confusing the two flavours of -if-.
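
The two flavours of -if- can be told apart on a toy dataset. In this minimal sketch the variable names x and touse and the local macro ran are made up; touse here is an ordinary variable, not one created by -marksample-. The -if- command looks at the first observation only, whereas the -if- qualifier is evaluated for every observation.

```stata
clear
set obs 3
gen x = _n
gen byte touse = x > 1      // 0 in observation 1, 1 in observations 2 and 3

* -if- command: touse is read as touse[1], which is 0, so the block is skipped
local ran 0
if touse {
    local ran 1
}
display "block ran: `ran'"

* -if- qualifier: the condition is applied observation by observation
count if touse
```

The block is skipped even though two of the three observations have touse equal to 1, while -count if touse- reports those two observations.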
>>
>> Otherwise, your code still looks very confused and based on a variety
>> of misunderstandings. Apart from `touse', which is defined by
>> -marksample-, all of the local macros you refer to will be treated as
>> empty strings, as none has been defined earlier in the program. I am
>> surprised to hear that it is running at all.
>>
>> It does not look as if you need a program anyway. My impression is
>> that all you need is to use -by:- but I don't understand your problem
>> well enough to suggest better code. Someone else may be able to give
>> better help. If not, rather than a lengthy word description, you
>> should perhaps give an example of your data with the intended result.
>>
>> Nick
>> n.j.cox@durham.ac.uk
>>
>> Ilona Carneiro
>>
>> I am trying to write a programme that will run a command sequentially
>> for observations of an individual. For each individual I have multiple
>> spells and multiple failures. However, the twist is that I also need
>> to exclude a period of time at risk after each treatment (prophylaxis)
>> and after each failure (to prevent double-counting of failures that
>> may actually be the same episode). I managed to do this without any
>> problem for the treatment, but if an episode is disqualified (by a
>> prior treatment or episode) I don't want it to disqualify a subsequent
>> episode. Therefore I need to run the code sequentially for each spell
>> of an individual, but using the marksample touse code to run it "by"
>> individual doesn't seem to be working - the "forvalues" seems to
>> always interpret _N as the last observation in the whole dataset, not
>> the last observation for each individual.
>>
>> I have the following code:
>>
>> 		program define byid, byable(recall, noheader)
>> 		marksample touse
>> 		sort `id' `start'
>> 		if `touse' {
>> 		forvalues i = 1(1)`=_N' {
>> 		replace lagend = (`end' + `lag') if ((`tx' > 0 & `tx' < .) | (`case' > 0 & `case' < .))
>> 		drop if lagend[`i'-1]>`end' & `id'[`i'-1]==`id'
>> 		}
>> 		}
>> 		end
>>
>> 		gen lagend=.
>> 		qui by id: byid
>>
>> but I get the error:
>> r(111);
>>
>> And the programme isn't doing what I need it to.
>>
>>

