Statalist



Re: AW: st: AW: RE: Problem looping over spells for an individual


From   Ilona Carneiro <ilonac@orange.es>
To   statalist@hsphsun2.harvard.edu
Subject   Re: AW: st: AW: RE: Problem looping over spells for an individual
Date   Sun, 1 Mar 2009 16:07:16 +0100

Thanks Martin

Here is a sample of the data that requires this fix:

pin		start		end		case	tx
1		10		20		1		1
1		20		35		1		0
1		35		50		1		0
1		50		100		.		.

And here is an extract of the trace from the programme, which is byable; I'm running it with "by pin:", where pin is the personal identification number. The programme should generate a variable equal to the observation number _n within pin, sorted by the observation start date. It calculates _N correctly for each pin, but as the output of noi di "t=" `t' shows, the tempvar `t' stays constant at 1 even when `i' = 2, i.e. when the loop should have moved on to the subsequent observation of the dataset. I would prefer not to make the programme byable, as it actually needs to be nested within a bigger programme, but I need a way to run the code sequentially.

 - tempvar T t
  - sort `_byvars' start
  = sort pin start
  - by `_byvars': gen `t' = _n
  = by pin: gen __000004 = _n
 - sort `_byvars' start
  = sort pin start
  - by `_byvars': gen `T' = _N * `touse'
  = by pin: gen __000003 = _N * __000002
  - sort `_byvars' start
  = sort pin start
  - sum `T', meanonly
  = sum __000003, meanonly
  - local tmax = r(max)
  - drop `T'
  = drop __000003
  - replace lagend = (end + 19 + 1) if (anmal > 0 & anmal < .)
  - sort `_byvars' start
  = sort pin start
  - forvalues i = 1(1)`tmax' {
  = forvalues i = 1(1)2 {
  - noi di "T=" `tmax'
  = noi di "T=" 2
T=2
  - noi di "t=" `t'
  = noi di "t=" __000004
t=1
  - noi di "i=" `i'
  = noi di "i=" 1
i=1
  - drop if end < lagend[`i'-1] & lagend[`i'-1] < . & `t'==`i' & `i'!=1
  = drop if end < lagend[1-1] & lagend[1-1] < . & __000004==1 & 1!=1
  - replace lagend = (end + 19 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & `t'==`i' & `i'<`tmax'
  = replace lagend = (end + 19 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & __000004==1 & 1<2
  - }
  - noi di "T=" `tmax'
  = noi di "T=" 2
T=2
  - noi di "t=" `t'
  = noi di "t=" __000004
t=1
  - noi di "i=" `i'
  = noi di "i=" 2
i=2
  - drop if end < lagend[`i'-1] & lagend[`i'-1] < . & `t'==`i' & `i'!=1
  = drop if end < lagend[2-1] & lagend[2-1] < . & __000004==2 & 2!=1
  - replace lagend = (end + 21 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & `t'==`i' & `i'<`tmax'
  = replace lagend = (end + 21 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & __000004==2 & 2<2
  - }
  - }
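
One possible reading of the "t=1" lines in the trace, offered as a sketch rather than a diagnosis of the programme: -display- given a bare variable name shows the value in the first observation of the dataset, so it prints 1 whatever `i' is. The variable itself may still hold 1, 2, 3, ... within each pin. The plain variable name t below stands in for the tempvar `t' and is an assumption made for illustration.

```stata
* Minimal sketch; the data are the first three spells of the sample above
clear
input pin start
1 10
1 20
1 35
end

sort pin start
by pin: gen long t = _n      // spell number within pin: 1, 2, 3

display "t = " t             // a bare variable name means t[1], so this prints 1
display "t[2] = " t[2]       // an explicit subscript prints 2
list pin start t             // -list- shows the whole variable
```

Checking the counter with -list- (or an explicit subscript) rather than -display- avoids mistaking the first observation's value for the whole variable.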

Ilona


On 27 Feb 2009, at 22:04, Martin Weiss wrote:


<>

It is difficult for an outsider to make up data for your specific problem; therefore either frame the problem in terms of a dataset shipped with Stata, or
-set trace on- and report the area around the error...




HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Ilona Carneiro
Sent: Friday, 27 February 2009 19:09
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: RE: Problem looping over spells for an individual

Thanks for these suggestions, Nick. Some of them seem to require
"tsset"ing the data. However, I am trying to adapt the code from
panelthin.ado, but can't quite get it to work for me.  I've been
battling with this for a couple of days now and I can't seem to find a
way to loop over consecutive observations for an individual. I see
that in panelthin, it automatically works separately on each panel
because it is tsset, but how can I do this for survival time "stset"
data?

I've written a little sub-programme to see if I can get this to work.
Generation of a local for _N now works and does give the correct count
of observations per individual. However, my generation of the tempvar
`t' to define the sequential observations (_n) for each individual
doesn't work. Any suggestions?

		capture program drop temp		
		program define temp, byable(recall, noheader) sortpreserve
		
		qui{
		marksample touse
		count if `touse'
		if r(N) == 0 error 2000

		tempvar T t
		sort `_byvars' start
		by `_byvars': gen `t' = _n * `touse'
		sort `_byvars' start
		by `_byvars': gen `T' = _N * `touse'
		sort `_byvars' start
		sum `T', meanonly
		local tmax = r(max)
		drop `T'

		replace lagend = (end + 19 + 1) if (anmal > 0 & anmal < .)
		sort pin start
		forvalues i = 1(1)`tmax' {
			drop if end < lagend[`i'-1] & lagend[`i'-1] < . & `t'==`i' & `i'!=1
			replace lagend = (end + 21 + 1) if (mal0 > 0 & mal0 < .) & lagend==. & `i'<`tmax'
			}
		}
		end
			
		bysort id: temp


regards

Ilona

On 26 Feb 2009, at 13:20, Nick Cox wrote:

Thanks for this, which is good news for me because it explains why
the code I was seeing looked as it did.

In terms of moving forward, I have a few vague suggestions.

0. Spells. See the suggestions on reading and software in the thread
started by Jakob Petersen yesterday.


<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0902/date/article-1122.html>


1. One is more of style or taste than technique. I prefer to think
in terms of tagging observations I want to keep or work on with 1
and those I don't with 0. Then you can do almost anything later

	... if tag

or

	... if !tag

as the case may be.

An advantage of that style: it is reversible, both within an
algorithm and generally.
(If you really want to -drop- observations, -drop- them in one go
when the selection is final.)

2. One strategy might be

loop over individuals {
	-expand- each individual to a block of observations with one observation per day
	<magic bit>
	reduce each individual back again
}

3. This problem reminds me loosely of one tackled with -panelthin-
on SSC. The code for that may suggest some technique.
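
The tagging style in point 1 above can be sketched as follows. The rule used to set tag (spells of at least 15 days) is purely hypothetical and chosen only so that the example has something to tag; the data are the sample posted elsewhere in this thread.

```stata
clear
input id start end case tx
1 10 20 1 1
1 20 35 1 0
1 35 50 1 0
1 50 100 . .
end

* Hypothetical rule, for illustration only: keep spells of 15 days or more
gen byte tag = (end - start) >= 15

count if !tag                 // spells currently excluded
summarize end if tag          // work on the tagged spells; nothing is lost yet

* Everything is reversible up to here; -drop- once, when the selection is final
drop if !tag
```

Until the final -drop-, the tag can be recomputed or relaxed without reloading the data, which is the reversibility mentioned above.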

Nick
n.j.cox@durham.ac.uk

Ilona Carneiro

Many thanks to Nick & Martin for pointing out my error using "if" -
you are correct and that's why it wasn't working. However, I'm still
unable to do what I wanted to. Apologies for posting code which I
tried to simplify, but just made incomprehensible! The snippet was
part of a much larger programme in which the other local macros are
all defined.

I'll try to clarify. Here is an example of the problem I have. These
are consecutive periods of observations for an individual - the end
denoted by a clinic visit which may or may not be defined as a case
(depending on diagnostic result), or by exit from the study.

id		start		end		case	tx
1		10		20		1		1
1		20		35		1		0
1		35		50		1		0
1		50		100		.		.

I need to exclude 19 days at risk if the patient received treatment
(tx==1) as this is considered to be prophylaxis, and to avoid counting
the same episode (case==1) twice I also exclude 19 days at risk after a
case is diagnosed. However, as the latter is only to prevent
double-counting it is not necessary if the case has already been
disqualified.

What I need to get is the following:

id		start		end		case	tx
1		10		20		1		1
1		40		50		1		0
1		50		100		.		.

I originally coded the following VERY crudely:

/* To calculate the gaps  */
sort id start
by id: gen lagend = end + lag if (tx > 0 & tx < .) | (case > 0 & case < .) & _n!=_N


/* To drop periods of time that are disqualified - repeated 3 times as
there may be up to 3 consecutively - to be generalisable, it could be
more */
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagend[_n-1] > end & lagend[_n-1] < . & _n!=1
sort id start
by id: drop if lagstart > end & lagstart < . & _n!=1

/* To update the start date */
sort id start
by id: replace start = lagend[_n-1] if lagend[_n-1] < . & _n!=1	
sort id start
by id: drop if (end < start | start[_n-1] > end) & end < . & start < . & _n!=1

This works fine for adding a gap after each treatment, as I need to do
this even if the observation period is dropped from the time at risk.
The code gave the following result, as both the 2nd & 3rd episodes
were disqualified, instead of just the 2nd:

id		start		end		case	tx
1		10		20		1		1
1		55		100		.		.

I realise that I need to evaluate the generation of the gap after
cases separately for each observation period, in case the observation
is dropped, but I can't seem to find a way to do this. I hope this is a
clearer explanation of the problem.

On another point, I subsequently use stgen gap =  gaplen() to
calculate how much time to exclude from the time at risk. Stata
appears to count one more than just the actual gap, i.e. it will give
me a gap of 20 days between an observation ending with day 20, and a
subsequent observation starting at day 40, when the actual time
excluded in-between is 19 days. I'm just subtracting 1 from the
calculation at present, but is there a reason for this?
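
The arithmetic behind the two figures can be checked directly. This assumes, without having checked the manual, that gaplen() is the difference between a record's entry time and the previous record's exit time; the local macro names are made up for the example.

```stata
local prev_end   = 20      // previous observation ends on day 20
local next_start = 40      // next observation starts on day 40

display `next_start' - `prev_end'        // 20: difference between the two times
display `next_start' - `prev_end' - 1    // 19: whole days strictly in between (21 to 39)
```

Under that assumption the extra day would come from measuring elapsed time between the two endpoints rather than counting the days that lie between them.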

Ilona



On 25 Feb 2009, at 18:27, Martin Weiss wrote:


<>


I was desperate to find an SJ tip for Ilona on the difference between
"if" and "if"; turns out it is an FAQ:
http://www.stata.com/support/faqs/lang/ifqualifier.html
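
A minimal sketch of the two flavours, with made-up data: the -if- command is evaluated once and, given a variable, looks only at its first observation, whereas the -if- qualifier is evaluated for every observation. The variable names x, touse, a and b are hypothetical.

```stata
clear
input x touse
1 0
2 1
3 1
end

* if command: a single decision, based on touse[1], which is 0 here
if touse {
    gen a = 1
}
capture confirm variable a
display _rc                  // nonzero: a was never created

* if qualifier: applied observation by observation
gen b = 1 if touse           // missing in observation 1, 1 in observations 2 and 3
list
```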




HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Nick Cox
Sent: Wednesday, 25 February 2009 18:22
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Problem looping over spells for an individual

Unless you are working under the aegis of -by:- _N will always be
interpreted as the total number of observations. This code doesn't
satisfy that.
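
A small illustration of that point, using made-up data for two individuals (the second individual and all variable names are assumptions for the example):

```stata
clear
input id start
1 10
1 20
1 35
1 50
2 5
2 30
end

gen long N_all = _N              // no by: so this is 6 in every observation

sort id start
by id: gen long N_id = _N        // 4 for id 1, 2 for id 2
by id: gen long n_id = _n        // 1 to 4 for id 1, 1 to 2 for id 2

list, sepby(id)
```

The same applies to `=_N' in a -forvalues- range: it is expanded once, when the line is read, to whatever _N means in that context.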

I echo Martin Weiss in suspecting that your -if `touse'- is a bug. You
are almost certainly confusing the two flavours of -if-.

Otherwise, your code still looks very confused and based on a variety
of misunderstandings. Apart from `touse', which is defined by
-marksample-, all of the local macros you refer to will be treated as
empty strings, as none has been defined earlier in the program. I am
surprised to hear that it is running at all.

It does not look as if you need a program anyway. My impression is
that all you need is to use -by:- but I don't understand your problem
well enough to suggest better code. Someone else may be able to give
better help. If not, rather than a lengthy word description, you
should perhaps give an example of your data with the intended result.

Nick
n.j.cox@durham.ac.uk

Ilona Carneiro

I am trying to write a programme that will run a command sequentially
for observations of an individual. For each individual I have multiple
spells and multiple failures. However, the twist is that I also need
to exclude a period of time at risk after each treatment (prophylaxis)
and after each failure (to prevent double-counting of failures that
may actually be the same episode). I managed to do this without any
problem for the treatment, but if an episode is disqualified (by a
prior treatment or episode) I don't want it to disqualify a subsequent
episode. Therefore I need to run the code sequentially for each spell
of an individual, but using the marksample touse code to run it "by"
individual doesn't seem to be working - the "forvalues" seems to
always interpret _N as the last observation in the whole dataset, not
the last observation for each individual.

I have the following code:

		program define byid, byable(recall, noheader)
		marksample touse
		sort `id' `start'
		if `touse' {
		forvalues i = 1(1)`=_N' {
		replace lagend = (`end' + `lag') if ((`tx' > 0 & `tx' < .) | (`case' > 0 & `case' < .))
		drop if lagend[`i'-1]>`end' & `id'[`i'-1]==`id'
		}
		}
		end
		
		gen lagend=. 	
		qui by id: byid

but I get the error:
2nd by group not found
r(111);

And the programme isn't doing what I need it to.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



