# Re: st: RE: Compute a summary variable based on a predefined algorithm

From: "Shuaib Kauchali"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Compute a summary variable based on a predefined algorithm
Date: Tue, 20 Mar 2007 13:08:43 +0200

Hi there,

I tried this to generate the indicator for the first day of an episode (which I am calling first3):

bysort childid (day): gen first3=diar==1 & diar[_n-1]!=1 & diar[_n-2]!=1 & diar[_n-3]!=1

Here I got the desired results, i.e., the correct definition of an episode of diar. Please see the revised screenshot showing the new listing of first3 and epis3.

Now the next challenge is to define persistent diarrhoea (a string of >= 14 consecutive diar days).

e.g. diar 00011111111111111000110011101111111111000000 would be 2 episodes of persistent diarrhoea (note that the second episode has some diarrhoea-free days, but they do not amount to >= 3 days, so it is still the same episode). This will require marking the beginning of an episode (in my case I have done this with first3) and the last day of the episode (I am not sure how to derive this). Once this is done, we can compute the duration between first3 and the last day (lastday).

Again, I have not thought about missing values for diar in any of these definitions. For now I am assuming they are diarrhoea-free days.
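One possible route to the last day and the duration, offered only as a sketch. It reuses the first3 rule and the example string above on a single made-up child, and it assumes one observation per child per day with no gaps in day. The names firstday, duration, persistent, tag and n_persistent are hypothetical; lastday and epis3 follow the names already used in this post. Duration is counted from the first to the last diarrhoea day of an episode, inclusive, so short diarrhoea-free gaps inside an episode are counted as part of it.

```stata
* Toy data (an assumption for illustration): one child, one row per day,
* built from the example string in this post
clear
set obs 1
gen childid = 1
gen pattern = "00011111111111111000110011101111111111000000"
expand strlen(pattern)
bysort childid: gen day = _n
gen byte diar = real(substr(pattern, day, 1))
drop pattern

* Start of an episode: diarrhoea today and none on the 3 preceding days.
* Missing diar counts as "not 1", i.e. as a diarrhoea-free day.
bysort childid (day): gen byte first3 = diar == 1 & diar[_n-1] != 1 & diar[_n-2] != 1 & diar[_n-3] != 1

* Running episode number within child (0 before the first episode)
by childid: gen epis3 = sum(first3)

* First and last diarrhoea day within each episode
egen firstday = min(cond(diar == 1, day, .)), by(childid epis3)
egen lastday  = max(cond(diar == 1, day, .)), by(childid epis3)

* Duration in days, inclusive of both ends; missing where epis3 is 0
gen duration = lastday - firstday + 1

* Guard against missing: in Stata, missing >= 14 evaluates to true
gen byte persistent = duration >= 14 if !missing(duration)

* Count persistent episodes per child, using one tagged row per episode
egen byte tag = tag(childid epis3)
egen n_persistent = total(tag & persistent == 1), by(childid)

list childid day diar first3 epis3 firstday lastday duration persistent, sepby(epis3)
```

If day has gaps within a child, the `[_n-1]` subscripts look at the previous observation rather than the previous calendar day, so the data would need to be filled in (or `tsset` and handled with time-series operators) before applying this.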

Can anyone help me get to the next stage?

Shuaib

On Sun, 18 Mar 2007 19:25:03 +0200, Nick Cox <n.j.cox@durham.ac.uk> wrote:

```This problem is easier than you think in that no use of looping
(-foreach- etc.) is needed. It is difficult in that there are
different possible reactions to missings on -v1-. This
post indicates one kind of solution.

You have panel data. You could -tsset- it without loss:

tsset childid day

That means that you could then use -tsspell- from SSC.
Alternatively, you can work from first principles.
I show the latter, but you might want to look at -tsspell- too.

On one definition, each episode of diarrhea (in English,
diarrhoea) starts when v1 is 1 and the preceding value is not 1:

bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1

-first- is an indicator variable. You can use it to define
episodes:

by childid : gen episodes = sum(first)

_or_

by childid : gen episodes = cond(v1 == 0, 0, sum(first))

You can record the start dates of each episode:

by childid : gen start = day if first
by childid : replace start = start[_n-1] if !first

The time since the previous start is then

by childid : gen time_since = start - start[_n-1] if first

and you are then interested in counting how many episodes
are not within three days of the previous:

by childid : egen n_episodes = total(first * (time_since >= 3))

The first episode is always included on this definition.

Nick
n.j.cox@durham.ac.uk

Shuaib Kauchali

```
```I have a data set of birth cohort data with longitudinal follow-up of these
children until they were 9 months old (270 days), unless they were lost to
follow-up or died before then.

The data structure looks like this:
childid (repeated group variable, daily visit to the clinic)
day (day of visit)
v1 (diarrhea on that day of visit)
v2 <-- this is the variable I would like to get (defined as diarrhea
episodes: a string of 1's separated by at least 3 consecutive 0's is an
episode)

childid day v1  v2
1   1   .   .
1   2   .   .
1   3   .   .
1   4   .   .
2   1   0   1
2   2   1   1
2   3   1   1
2   4   0   1
3   1   1   2
3   2   1   2
3   3   0   2
3   4   0   2
3   5   0   2
3   6   1   2
3   7   1   2
4   1   1   1
4   2   .   1
4   3   1   1
4   4   0   1
4   5   .   1
4   6   0   1
4   7   0   1
5   1   0   1
5   2   1   1
5   3   0   1
5   4   1   1
6   1   0   0
6   2   0   0
6   3   0   0
6   4   0   0
6   5   0   0
6   6   0   0
6   7   0   0

Note:
1. childid=4 is a bit tricky because of missing values; we assume a single
episode, as there were not more than 3 days separating the 2 events.
2. childid=1 has not had any visits recorded, so he gets missing values
for v2.
3. Not everyone is followed up for the same period: loss to follow-up,
death, or completion of the study (in my data set this should happen when
the child reaches 270 days from birth. This is a birth cohort of 2500
children)

My problem is that I am unable to manipulate the data in Stata to get the
summary v2 of the number of episodes of diarrhea per child by total number
of days observed. I am new to Stata, but I am a good learner (I have many
of the Stata Press books to help). One way I came across in the books was
to use explicit subscripting; this would allow me to count the total number
of days followed per child, but I am not sure how to create the algorithm
for the v2 creation: perhaps foreach, forvalues, or even while, local
macro? I find the commands a bit intimidating for a newcomer, but am
willing to spend time learning them.

Can anyone help?
Best wishes

Shuaib

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
```

--
Shuaib Kauchali
Dept of Paediatrics & Child Health
Child Health Epidemiology
University of KwaZulu-Natal, Durban
South Africa```

Attachment: 2007-03-20_125405.png
Description: PNG image