This problem is easier than you think in that no looping
(-foreach- etc.) is needed. It is difficult in that there are
different possible ways to treat missing values on -v1-. This
post indicates one kind of solution.
You have panel data. You could -tsset- it without loss:
tsset childid day
That means that you could then use -tsspell- from SSC.
Alternatively, you can work from first principles.
I show the latter, but you might want to look at -tsspell- too.
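For comparison, here is a minimal sketch of the -tsspell- route. It
assumes the cond() option and the default output variables _spell,
_seq and _end as described in the -tsspell- help file; check
-help tsspell- after installing. The seven rows are childid 3 from
the example data in the question, and -n_spells- is a name made up
for this illustration. This counts every run of 1s and does not yet
apply any three-day rule.

```stata
clear
input childid day v1
3 1 1
3 2 1
3 3 0
3 4 0
3 5 0
3 6 1
3 7 1
end

* install -tsspell- from SSC only if it is not already present
capture which tsspell
if _rc ssc install tsspell

tsset childid day

* a spell is a run of consecutive days on which v1 is 1
tsspell, cond(v1 == 1)

* _seq is 1 on the first day of each spell, so count those
bysort childid (day): egen n_spells = total(_seq == 1)
list childid day v1 _spell _seq _end n_spells, noobs sepby(childid)
```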
On one definition, each episode of diarrhea (in English,
diarrhoea) starts when v1 is 1 and the preceding value is not 1:
bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1
-first- is an indicator variable. You can use it to define
episodes:
by childid : gen episodes = sum(first)
_or_
by childid : gen episodes = cond(v1 == 0, 0, sum(first))
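To see how the two definitions differ, here is a small sketch using
childid 3 from the example data. The names -episodes_a- and
-episodes_b- are used only for this illustration. The first version
carries the episode number through the days without diarrhea; the
second sets those days to 0.

```stata
clear
input childid day v1
3 1 1
3 2 1
3 3 0
3 4 0
3 5 0
3 6 1
3 7 1
end

* an episode starts when v1 is 1 and the preceding value is not 1
bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1

* running count of starts, carried through days with v1 == 0
by childid: gen episodes_a = sum(first)

* same count, but set to 0 on days with v1 == 0
by childid: gen episodes_b = cond(v1 == 0, 0, sum(first))

list day v1 first episodes_a episodes_b, noobs sepby(childid)
```

Note that with cond(v1 == 0, ...) a missing v1 is not equal to 0,
so such days get the running count, not 0.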
You can record the start dates of each episode:
by childid : gen start = day if first
by childid : replace start = start[_n-1] if !first
The time since the previous start is then
by childid : gen time_since = start - start[_n-1] if first
and you are then interested in counting how many episodes
are not within three days of the previous:
by childid : egen n_episodes = total(first * (time_since >= 3))
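The first episode is always included on this definition: its
-time_since- is missing, and Stata treats missing as larger than
any number, so -time_since >= 3- is true.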
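Putting the steps together, here is a sketch that runs them on the
example data from the question below and checks the result against
the wanted v2. The last two commands are an addition of mine, not
part of the steps above: they assume that a child with no nonmissing
v1 at all (childid 1) should get missing and not 0, as the question
asks. -n_obs- is a name made up for this illustration.

```stata
clear
input childid day v1 v2
1 1 . .
1 2 . .
1 3 . .
1 4 . .
2 1 0 1
2 2 1 1
2 3 1 1
2 4 0 1
3 1 1 2
3 2 1 2
3 3 0 2
3 4 0 2
3 5 0 2
3 6 1 2
3 7 1 2
4 1 1 1
4 2 . 1
4 3 1 1
4 4 0 1
4 5 . 1
4 6 0 1
4 7 0 1
5 1 0 1
5 2 1 1
5 3 0 1
5 4 1 1
6 1 0 0
6 2 0 0
6 3 0 0
6 4 0 0
6 5 0 0
6 6 0 0
6 7 0 0
end

* start of each episode: v1 is 1 and the preceding value is not 1
bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1

* start day of the current episode, carried forward within child
by childid: gen start = day if first
by childid: replace start = start[_n-1] if !first

* days between this start and the previous start
by childid: gen time_since = start - start[_n-1] if first

* count starts that are at least 3 days after the previous start
by childid: egen n_episodes = total(first * (time_since >= 3))

* assumption: children with no nonmissing v1 get missing, not 0
by childid: egen n_obs = count(v1)
replace n_episodes = . if n_obs == 0

* the result agrees with the wanted v2 on these example data
assert n_episodes == v2
```

Agreement on this small example does not show that the start-to-start
rule and the rule of three consecutive 0s always agree, so it is
worth checking a few children by eye in the full data.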
Shuaib Kauchali
I have a data set of birth cohort data with longitudinal
follow-up of these children until they were 9 months old
(270 days), unless they were lost to follow-up or died before
then.
The data structure looks like this:
Childid (repeated group variable, daily visit to the clinic)
day (day of visit)
v1 (diarrhea on that day of visit)
v2 <-- this is the variable I would like to get (defined as
diarrhea episodes: a string of 1's separated by at least 3
consecutive 0's is an episode)
childid day v1 v2
1 1 . .
1 2 . .
1 3 . .
1 4 . .
2 1 0 1
2 2 1 1
2 3 1 1
2 4 0 1
3 1 1 2
3 2 1 2
3 3 0 2
3 4 0 2
3 5 0 2
3 6 1 2
3 7 1 2
4 1 1 1
4 2 . 1
4 3 1 1
4 4 0 1
4 5 . 1
4 6 0 1
4 7 0 1
5 1 0 1
5 2 1 1
5 3 0 1
5 4 1 1
6 1 0 0
6 2 0 0
6 3 0 0
6 4 0 0
6 5 0 0
6 6 0 0
6 7 0 0
Note:
1. childid=4 is a bit tricky because of missing values; we
assume the episode to be one as there were not more than 3 days
separating 2 events.
2. childid=1 has not had any visits recorded, so he gets missing
values for v2.
3. Not everyone is followed up for the same period: loss to
follow-up, death, or completed the study (in my data set this
should happen when the child reaches 270 days from birth. This
is a birth cohort of 2500 children).
My problem is that I am unable to manipulate the data in Stata
to get the summary v2 of the number of episodes of diarrhea per
child by total number of days observed. I am new to Stata, but
I am a good learner (I have many of the Stata Press books to
help). One way I came across in the books was to use explicit
subscripting; this would allow me to count the total number of
days followed per child. But I am not sure how to create the
algorithm for creating v2: perhaps foreach, forvalues, or even
while, or a local macro? I find the commands a bit intimidating
for a newcomer, but am willing to spend time learning them.
Can anyone help?
Best wishes
Shuaib
