This problem is easier than you might think, in that no looping
(-foreach- etc.) is needed. It is difficult in that there is more
than one defensible way to treat missing values of -v1-. This
post indicates one kind of solution.
You have panel data. You could -tsset- it without loss:
tsset childid day
That means that you could then use -tsspell- from SSC.
Alternatively, you can work from first principles.
I show the latter, but you might want to look at -tsspell- too.
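For comparison, here is a minimal sketch of the -tsspell- route. It
assumes the package has been installed from SSC and that its default
output variable names (-_spell-, -_seq-, -_end-) are used; -n_spells-
is a name made up here for illustration. Note that this counts every
run of 1s and does not apply any three-day rule.

```stata
* install once with: ssc install tsspell
tsset childid day

* mark spells of consecutive days on which v1 is 1
tsspell, cond(v1 == 1)

* _spell numbers the spells within each child (0 = not in a spell),
* so its maximum is the number of runs of 1s for that child
bysort childid (day): egen n_spells = max(_spell)
```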
On one definition, each episode of diarrhea (in English,
diarrhoea) starts when v1 is 1 and the preceding value is not 1:
bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1
-first- is an indicator variable. You can use it to define
episodes:
by childid : gen episodes = sum(first)
_or_, if days on which v1 is 0 should not belong to any episode,
by childid : gen episodes = cond(v1 == 0, 0, sum(first))
You can record the start dates of each episode:
by childid : gen start = day if first
by childid : replace start = start[_n-1] if !first
The time since the previous start is then
by childid : gen time_since = start - start[_n-1] if first
and you are then interested in counting how many episodes
start at least three days after the previous one started:
by childid : egen n_episodes = total(first * (time_since >= 3))
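The first episode is always included on this definition: its
-time_since- is missing, and Stata treats missing as larger than
any number, so -time_since >= 3- is true.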
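As a check, the steps above can be run on the example data from the
question. This is only a sketch: the -input- block retypes the posted
listing, and the variable -n_obs- with its -replace- is an assumption
added here, not part of the solution above, so that a child with no
recorded v1 gets missing rather than 0.

```stata
clear
input childid day v1 v2
1 1 . .
1 2 . .
1 3 . .
1 4 . .
2 1 0 1
2 2 1 1
2 3 1 1
2 4 0 1
3 1 1 2
3 2 1 2
3 3 0 2
3 4 0 2
3 5 0 2
3 6 1 2
3 7 1 2
4 1 1 1
4 2 . 1
4 3 1 1
4 4 0 1
4 5 . 1
4 6 0 1
4 7 0 1
5 1 0 1
5 2 1 1
5 3 0 1
5 4 1 1
6 1 0 0
6 2 0 0
6 3 0 0
6 4 0 0
6 5 0 0
6 6 0 0
6 7 0 0
end

* mark the first day of each run of 1s
bysort childid (day): gen first = v1 == 1 & v1[_n-1] != 1

* carry the start day of each episode forward
by childid: gen start = day if first
by childid: replace start = start[_n-1] if !first

* days from the previous start to this start (missing for the first episode)
by childid: gen time_since = start - start[_n-1] if first

* count starts at least 3 days after the previous start
by childid: egen n_episodes = total(first * (time_since >= 3))

* assumption: a child with no nonmissing v1 should get missing, as in v2
by childid: egen n_obs = count(v1)
replace n_episodes = . if n_obs == 0

* compare with the v2 column typed in from the question
assert n_episodes == v2
```

On these data -n_episodes- is 2 for child 3 and 1 for children 2, 4
and 5, so the -assert- should pass. Children 4 and 5 each have a
second start only 2 days after the first, which is not counted.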
Nick
n.j.cox@durham.ac.uk
Shuaib Kauchali
> I have a data set of birth cohort data with longitudinal follow-up of
> these children until they were 9 months old (270 days), unless they
> were lost to follow-up or died before then.
>
> The data structure looks like this:
> childid (repeated group variable, daily visit to the clinic)
> day (day of visit)
> v1 (diarrhea on that day of visit)
> v2 <-- this is the variable I would like to get (defined as diarrhea
> episodes: a string of 1's separated by at least 3 consecutive 0's is
> an episode)
>
>
> childid  day  v1  v2
>       1    1   .   .
>       1    2   .   .
>       1    3   .   .
>       1    4   .   .
>       2    1   0   1
>       2    2   1   1
>       2    3   1   1
>       2    4   0   1
>       3    1   1   2
>       3    2   1   2
>       3    3   0   2
>       3    4   0   2
>       3    5   0   2
>       3    6   1   2
>       3    7   1   2
>       4    1   1   1
>       4    2   .   1
>       4    3   1   1
>       4    4   0   1
>       4    5   .   1
>       4    6   0   1
>       4    7   0   1
>       5    1   0   1
>       5    2   1   1
>       5    3   0   1
>       5    4   1   1
>       6    1   0   0
>       6    2   0   0
>       6    3   0   0
>       6    4   0   0
>       6    5   0   0
>       6    6   0   0
>       6    7   0   0
>
>
> Note:
> 1. childid=4 is a bit tricky because of missing values; we assume the
> episode to be one as there were not more than 3 days separating 2
> events.
> 2. childid=1 has not had any visits recorded, so he gets missing
> values for v2.
> 3. not everyone is followed-up for the same period: loss to
> follow-up, death, or completed the study (in my data set this should
> happen when the child reaches 270 days from birth. This is a birth
> cohort of 2500 children)
>
> My problem is that I am unable to manipulate the data in Stata to get
> the summary v2 of the number of episodes of diarrhea per child by
> total number of days observed. I am new to Stata, but am a good
> learner (I have many of the Stata Press books to help). One way I
> came across in the books was to use explicit subscripting; this would
> allow me to count the total number of days followed per child; but I
> am not sure how to create the algorithm for the v2 creation: perhaps
> foreach, forvalues, or even while, local macro? I find the commands a
> bit intimidating for a newcomer, but am willing to spend time
> learning them.
>
> Can anyone help?
> Best wishes
>
> Shuaib
>
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>