# Re: st: Problem with group operation and looping

| From | n j cox |
|---|---|
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Problem with group operation and looping |
| Date | Tue, 16 Oct 2007 20:51:47 +0100 |

```
I see no (need for) looping here. This is, I believe,
a standard spell problem. Some years ago Richard Goldstein
and I wrote a -spell- program, which is still on SSC
(Stata 6 required). Then I wired in an assumption that
your data are -tsset- data and wrote -tsspell-, which is
also still on SSC (Stata 7 required). Please note that
the help file for -tsspell- is stuffed with spells for
spells.

More recently I wrote the ideas up in a column in
Stata Journal 7(2) 2007: an abstract is at
<http://www.stata-press.com/journals/sjabstracts/dm0029.pdf>
That is a rather detailed, and perhaps tedious, paper spelling
out the exact logic I have found useful for spell problems.

That column doesn't mention -spell- or -tsspell-. It
seemed more fruitful to explain the underlying Stata
ideas. So too, from now on, in this case.

Suppose that the dataset is as desired. A spell for any person is, I take it,
that they stayed on the same drug with no gap more than 45 days.
Thus we need to get the time since previous prescription:

bysort id (date) : gen since = date - date[_n-1]

This will be missing for the first -date- for any -id-
and missing (.) counts as greater than 45 (days), which is fine. Thus
a spell starts whenever

(drug != drug[_n-1])    |     (since > 45)

That is a true or false statement which evaluates
to 1 if true or 0 if false. The parentheses and generous
spacing aren't needed, but they may be helpful. Thus we can
tag spells 1, 2, 3, ... for each -id- by

by id: gen spell = sum( (drug != drug[_n-1]) | (since > 45) )

Some people would want me to point out that the two lines
here could be one:

bysort id (date) : gen spell = sum( (drug != drug[_n-1]) | ( (date - date[_n-1]) > 45))

but that variable -since- might turn out to be interesting or useful
in its own right.

Once you have defined spells, then the start of each spell is
naturally just the first -date- in each:

bysort id spell (date) : gen start = date[1]

and other calculations call for similar applications of -by:-.
The Speaking Stata column is full of such stuff, with endless
minor variations on the same theme.

The first question seems to call for various -keep- or -drop- statements.

Nick
n.j.cox@durham.ac.uk

Gao Liu

I think I am still quite confused about group operations using var[i]
and looping, so I am struggling with the following problem. I would
really appreciate it if somebody could give me a hand.

I have a dataset containing the following variables: ID, day_of_service,
drug_name, in which day_of_service is the first day when the ID is
treated with the drug. The dataset contains data from 2003 to 2006.
Each ID may be treated many times, and each time may be treated
with a different drug. In other words, they might switch from one drug
to another. But they would not switch back to a previously used drug.

I need to prepare two things for further analysis using the dataset.
First, I want to keep the observations with an entry. All
IDs with the first day_of_service starting after 2004 will be kept.
For IDs with services before 2004, there are two situations: (the
first service day in 2004) - (the last service day in 2003) is >45 or <45.
If <45, all observations of the ID would be dropped. If >45,
the observations after 2004 will be kept, but the 2003 observations for the
same ID will be dropped.

Second, I need to figure out, for each drug (6 drugs in total), how
long an ID stuck with the drug before he/she switched to another drug
or exited. It is considered an exit if there is no new treatment within 45 days
after the last treatment.

```
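
The spell logic described in the reply can be tried on a small made-up dataset. What follows is only a sketch under stated assumptions: the six observations, the string variable -sdate- and the variable -length- are invented for illustration and do not come from the thread, -daily()- assumes Stata 10 or later, and spell length is taken as last date minus first date within a spell, which is just one possible reading of "how long".

```stata
* Hypothetical data: two persons, prescription dates and drug names
clear
input id str1 drug str9 sdate
1 "A" "01jan2004"
1 "A" "20jan2004"
1 "B" "05feb2004"
1 "B" "01may2004"
2 "C" "10mar2004"
2 "C" "15apr2004"
end

* Convert the string dates to Stata daily dates
gen date = daily(sdate, "DMY")
format date %td

* Days since the previous prescription; missing for each person's first date
bysort id (date) : gen since = date - date[_n-1]

* A new spell starts when the drug changes or the gap exceeds 45 days;
* the running sum of that 0/1 indicator numbers the spells 1, 2, 3, ...
by id : gen spell = sum((drug != drug[_n-1]) | (since > 45))

* First date of each spell
bysort id spell (date) : gen start = date[1]
format start %td

* Assumption: length = last date minus first date within the spell
by id spell : gen length = date[_N] - date[1]

* For id 1 the spells come out as 1 1 2 3; for id 2 as 1 1
list id drug date since spell start length, sepby(id)
```

For person 1 the switch from A to B after 16 days starts spell 2, and the 86-day gap while still on B starts spell 3, so both conditions are seen at work. A spell with a single prescription gets length 0 under this definition; a different convention (say, adding a nominal supply period) would need its own adjustment.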