# RE: st: RE: identifying a string of consecutive numbers

From: "Nick Cox"; Subject: RE: st: RE: identifying a string of consecutive numbers; Date: Thu, 17 Apr 2008 19:37:23 +0100

```
Recall that Jon gave an example of data in wide form:

"For example, say I have 5 years of data and the following information
for 5 observations:

ObsA  1 1 1 1 0
ObsB  1 1 1 0 0
ObsC  1 1 0 0 1
ObsD  1 0 1 1 1
ObsE  0 0 0 0 1

The variable names are something like yr1, yr2, yr3, yr4, yr5."

Jon is interested in spells of 1s and 0s, so we are looking for spells
row-wise, within each observation.

A better approach than my last suggestion would be to collapse each row
to its pattern, defined as follows:

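1 1 1 1 1 has pattern "1"

0 0 0 0 0 has pattern "0"

That is, each run of identical values collapses to a single character.
A and B above have pattern "10", C and D have pattern "101", and E has
pattern "01".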

This would be done with a loop:

gen pattern = ""

* append yr`i' only when it differs from the last character so far
qui forval i = 1/5 {
    replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
}

substr(pattern, -1, 1) is the last character of -pattern-. Notice that
this also works when the loop starts: substr("", -1, 1) is empty, so
string(yr1) is certain to differ from it and is always appended.
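
* A minimal sketch of the loop run on Jon's five example rows, assuming
* a hypothetical string identifier -id- (A to E). Note that -clear-
* discards whatever data are currently in memory.
clear
input str1 id yr1 yr2 yr3 yr4 yr5
A 1 1 1 1 0
B 1 1 1 0 0
C 1 1 0 0 1
D 1 0 1 1 1
E 0 0 0 0 1
end

gen pattern = ""
quietly forval i = 1/5 {
    * append yr`i' only when it differs from the last character so far
    replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
}

* A and B should show "10", C and D "101", E "01"
list id pattern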

With this reduction, Jon can simply select observations with pattern
"10", that is, one spell of 1s followed by one spell of 0s.

Nick
n.j.cox@durham.ac.uk

Nick Cox wrote:

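* lastpos: position of the last 1 in each row (missing if there is none)
gen lastpos = .

quietly forval i = 1/5 {
    * later years overwrite earlier ones, so the largest such i is kept
    replace lastpos = `i' if yr`i' == 1
}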

You can also collapse each row to the positions of its 1s.

* e.g. 1 1 1 0 0 gives signature "123"
gen signature = ""
forval i = 1/5 {
    replace signature = signature + "`i'" if yr`i' == 1
}

The position of the last 1 is then the last character of that signature.
However, this is awkward when a row is all zeros, as its signature is
then empty.
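
* A minimal sketch on Jon's five example rows, assuming at most 9 years
* (so each position in -signature- is a single character) and the
* hypothetical variable names -lastone- and -firstzero-. Note that
* -clear- discards whatever data are currently in memory.
clear
input yr1 yr2 yr3 yr4 yr5
1 1 1 1 0
1 1 1 0 0
1 1 0 0 1
1 0 1 1 1
0 0 0 0 1
end

gen signature = ""
quietly forval i = 1/5 {
    replace signature = signature + "`i'" if yr`i' == 1
}

* last 1 = last character of -signature-; real("") is missing,
* so an all-zero row would get a missing value here
gen lastone = real(substr(signature, -1, 1))

* first 0: loop backwards so that the smallest qualifying position wins
gen firstzero = .
quietly forval i = 5(-1)1 {
    replace firstzero = `i' if yr`i' == 0
}

* for rows with pattern "10", firstzero equals lastone + 1
list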

Jon Schwabish wrote:

Sergiy & Nick,

Thanks for your responses; they were both very helpful. And yes, Nick is
correct: Observation A does not meet the criterion I laid out.
Nonetheless...

I was wondering if you might have some thoughts on a follow-up question.
How would you go about identifying the *last* non-zero observation in the
series? In other words, given a series such as "111000", how would you
identify the third number in the sequence (or even the fourth, as the
*first* zero observation)?

```