Statalist



RE: st: RE: identifying a string of consecutive numbers


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: identifying a string of consecutive numbers
Date   Thu, 17 Apr 2008 19:37:23 +0100

Recall that Jon gave an example of data in wide form:

"For example, say I have 5 years of data and the following information
for 5 observations:

ObsA  1 1 1 1 0
ObsB  1 1 1 0 0
ObsC  1 1 0 0 1
ObsD  1 0 1 1 1
ObsE  0 0 0 0 1 

The variable names are something like yr1, yr2, yr3, yr4, yr5." 

Jon is interested in spells of 1s and 0s. We are thus looking for spells
rowwise.

A better approach than my last suggestion would be based on collapsing
each row to its pattern, defined in the following way:

1 1 1 1 1 is a pattern "1" 

0 0 0 0 0 is a pattern "0" 

A and B above are examples of pattern "10", etc. 

This would be done with a loop: 

gen pattern = "" 

qui forval i = 1/5 { 
	replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
} 

substr(pattern, -1, 1) is the last character of -pattern-. Notice that
all works well when the loop starts, as substr("", -1, 1) is empty and
so string(yr1) will certainly differ from that.
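
To see what the loop produces, here is a minimal sketch that types in
Jon's five observations and runs it. The identifier name -id- is made
up for illustration; the yr* names are as Jon described them.

```stata
clear
input str4 id yr1 yr2 yr3 yr4 yr5
ObsA 1 1 1 1 0
ObsB 1 1 1 0 0
ObsC 1 1 0 0 1
ObsD 1 0 1 1 1
ObsE 0 0 0 0 1
end

* append each year's value only when it differs from the last character
gen pattern = ""
quietly forval i = 1/5 {
	replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
}

* A and B reduce to "10", C and D to "101", E to "01"
list id pattern

* rows consisting of one spell of 1s followed by one spell of 0s
list id if pattern == "10"
```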

With this reduction Jon can just select observations with a pattern of
"10".

Nick
[email protected] 

Nick Cox

gen lastpos = . 

quietly forval i = 1/5 { 
	replace lastpos = `i' if yr`i' == 1 
} 
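
Jon's follow-up also mentions the *first* zero. A sketch along the same
lines, assuming yr1-yr5 as above, loops backwards so that the earliest
year holding a 0 is the one written last. The name -firstzero- and the
three example rows are only for illustration.

```stata
clear
input yr1 yr2 yr3 yr4 yr5
1 1 1 0 0
1 1 0 0 1
1 1 1 1 1
end

* loop from year 5 down to year 1: later replaces overwrite earlier ones,
* so the smallest i with yr`i' == 0 is what remains
gen firstzero = .
quietly forval i = 5(-1)1 {
	replace firstzero = `i' if yr`i' == 0
}

* firstzero stays missing for a row with no zeros
list
```

Note that the first zero is one more than the last 1 only for rows
whose pattern is "10"; in a row such as 1 1 0 0 1 the two are unrelated.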

You can also collapse each row to a string of the positions of its
positive occurrences.

gen signature = "" 
forval i = 1/5 { 
	replace signature = signature + "`i'" if yr`i' == 1 
} 

The last positive is then the last character of that signature.
However, this is awkward whenever a row is all zeros, as the signature
is then empty.
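
One way around the all-zeros case, offered as a sketch and not the only
option: real() returns missing for an empty string, so converting the
last character to a number gives missing for such rows. The name
-lastpos2- and the two example rows are made up for illustration.

```stata
clear
input yr1 yr2 yr3 yr4 yr5
1 1 1 0 0
0 0 0 0 0
end

gen signature = ""
quietly forval i = 1/5 {
	replace signature = signature + "`i'" if yr`i' == 1
}

* real("") is missing, so the all-zero row gets lastpos2 == .
gen lastpos2 = real(substr(signature, -1, 1))
list
```

This relies on each year index being a single digit, so it would need
rethinking with 10 or more years.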

Jon Schwabish

Sergiy & Nick,

Thanks for your responses; they were both very
helpful. And yes, Nick is correct, Observation A does
not follow the criterion I laid out. Nonetheless....

I was wondering if you might have some thoughts on a
follow-up question. How would you go about identifying
the *last* non-zero observation in the series? In
other words, a series of numbers such as "111000", how
would you identify the third number in the sequence
(or even the fourth as the *first* zero observation).




