
From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: RE: identifying a string of consecutive numbers |

Date |
Thu, 17 Apr 2008 19:37:23 +0100 |

Recall that Jon gave an example of data in wide form:

"For example, say I have 5 years of data and the following information for 5 observations:

ObsA 1 1 1 1 0
ObsB 1 1 1 0 0
ObsC 1 1 0 0 1
ObsD 1 0 1 1 1
ObsE 0 0 0 0 1

The variable names are something like yr1, yr2, yr3, yr4, yr5."

He is interested in spells of 1s and 0s. We are thus looking for spells rowwise.

A better approach than my last suggestion would be based on collapsing each row to its pattern, defined in the following way:

1 1 1 1 1 is a pattern "1"
0 0 0 0 0 is a pattern "0"

A and B above are examples of pattern "10", etc.

This can be done with a loop:

    gen pattern = ""
    qui forval i = 1/5 {
        replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
    }

substr(pattern, -1, 1) is the last character of -pattern-. Notice that all works well when the loop starts, as substr("", -1, 1) is empty and so string(yr1) will certainly differ from that.

With this reduction Jon can just select observations with a pattern of "10".

Nick
n.j.cox@durham.ac.uk

Nick Cox

    gen lastpos = .
    quietly forval i = 1/5 {
        replace lastpos = `i' if yr`i' == 1
    }

You can also collapse each row to a string of its positive occurrences.

    gen signature = ""
    forval i = 1/5 {
        replace signature = signature + "`i'" if yr`i' == 1
    }

The last positive is then the last character of that signature. However, this is awkward whenever a row is all zeros, as the signature is then empty.

Jon Schwabish

Sergiy & Nick,

Thanks for your responses; they were both very helpful. And yes, Nick is correct, Observation A does not follow the criterion I laid out. Nonetheless....

I was wondering if you might have some thoughts on a follow-up question. How would you go about identifying the *last* non-zero observation in the series? In other words, given a series of numbers such as "111000", how would you identify the third number in the sequence (or even the fourth, as the *first* zero observation)?
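The loops in this thread can be tried together on Jon's five example rows. What follows is only a sketch: the identifier variable id and the variable firstzero are names assumed here for illustration, and the reverse loop for the first zero is one possible answer to Jon's follow-up question, not something proposed in the thread.

```stata
* Rebuild Jon's example in wide form (id is an assumed variable name)
clear
input str4 id yr1 yr2 yr3 yr4 yr5
"ObsA" 1 1 1 1 0
"ObsB" 1 1 1 0 0
"ObsC" 1 1 0 0 1
"ObsD" 1 0 1 1 1
"ObsE" 0 0 0 0 1
end

* Collapse each row to its pattern: append a value only when it
* differs from the last character already stored
gen pattern = ""
quietly forval i = 1/5 {
    replace pattern = pattern + string(yr`i') if string(yr`i') != substr(pattern, -1, 1)
}

* Position of the last 1: later years overwrite earlier ones,
* so rows with no 1 at all stay missing
gen lastpos = .
quietly forval i = 1/5 {
    replace lastpos = `i' if yr`i' == 1
}

* Position of the first 0 (assumed name firstzero): loop backwards
* so that the earliest year with a 0 is written last
gen firstzero = .
quietly forval i = 5(-1)1 {
    replace firstzero = `i' if yr`i' == 0
}

list id pattern lastpos firstzero, noobs
```

Selecting on pattern == "10" then keeps exactly the rows that are a run of 1s followed by a run of 0s, which in this example are ObsA and ObsB. Looping backwards for firstzero avoids a separate flag for "already found", at the cost of reading a little less naturally.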

