Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Finding patterns of consecutive number
Date: Thu, 26 Apr 2012 17:10:55 +0100

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Marshall Garland
Sent: 26 April 2012 17:04
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Finding patterns of consecutive number

Hi Nick-

Thanks for the simplified code and the links. This was far easier and
more intuitive than the programming somersaults that I was attempting.
I had read this FAQ, but I was conceptually struggling with how to
apply it to my circumstance.

Cheers,
-mwg

On Thu, Apr 26, 2012 at 3:45 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The code will simplify as
>
> if _n == 1 | (test - test[_n-1] != 1)
>
> could be written
>
> if (test - test[_n-1] != 1)
>
> because -test[0]- will be evaluated as missing. But in practice with
> spell problems, the first observation in a panel often needs explicit
> attention as we know nothing about what preceded it. And code that
> deals explicitly with the first observation is often easier to
> understand.
>
> This may also be of interest:
>
> FAQ . . . . . . Identifying runs of consecutive observations in panel data
> . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
> 8/02 How do I identify runs of consecutive observations
> in panel data?
> http://www.stata.com/support/faqs/data/panel.html
>
> On Thu, Apr 26, 2012 at 2:58 AM, Marshall Garland
> <marshall.w.garland@gmail.com> wrote:
>
>> This is exactly what I needed.
>>
>> Thanks so much for your help and prompt reply.
>
> On Wed, Apr 25, 2012 at 8:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
>>> I think of your problem as defining spells of consecutive integers, so
>>> that a spell starts with the first observation in each panel or if the
>>> previous value was not one fewer.
>>>
>>> bysort id (year) : gen progress = string(test) if _n == 1 | (test - test[_n-1] != 1)
>>> by id : replace progress = progress[_n-1] + string(test) if missing(progress)
>>>
>>> Dealing with spells: see also -tsspell- (SSC) or
>>>
>>> SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>> Q2/07 SJ 7(2):249--265 (no commands)
>>> shows how to handle spells with complete control over
>>> spell specification
>>>
>>> By the way, as the putative author of -tostring-, I note that it is
>>> overkill here. The -string()- function is all you need.
>
> On Wed, Apr 25, 2012 at 11:47 PM, Marshall Garland
>>> <marshall.w.garland@gmail.com> wrote:
>>>
>>>> I have panel student testing data spanning six years. Each year, I
>>>> have a unique student and student test level and outcome. Testing
>>>> levels across years are not necessarily consecutive, nor are years.
>>>> For each student, in each year, I'd like to create a variable that
>>>> captures the longitudinal test progression for each student, in each
>>>> year. However, for each year, I'd like the maximum consecutive test
>>>> progression, without disruptions. This maximum test progression should
>>>> only be calculated for consecutive years, too.
>>>>
>>>> I've posted my data at the end of this message, which will help
>>>> describe my objective. For student A, in 2008/09, her test progression
>>>> is 6543, since she had 4 consecutive years of test data. This is
>>>> perfect. Student B, however, in 2008/09, has a test progression of
>>>> 7643. However, I only want to record, for student B, the maximum
>>>> consecutive test progression, which is 76 and ignore the 43. The 43
>>>> progression will be captured in the corresponding year (2006/07).
>>>>
>>>> I can't figure out a way to adjust for this discontinuity. I've tried
>>>> a number of things, including this. But, this still captures repeated
>>>> test levels across years (student C below, in 2008/09).
>>>>
>>>> Thanks for help in advance.
>>>>
>>>> Cheers,
>>>>
>>>> -mwg
>>>>
>>>> /****************************************************
>>>> bys research_id: gen test_t=d.test_level_2
>>>> bys research_id: egen max_test_t=max(test_t)
>>>>
>>>> ///group creation for consecutive runs
>>>> forvalues i=0/6 {
>>>> gen group_`i'=.
>>>> bys research_i (sch_yr): replace group_`i'=test_level_2[_n-`i'] if max_test_t==1 & test_t==1
>>>> tostring group_`i', replace
>>>> replace group_`i'="" if group_`i'=="."
>>>> }
>>>>
>>>> egen group_ty_cons=concat(group_0- group_6)
>>>> tab group_ty_cons
>>>> /**************************************************************
>>>>
>>>> Here's my data:
>>>> student  year     test_level  progression
>>>> A        2005/06  3
>>>> A        2006/07  4           43
>>>> A        2007/08  5           543
>>>> A        2008/09  6           6543
>>>> B        2005/06  3
>>>> B        2006/07  4           43
>>>> B        2007/08  6           643
>>>> B        2008/09  7           7643
>>>> C        2005/06  6
>>>> C        2006/07  7           76
>>>> C        2007/08  8           876
>>>> C        2008/09  8           8876

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
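
For readers who want to try the suggestion from the thread, here is a minimal sketch that runs the two -bysort- / -by- lines on the data posted in the original question. The variable names id, year and test follow the names used in the suggested code rather than the poster's own (research_id, sch_yr, test_level_2). The second pair of commands is an assumption, not something posted in the thread: it also starts a new spell when school years are not consecutive, and it puts the latest level first so that the result matches the layout asked for (6543 rather than 3456). The names yr and progress2 are hypothetical.

```stata
clear
input str1 id str7 year test
"A" "2005/06" 3
"A" "2006/07" 4
"A" "2007/08" 5
"A" "2008/09" 6
"B" "2005/06" 3
"B" "2006/07" 4
"B" "2007/08" 6
"B" "2008/09" 7
"C" "2005/06" 6
"C" "2006/07" 7
"C" "2007/08" 8
"C" "2008/09" 8
end

* As suggested in the thread: a spell starts at the first observation of
* each panel, or when the level is not exactly one more than the previous one
bysort id (year) : gen progress = string(test) if _n == 1 | (test - test[_n-1] != 1)
* Within a spell, append the current level to the previous observation's string
by id : replace progress = progress[_n-1] + string(test) if missing(progress)

* Assumed variant (not from the thread): numeric start year, so that a gap
* in years also starts a new spell; latest level is placed first
gen yr = real(substr(year, 1, 4))
bysort id (yr) : gen progress2 = string(test) if _n == 1 | (test - test[_n-1] != 1) | (yr - yr[_n-1] != 1)
by id : replace progress2 = string(test) + progress2[_n-1] if missing(progress2)

list id year test progress progress2, sepby(id)
* progress:  A = 3 34 345 3456   B = 3 34 6 67   C = 6 67 678 8
* progress2: A = 3 43 543 6543   B = 3 43 6 76   C = 6 76 876 8
```

Both versions rely on -replace- working through the observations in order, so each observation sees the already updated value of the previous one. The posted data have no gaps in years, so the extra year condition changes nothing here; it only matters for panels with missing years.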