RE: st: Finding patterns of consecutive number


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Finding patterns of consecutive number
Date   Thu, 26 Apr 2012 17:25:18 +0100

You are quite right. That FAQ addresses a related but not identical problem. The _time_ variable cannot be repeated within a panel defined by -tsset-, but other variables can be. 

As many of us remember from our time at Hogwarts, it can help to think in terms of spells. You want spells of consecutive integers. That is a little tricky as the "obvious" defining condition 

test == test[_n-1] + 1

or 

this value == previous value + 1 

in fact won't be satisfied by the first observation in such a spell (if it were true of that observation, it _could not be_ the first). So, you can flip it round and ask what condition defines the first observation of such a spell: it is the negation of the equality just given, namely that this value is not the previous value + 1.
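
A minimal sketch of that flipped condition, assuming a panel identifier id, a time variable year and the value test, as in the code quoted further down this thread. The toy data, the integer coding of year, and the names spell_start and spell_id are made up for illustration only:

```stata
* Sketch only: id, year and test mirror the names used elsewhere in this thread;
* spell_start and spell_id are hypothetical names introduced for illustration.
clear
input str1 id int year byte test
"A" 2005 3
"A" 2006 4
"A" 2007 5
"A" 2008 6
"B" 2005 3
"B" 2006 4
"B" 2007 6
"B" 2008 7
end

* A spell starts at the first observation in a panel, or wherever
* the value is not exactly one more than the previous value.
bysort id (year) : gen byte spell_start = (_n == 1) | (test != test[_n-1] + 1)

* The running sum of the start markers numbers the spells within each panel.
by id : gen spell_id = sum(spell_start)

list, sepby(id)
```

Spelling out _n == 1 is not strictly needed, but it makes the treatment of the first observation in each panel explicit.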

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Marshall Garland
Sent: 26 April 2012 17:04
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Finding patterns of consecutive number

Hi Nick-

Thanks for the simplified code and the links. This was far easier and
more intuitive than the programming somersaults that I was attempting.

I had read this FAQ, but I was conceptually struggling with how to
apply it to my circumstance.

Cheers,

-mwg

On Thu, Apr 26, 2012 at 3:45 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The code will simplify as
>
> if _n == 1 | (test - test[_n-1] != 1)
>
> could be written
>
> if (test - test[_n-1] != 1)
>
> because -test[0]- will be evaluated as missing, so the difference is
> missing too and the inequality holds. But in practice with
> spell problems, the first observation in a panel often needs explicit
> attention as we know nothing about what preceded it. And code that
> deals explicitly with the first observation is often easier to
> understand.
>
> This may also be of interest:
>
> FAQ     . . . . . . Identifying runs of consecutive observations in panel data
>        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
>        8/02    How do I identify runs of consecutive observations
>                in panel data?
>                http://www.stata.com/support/faqs/data/panel.html
>
> On Thu, Apr 26, 2012 at 2:58 AM, Marshall Garland
> <marshall.w.garland@gmail.com> wrote:
>
>> This is exactly what I needed.
>>
>> Thanks so much for your help and prompt reply.
>
> On Wed, Apr 25, 2012 at 8:24 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
>>> I think of your problem as defining spells of consecutive integers, so
>>> that a spell starts with the first observation in each panel or if the
>>> previous value was not one less.
>>>
>>> bysort id (year) : gen progress = string(test) if _n == 1 | (test - test[_n-1] != 1)
>>> by id : replace progress = progress[_n-1] + string(test) if missing(progress)
>>>
>>> Dealing with spells: see also -tsspell- (SSC) or
>>>
>>> SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>>        Q2/07   SJ 7(2):249--265                                 (no commands)
>>>        shows how to handle spells with complete control over
>>>        spell specification
>>>
>>> By the way, as the putative author of -tostring-, I note that it is
>>> overkill here. The -string()- function is all you need.
>
> On Wed, Apr 25, 2012 at 11:47 PM, Marshall Garland
>>> <marshall.w.garland@gmail.com> wrote:
>>>
>>>> I have panel student testing data spanning six years. Each year, I
>>>> have a unique student and student test level and outcome. Testing
>>>> levels across years are not necessarily consecutive, nor are years.
>>>> For each student, in each year, I'd like to create a variable that
>>>> captures the longitudinal test progression for each student, in each
>>>> year. However, for each year, I'd like the maximum consecutive test
>>>> progression, without disruptions. This maximum test progression should
>>>> only be calculated for consecutive years, too.
>>>>
>>>> I've posted my data at the end of this message, which will help
>>>> describe my objective. For student A, in 2008/09, her test progression
>>>> is 6543, since she had 4 consecutive years of test data. This is
>>>> perfect. Student B, however, in 2008/09, has a test progression of
>>>> 7643. However, I only want to record, for student B, the maximum
>>>> consecutive test progression, which is 76 and ignore the 43. The 43
>>>> progression will be captured in the corresponding year (2006/07).
>>>>
>>>> I can't figure out a way to adjust for this discontinuity. I've tried
>>>> a number of things, including this. But, this still captures repeated
>>>> test levels across years (student C below, in 2008/09).
>>>>
>>>> Thanks for help in advance.
>>>>
>>>> Cheers,
>>>>
>>>> -mwg
>>>>
>>>> /****************************************************
>>>> bys research_id: gen test_t=d.test_level_2
>>>> bys research_id: egen max_test_t=max(test_t)
>>>>
>>>> ///group creation for consecutive runs
>>>> forvalues i=0/6 {
>>>>        gen group_`i'=.
>>>>        bys research_i (sch_yr): replace group_`i'=test_level_2[_n-`i'] if max_test_t==1 & test_t==1
>>>>        tostring group_`i', replace
>>>>        replace group_`i'="" if group_`i'=="."
>>>> }
>>>>
>>>> egen group_ty_cons=concat(group_0- group_6)
>>>> tab group_ty_cons
>>>> /**************************************************************
>>>>
>>>> Here's my data:
>>>> student year    test_level      progression
>>>> A       2005/06 3
>>>> A       2006/07 4       43
>>>> A       2007/08 5       543
>>>> A       2008/09 6       6543
>>>> B       2005/06 3
>>>> B       2006/07 4       43
>>>> B       2007/08 6       643
>>>> B       2008/09 7       7643
>>>> C       2005/06 6
>>>> C       2006/07 7       76
>>>> C       2007/08 8       876
>>>> C       2008/09 8       8876
>


