Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.

The idea of a spell in modest generality may be of help to you. Setting aside what I learned of spells at a moderately well known school in northern Britain, the program -tsspell- (SSC) and the 2007 article

SJ-7-2 dm0029: Speaking Stata: Identifying spells. N. J. Cox. Q2/07, SJ 7(2):249--265 (no commands). Shows how to handle spells with complete control over spell specification.
http://www.stata-journal.com/sjpdf.html?articlenum=dm0029

provide independent accounts of some ideas for identifying spells.

Two kinds of spells seem relevant to your problem:

1. A spell initiated by a particular event (e.g. a change of government).

2. A spell defined by some condition being true throughout its length (e.g. a rainy spell is one in which it rained every day).

I would look at -tsspell- first and then the SJ article.

Nick
njcoxstata@gmail.com

On 23 May 2013 21:03, Miguel Angel Duran Munoz <maduran@uma.es> wrote:

> Nick, this is what I am doing. I have a large sample, with about 7,000 agents per period and about 60 periods. I am analyzing whether agents imitate each other. Once I have (statistically) confirmed that there is an imitation process going on, the next step is to analyze the differences between types of agents, in particular between innovators (those who start following a rule of behavior right at the beginning of the process), nonadopters (those who never adopt the rule) and laggards (those who adopt the rule at a late period).
>
> This is why I need to split the sample the way I have described. I hope this helps to make it clear what I am doing. Thanks in advance.
>
>> This is getting very intricate to follow.
>>
>> As Sarah posted yesterday, more or less, we need examples.
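Nick's second kind of spell (a condition true throughout its length) maps onto the innovator/nonadopter/laggard split if "adopting" is read as starting the one spell above the threshold that lasts to the agent's last observation. The sketch below is only one way to code that reading: it does not call -tsspell-, but builds the spells by hand with -by:- and sum() on the long-form listing quoted further down (t1-t6 recoded as periods 1-6), with the threshold 0.9 and the cut-offs t3 and t5 taken from that message. Treating a gap in period as the end of a spell is an assumption, and all generated variable names are made up for illustration.

```stata
* Sketch: classify agents by the period in which they cross the threshold for good.
* Assumes long-form data (agent, period, score) with integer periods.
clear
input agent period score
1 1 0.8
1 2 1
1 3 1
1 4 1
1 5 1
1 6 1
2 1 0.8
2 2 0.8
2 3 1
2 4 1
2 5 1
2 6 1
3 1 0.8
3 2 0.8
3 3 0.8
3 4 1
3 5 1
3 6 1
4 1 0.8
4 2 0.8
4 3 0.8
4 4 0.8
4 5 1
4 6 1
5 6 1
6 1 0.8
6 2 0.8
6 3 0.8
6 4 0.8
6 5 1
6 6 1
7 1 0.8
7 2 1
7 3 1
7 4 0.8
7 5 0.8
7 6 1
end

local threshold 0.9

* 1 if this observation is above the threshold (a missing score counts as not above)
generate byte above = score > `threshold' & !missing(score)

* A spell starts at an observation above the threshold whose predecessor was not above it,
* or that follows a gap in period (assumption), or that is the agent's first observation
bysort agent (period): generate byte start = above & (_n == 1 | !above[_n-1] | period != period[_n-1] + 1)

* Number the spells within each agent; 0 marks observations outside any spell
by agent: generate spell = sum(start)
replace spell = 0 if !above

* Adopters: exactly one spell, still running at the agent's last observation
egen n_spells = max(spell), by(agent)
bysort agent (period): generate byte ends_above = above[_N]
egen adopt_period = min(cond(spell == 1, period, .)), by(agent)
replace adopt_period = . if n_spells != 1 | !ends_above

* The three subsamples described in the message (cut-offs t3 and t5)
generate byte from_start = adopt_period == 1
generate byte early = adopt_period <= 3
generate byte late = adopt_period >= 5 & !missing(adopt_period)

egen one_row = tag(agent)
list agent adopt_period from_start early late if one_row, noobs
```

On that listing, agent 7, whose score rises above 0.9, falls back and rises again, has two spells and so lands in no group, while agent 5, which appears only in t6, gets adoption period 6 instead of being discarded for entering late. Because agent 1 is listed at 0.8 in t1, the strict "from the first period" flag selects nobody in this particular listing.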
>>
>> I worry on your behalf that you will have to explain your rules to somebody reviewing your thesis/dissertation/report/paper, and they are going to ask you why you couldn't use much simpler rules.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>> On 23 May 2013 18:43, Miguel Angel Duran Munoz <maduran@uma.es> wrote:
>>> Nick and Sarah, thanks to your help I've been able to solve all but one of my problems. To select agents that are above the threshold after period 2, I've finally used:
>>>
>>> egen firstperiod = min(period), by(agent)
>>> drop if firstperiod > 2
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if min_rest >= 0.9
>>>
>>> (the max condition that Nick suggested is, I think, unnecessary)
>>>
>>> Nevertheless, I am not sure how to select agents that overpass the threshold in the final periods (say at or after t3) and stay above it. In principle, based on your suggestions, I thought of this:
>>>
>>> bysort agent (period): gen last = score[_N]
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if last >= 0.9 & min_rest <= 0.9
>>>
>>> However, this implies that I am excluding agents that satisfy the criterion (overpassing the threshold at or after t3) but appear in the sample at an intermediate period.
>>>
>>> Will someone please help to solve this? Thanks in advance.
>>>
>>> Miguel.
>>>
>>>> Sarah, thank you for your help. I am very sorry for not having put my doubts in a sufficiently clear way. And given what you say about the way data is stored, I have realized that there might be other problems around. I will try to be as clear as possible.
>>>>
>>>> My data is in panel data form. I write the example down again in the way my data is stored. As regards the example in my previous messages, I add two agents (6 and 7). Please note also that data for the fifth agent are missing in some periods, but there is no row at all for those periods (this is what I had not taken into account so far):
>>>>
>>>> time  agent  score
>>>> t1    1      0.8
>>>> t2    1      1
>>>> t3    1      1
>>>> t4    1      1
>>>> t5    1      1
>>>> t6    1      1
>>>>
>>>> t1    2      0.8
>>>> t2    2      0.8
>>>> t3    2      1
>>>> t4    2      1
>>>> t5    2      1
>>>> t6    2      1
>>>>
>>>> t1    3      0.8
>>>> t2    3      0.8
>>>> t3    3      0.8
>>>> t4    3      1
>>>> t5    3      1
>>>> t6    3      1
>>>>
>>>> t1    4      0.8
>>>> t2    4      0.8
>>>> t3    4      0.8
>>>> t4    4      0.8
>>>> t5    4      1
>>>> t6    4      1
>>>>
>>>> t6    5      1
>>>>
>>>> t1    6      0.8
>>>> t2    6      0.8
>>>> t3    6      0.8
>>>> t4    6      0.8
>>>> t5    6      1
>>>> t6    6      1
>>>>
>>>> t1    7      0.8
>>>> t2    7      1
>>>> t3    7      1
>>>> t4    7      0.8
>>>> t5    7      0.8
>>>> t6    7      1
>>>>
>>>> Having said that, I want to split the sample in different ways. First, I want to focus on agents that overpass a threshold (eg, 0.9) from the first period and are always above the threshold (ie, agent 1). Second, I want to take agents that overpass the threshold at or before a particular period (eg, t3) and have stayed above the threshold since then (ie, agents 1-4). Third, agents that overpass the threshold at or after a particular period (eg, t5) and have stayed above the threshold since then (ie, agents 5 and 6). Please note that agent 7 is not included in any of the previous subsamples.
>>>>
>>>> Thank you very much for your help. And once again, I am sorry for not having been clear enough.
>>>>
>>>> Miguel.
>>>>
>>>>> Miguel,
>>>>> This discussion would be clearer if your examples actually made it clear exactly what your data looks like.
>>>>>
>>>>> Your example below looks like you have data in wide form. The solution that Nick suggested is for data in long form. It's easy enough to move between the two, but it's hard to make concrete suggestions about how to proceed when we don't know what the actual data looks like.
>>>>>
>>>>> I'll start by assuming, as Nick does, that your data is actually in long form and you have three variables: agent, period, score. I'll further assume that for agent 5 you simply have no records for periods 1-5 (that is, you do not have records for those periods with missing values for score). If that's true, you can simply calculate the first period that appears in the data and use that as part of your inclusion criteria. Something like the following will keep only those agents who first appear in the data before period 4:
>>>>>
>>>>> egen firstperiod = min(period), by(agent)
>>>>> drop if firstperiod > 3
>>>>>
>>>>> Or maybe you only want to include agents who start in period 1? It's unclear from your question. In that case you'd -drop if firstperiod > 1-.
>>>>>
>>>>> For your second example, trying to look at the last time periods, I think you need to clarify what your actual criterion is. You say "I would like to select those agents that overpass the threshold of 0.9 in any the last two periods and are over the threshold until the end of the sample period (ie, agents 4 and 5)." To my eye, that criterion includes all agents except agent 6. You're unlikely to get the results you hope for unless you are precise in the criteria you're using.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> -Sarah
>>>>>
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Miguel Angel Duran Munoz
>>>>> Sent: Wednesday, May 22, 2013 11:00 AM
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: Re: st: Observations that keep a feature... an additional problem
>>>>>
>>>>> I use the same example as in a previous message, but I add a fifth agent that joins in period six:
>>>>>
>>>>> Agent 1: 1 1 1 1 1 1...
>>>>> Agent 2: 0.8 1 1 1 1 1...
>>>>> Agent 3: 0.8 0.8 0.8 1 1 1...
>>>>> Agent 4: 0.8 0.8 0.8 0.8 1 1...
>>>>> Agent 5: . . . . . 1...
>>>>>
>>>>> I want to keep just the first three agents.
>>>>>
>>>>> If you don't mind, Nick, I would also like to ask you the following. I take the same example, but I focus on the last periods.
>>>>>
>>>>> Agent 1: ...1 1 1 1 1 1
>>>>> Agent 2: ...0.8 1 1 1 1 1
>>>>> Agent 3: ...0.8 0.8 0.8 1 1 1
>>>>> Agent 4: ...0.8 0.8 0.8 0.8 1 1
>>>>> Agent 5: ... . . . . . 1
>>>>> Agent 6: ...0.8 0.8 0.8 0.8 1 0.8
>>>>>
>>>>> I would like to select those agents that overpass the threshold of 0.9 in any the last two periods and are over the threshold until the end of the sample period (ie, agents 4 and 5).
>>>>> I have tried to modify the commands that you suggested to me before, but I have not been able to get the right selection. Would you mind helping me with this? Thank you very much.
>>>>>
>>>>>> I can't follow this. I see only "the rules select too many agents".
>>>>>>
>>>>>> You tell me your precise rules and I will try to think of code to implement them.
>>>>>>
>>>>>> Nick
>>>>>> njcoxstata@gmail.com
>>>>>>
>>>>>> On 22 May 2013 18:16, Miguel Angel Duran Munoz <maduran@uma.es> wrote:
>>>>>>> Nick, after reducing the sample using your suggestion, I have checked the number of agents per period, and the number is increasing over time. I guess this is because agents that join the sample as time goes by and satisfy the requirement of being above the threshold are not excluded. Is there any trick to avoid including them? Thanks again.
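On Sarah's point about layouts: the 22 May 11:00 examples are written one row per agent, while the code in this thread assumes one row per agent and period. A minimal sketch of the conversion with -reshape long-, assuming six periods held in hypothetical variables score1-score6 and an empty cell wherever an agent is absent:

```stata
* Sketch: the 22 May example in wide layout; score1-score6 are hypothetical names
clear
input agent score1 score2 score3 score4 score5 score6
1 1 1 1 1 1 1
2 0.8 1 1 1 1 1
3 0.8 0.8 0.8 1 1 1
4 0.8 0.8 0.8 0.8 1 1
5 . . . . . 1
end

* One row per agent and period; period comes from the numeric suffix of score1-score6
reshape long score, i(agent) j(period)

* Agent 5 has no data before period 6: drop the empty rows so that, as Sarah
* assumes, an absent agent simply has no record for those periods
drop if missing(score)

* First period in which each agent appears
egen firstperiod = min(period), by(agent)
list agent period score firstperiod if agent >= 4, sepby(agent) noobs
```

Once the empty rows are dropped, agent 5 has a single record in period 6, and firstperiod can serve as the inclusion criterion Sarah describes.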
>>>>>>>
>>>>>>>> Assuming variable names
>>>>>>>>
>>>>>>>> agent period score
>>>>>>>>
>>>>>>>> it seems that you want something like
>>>>>>>>
>>>>>>>> bysort agent (period) : gen first3 = _n < 4
>>>>>>>>
>>>>>>>> egen max_first3 = max(score / first3), by(agent)
>>>>>>>>
>>>>>>>> egen min_rest = min(score / !first3), by(agent)
>>>>>>>>
>>>>>>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>>>>>>
>>>>>>>> For the division trick in the -egen- call see e.g.
>>>>>>>>
>>>>>>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>>>>>>
>>>>>>>> (reference included therein).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> njcoxstata@gmail.com
>>>>>>>>
>>>>>>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <maduran@uma.es> wrote:
>>>>>>>>> Nick, thanks for your help. I hope you can help me with another doubt. For a similar analysis to that of my first message, assume I want to keep those agents that have overpassed the threshold before a certain period and then have been over it in the rest of the sample period.
>>>>>>>>>
>>>>>>>>> To illustrate the idea, consider the following (data refer to consecutive periods and the threshold is, eg, 0.9):
>>>>>>>>>
>>>>>>>>> Agent 1: 1 1 1 1 1...
>>>>>>>>> Agent 2: 0.8 1 1 1 1...
>>>>>>>>> Agent 3: 0.8 0.8 0.8 1 1...
>>>>>>>>> Agent 4: 0.8 0.8 0.8 0.8 1...
>>>>>>>>>
>>>>>>>>> I want to keep the first three agents because they have overpassed the threshold before period 4 and then have been over the threshold in the rest of the sample period, but I do not want to keep agent 4.
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> Miguel.
>>>>>>>>>
>>>>>>>>>> Correct on -keep-. Sorry about that.
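How the division trick in Nick's code behaves, including the late-entrant problem raised in the 22 May 18:16 message: -first3- is 1 for each agent's first three observations and 0 otherwise. Dividing score by it leaves score unchanged where -first3- is 1 and gives missing where it is 0, because division by zero returns missing in Stata; -egen-'s max() and min() ignore missing values, so each statistic sees only the intended observations. The sketch runs Nick's lines on the six-period example from the 22 May 11:00 message (agent 5 present only in period 6), recording the rule as flags instead of using -keep-; the names selected, firstperiod and selected_p1 are illustrative.

```stata
* Sketch: Nick's -egen- division trick on the 22 May example (agents 1-5, six periods)
clear
input agent period score
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
1 6 1
2 1 0.8
2 2 1
2 3 1
2 4 1
2 5 1
2 6 1
3 1 0.8
3 2 0.8
3 3 0.8
3 4 1
3 5 1
3 6 1
4 1 0.8
4 2 0.8
4 3 0.8
4 4 0.8
4 5 1
4 6 1
5 6 1
end

* 1 for each agent's first three observations (by position, not by period value)
bysort agent (period): generate first3 = _n < 4

* score / first3 is missing wherever first3 == 0, so max() sees only the first three
egen max_first3 = max(score / first3), by(agent)
* score / !first3 is missing wherever first3 == 1, so min() sees only the rest
egen min_rest = min(score / !first3), by(agent)

* Nick's rule as a flag rather than -keep-, so both versions can be compared
generate byte selected = max_first3 > 0.9 & min_rest > 0.9

* Sarah's condition added: the agent must already be present in period 1
egen firstperiod = min(period), by(agent)
generate byte selected_p1 = selected & firstperiod == 1

egen one_row = tag(agent)
list agent max_first3 min_rest selected selected_p1 if one_row, noobs
```

Agent 5 is flagged by Nick's rule even though it is observed only once: its single observation counts as one of its "first three" (the count is by observation, not by calendar period), and its min_rest is missing, which Stata treats as larger than any number, so min_rest > 0.9 is true. Requiring firstperiod == 1, as in selected_p1, removes it.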
>>>>>>>>>>
>>>>>>>>>> The -sort- order
>>>>>>>>>>
>>>>>>>>>> bysort entity (const_a) :
>>>>>>>>>>
>>>>>>>>>> ensures that -const_a[1]- is the lowest for each agent, not the first. If the lowest value for each agent is above the threshold, then all the observations for that agent are above.
>>>>>>>>>>
>>>>>>>>>> Nick
>>>>>>>>>> njcoxstata@gmail.com
>>>>>>>>>>
>>>>>>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <maduran@uma.es> wrote:
>>>>>>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-. Nevertheless, the command that you suggest would not guarantee that I keep the agents that have been above the threshold for the whole sample period (ie, I would be including agents that were above the threshold in the first period and then might have been above or below it).
>>>>>>>>>>>
>>>>>>>>>>>> Sounds like
>>>>>>>>>>>>
>>>>>>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>> njcoxstata@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <maduran@uma.es> wrote:
>>>>>>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that have a particular feature; specifically, for those agents, and for each and every period (out of 64), the value of a variable (const_a) is larger than a particular threshold (0.097116). I have done what I show below. Nevertheless, I have realized that some of my agents are not in the sample from the first period, so what I am doing would mistakenly eliminate them. Will anyone help to solve this problem? Thanks in advance.

bysort entity (date2): gen obs = _n
drop if const_a < 0.097116
by entity: drop if obs[_N] < 64
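Two things go wrong in the code above. An entity observed for fewer than 64 periods always fails the obs[_N] < 64 test, which is the problem raised in the question. And because obs[_N] is only the original position of the entity's last surviving observation, an entity that falls below the threshold in an intermediate period but is above it in its 64th period is still kept. Nick's -bysort entity (const_a)- suggestion, with -keep- rather than -drop- as corrected in the thread, avoids both. A minimal sketch on made-up values for three hypothetical entities, the third of which enters only in period 3:

```stata
* Sketch: keep entities whose const_a exceeds the threshold in every period
* in which they are observed, however many periods that is (made-up data)
clear
input entity date2 const_a
1 1 0.20
1 2 0.30
1 3 0.25
2 1 0.20
2 2 0.05
2 3 0.30
3 3 0.15
end

local threshold 0.097116

* Within each entity, sorting on const_a puts the smallest value first
* (missing values sort last and so are not checked); if the smallest
* value exceeds the threshold, every observation of that entity does
bysort entity (const_a): keep if const_a[1] > `threshold'

* Restore time order; entity 3 survives although it enters only in period 3
sort entity date2
list, sepby(entity) noobs
```

Entity 2 is dropped because of its single dip in period 2, and nothing in the rule depends on how many periods an entity is observed.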

