Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Observations that keep a feature...

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Observations that keep a feature...
Date	Fri, 24 May 2013 02:32:04 +0100

The idea of a spell in modest generality may be of help to you.
Setting aside what I learned of spells at a moderately well known
school in northern Britain, the program -tsspell- (SSC) and the 2007
article

SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification
http://www.stata-journal.com/sjpdf.html?articlenum=dm0029

provide independent accounts of some ideas for identifying spells.

Two kinds of spells seem relevant to your problem:

1. A spell initiated by a particular event (e.g. a change of government).

2. A spell defined by some condition being true throughout its length
(e.g. a rainy spell is one in which it rained every day).
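
As a minimal sketch of the second kind of spell in base Stata (this is
not the code of -tsspell- or of the SJ article), assume long-form data
with numeric variables agent, period and score, as in the example quoted
further down, and a threshold of 0.9. The helper variables above, start,
nspells, spellstart, ends_above, stays and laggard are names invented
here for illustration. Gaps in period are ignored: consecutive
observations of an agent are treated as adjacent.

```stata
* Toy data: the seven-agent example from this thread, periods coded 1-6
clear
input period agent score
1 1 0.8
2 1 1
3 1 1
4 1 1
5 1 1
6 1 1
1 2 0.8
2 2 0.8
3 2 1
4 2 1
5 2 1
6 2 1
1 3 0.8
2 3 0.8
3 3 0.8
4 3 1
5 3 1
6 3 1
1 4 0.8
2 4 0.8
3 4 0.8
4 4 0.8
5 4 1
6 4 1
6 5 1
1 6 0.8
2 6 0.8
3 6 0.8
4 6 0.8
5 6 1
6 6 1
1 7 0.8
2 7 1
3 7 1
4 7 0.8
5 7 0.8
6 7 1
end

* Condition defining the spell; missing scores never count as above
gen byte above = score >= 0.9 & !missing(score)

* A spell starts where the condition holds and did not hold in the
* previous observation of the same agent (or there is none)
bysort agent (period): gen byte start = above & (_n == 1 | !above[_n-1])

* Number of spells per agent and the period in which the first one starts
egen nspells = total(start), by(agent)
egen spellstart = min(cond(start, period, .)), by(agent)

* Does the agent's last observation satisfy the condition?
bysort agent (period): gen byte ends_above = above[_N]

* Exactly one spell, and it lasts to the agent's final observation
gen byte stays = nspells == 1 & ends_above

* Assumed rule for laggards: that single spell starts at or after period 5
gen byte laggard = stays & spellstart >= 5

list agent period score if laggard, sepby(agent)
```

With this toy data the rule keeps agents 4, 5 and 6 (agents 4 and 6 have
identical scores), and agent 7 is dropped because it has two separate
spells above the threshold. Changing the last condition to
spellstart <= 3 would give the early adopters instead.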

I would look at -tsspell- first and then the SJ article.
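
A hedged sketch of how -tsspell- might be applied to a threshold
condition follows. It assumes the command's cond() option and its default
output variables _spell, _seq and _end, as I understand them from its
help file; check -help tsspell- after installing. The variables nspells
and stays are names invented here, and the data are agents 6 and 7 from
the example quoted further down.

```stata
* -tsspell- is user-written: install it from SSC if it is not present
capture which tsspell
if _rc ssc install tsspell

* Toy data: agents 6 and 7 from the example in this thread
clear
input period agent score
1 6 0.8
2 6 0.8
3 6 0.8
4 6 0.8
5 6 1
6 6 1
1 7 0.8
2 7 1
3 7 1
4 7 0.8
5 7 0.8
6 7 1
end

* -tsspell- needs the panel and time variables declared first
tsset agent period

* Spells during which score is at or above the threshold
tsspell, cond(score >= 0.9 & !missing(score))

* Spells per agent, and whether a single spell reaches the last observation
egen nspells = max(_spell), by(agent)
bysort agent (period): gen byte stays = nspells == 1 & _spell[_N] == 1

list agent period score _spell _seq _end stays, sepby(agent)
```

Here agent 6 has one spell that runs to the end of its record, while
agent 7 has two spells and so is not flagged by stays.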

Nick
[email protected]


On 23 May 2013 21:03, Miguel Angel Duran Munoz <[email protected]> wrote:
> Nick, this is what I am doing. I have a large sample, with about 7,000
> agents per period and about 60 periods. I am analyzing whether agents
> imitate each other. Once I have (statistically) confirmed that there is an
> imitation process going on, the next step is to analyze the differences
> between types of agents. In particular, between innovators (those who
> start following a rule of behavior right at the beginning of the process),
> nonadopters (those who never adopt the rule) and laggards (those who adopt
> the rule at a late period).
>
> This is why I need to split the sample the way I have described. I hope
> this helps to make it clear what I am doing. Thanks in advance.
>
>> This is getting very intricate to follow.
>>
>> As Sarah posted yesterday, more or less, we need examples.
>>
>> I worry on your behalf that you will have to explain your rules to
>> somebody reviewing your thesis/dissertation/report/paper and they are
>> going to ask you why you couldn't use much simpler rules.
>>
>> Nick
>> [email protected]
>>
>>
>> On 23 May 2013 18:43, Miguel Angel Duran Munoz <[email protected]> wrote:
>>> Nick and Sarah, thanks to your help I've been able to solve all but one
>>> of my problems. To select agents that are above the threshold after
>>> period 2, I've finally used:
>>>
>>> egen firstperiod = min(period), by(agent)
>>> drop if firstperiod > 2
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if min_rest >= 0.9
>>>
>>> (the max condition that Nick suggested to me is, I think, unnecessary)
>>>
>>> Nevertheless, I am not sure how to select agents that cross the
>>> threshold in the final periods (say at or after t3) and stay above it.
>>> In principle, based on your suggestions, I thought of this:
>>>
>>> bysort agent (period): gen last=score[_N]
>>> bysort agent (period): gen first2 = _n < 3
>>> egen min_rest = min(score / !first2), by(agent)
>>> keep if last>=0.9 & min_rest<=0.9
>>>
>>> Nevertheless, this implies that I am excluding agents that satisfy the
>>> criterion (overpassing the threshold at or after t3) but appear in the
>>> sample at an intermediate period.
>>>
>>> Will someone please help to solve this? Thanks in advance.
>>>
>>> Miguel.
>>>
>>>> Sarah, thank you for your help. I am very sorry for not having put my
>>>> questions in a sufficiently clear way. And given what you say about the
>>>> way data is stored I have realized that there might be other problems
>>>> around. I will try to be as clear as possible.
>>>>
>>>> My data is in panel data form. I write the example down again in the
>>>> way my data is stored. As regards the example in my previous messages,
>>>> I add two agents (6 and 7). Please note also that data referring to
>>>> agent 5 is missing in some periods, but there is no line corresponding
>>>> to those periods (this is what I had not taken into account so far):
>>>>
>>>> time  agent   score
>>>> t1     1      0.8
>>>> t2     1      1
>>>> t3     1      1
>>>> t4     1      1
>>>> t5     1      1
>>>> t6     1      1
>>>>
>>>> t1     2      0.8
>>>> t2     2      0.8
>>>> t3     2      1
>>>> t4     2      1
>>>> t5     2      1
>>>> t6     2      1
>>>>
>>>> t1     3      0.8
>>>> t2     3      0.8
>>>> t3     3      0.8
>>>> t4     3      1
>>>> t5     3      1
>>>> t6     3      1
>>>>
>>>> t1     4      0.8
>>>> t2     4      0.8
>>>> t3     4      0.8
>>>> t4     4      0.8
>>>> t5     4      1
>>>> t6     4      1
>>>>
>>>> t6     5      1
>>>>
>>>> t1     6      0.8
>>>> t2     6      0.8
>>>> t3     6      0.8
>>>> t4     6      0.8
>>>> t5     6      1
>>>> t6     6      1
>>>>
>>>> t1     7      0.8
>>>> t2     7      1
>>>> t3     7      1
>>>> t4     7      0.8
>>>> t5     7      0.8
>>>> t6     7      1
>>>>
>>>> Having said that, I want to split the sample in different ways. First,
>>>> I want to focus on agents that are above a threshold (eg, 0.9) from the
>>>> first period and always stay above the threshold (ie, agent 1). Second,
>>>> I want to take agents that cross the threshold at or before a
>>>> particular period (eg, t3) and stay above the threshold from then on
>>>> (ie, agents 1-4). Third, agents that cross the threshold at or after a
>>>> particular period (eg, t5) and stay above the threshold from then on
>>>> (ie, agents 5 and 6). Please note that agent 7 is not included in any
>>>> of the previous subsamples.
>>>>
>>>> Thank you very much for your help. And once again, I am sorry for not
>>>> having been clear enough.
>>>>
>>>> Miguel.
>>>>
>>>>
>>>>
>>>>
>>>>> Miguel,
>>>>> This discussion would be clearer if your examples actually made it
>>>>> clear exactly what your data looks like.
>>>>>
>>>>> Your example below looks like you have data in wide form.  The
>>>>> solution that Nick suggested is for data in long form.  It's easy
>>>>> enough to move between the two, but it's hard to make concrete
>>>>> suggestions about how to proceed when we don't know what the actual
>>>>> data looks like.
>>>>>
>>>>> I'll start by assuming, as Nick does, that your data is actually in
>>>>> long form and you have three variables: agent, period, score.  I'll
>>>>> further assume that for agent 5 you simply have no records for periods
>>>>> 1-5 (that is, you do not have records for those periods with missing
>>>>> values for score).  If that's true, you can simply calculate the first
>>>>> period that appears in the data and use that as part of your inclusion
>>>>> criteria.  Something like the following will keep only those agents
>>>>> who first appear in the data no later than period 4:
>>>>> egen firstperiod=min(period), by(agent)
>>>>> drop if firstperiod>4
>>>>>
>>>>> Or maybe you only want to include agents who start in period 1?  It's
>>>>> unclear from your question.  In that case you'd
>>>>> -drop if firstperiod>1-
>>>>>
>>>>> For your second example, trying to look at the last time periods, I
>>>>> think you need to clarify what your actual criterion is.  You say "I
>>>>> would like to select those agents that overpass the threshold of 0.9
>>>>> in any of the last two periods and are over the threshold until the
>>>>> end of the sample period (ie, agents 4 and 5)."  To my eye, that
>>>>> criterion includes all agents except agent 6.  You're unlikely to get
>>>>> the results you hope for unless you are precise in the criteria you're
>>>>> using.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> -Sarah
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Miguel Angel Duran Munoz
>>>>> Sent: Wednesday, May 22, 2013 11:00 AM
>>>>> To: [email protected]
>>>>> Subject: Re: st: Observations that keep a feature... an additional problem
>>>>>
>>>>> I use the same example as in a previous message, but I add a fifth
>>>>> agent that joins in period six:
>>>>>
>>>>>
>>>>> Agent 1: 1    1    1    1    1    1...
>>>>> Agent 2: 0.8  1    1    1    1    1...
>>>>> Agent 3: 0.8  0.8  0.8  1    1    1...
>>>>> Agent 4: 0.8  0.8  0.8  0.8  1    1...
>>>>> Agent 5:  .    .    .    .   .    1...
>>>>>
>>>>> I want to keep just the first three agents.
>>>>>
>>>>>
>>>>> If you don't mind, Nick, I would also like to ask you the following. I
>>>>> take the same example, but I focus on the last periods.
>>>>>
>>>>> Agent 1: ...1    1    1    1    1    1
>>>>> Agent 2: ...0.8  1    1    1    1    1
>>>>> Agent 3: ...0.8  0.8  0.8  1    1    1
>>>>> Agent 4: ...0.8  0.8  0.8  0.8  1    1
>>>>> Agent 5: ... .    .    .    .   .    1
>>>>> Agent 6: ...0.8  0.8  0.8  0.8  1    0.8
>>>>>
>>>>> I would like to select those agents that overpass the threshold of 0.9
>>>>> in any of the last two periods and are over the threshold until the
>>>>> end of the sample period (ie, agents 4 and 5).
>>>>> I have tried to modify the commands that you have suggested to me
>>>>> before, but I have not been able to get the right selection. Would you
>>>>> mind helping me with this? Thank you very much.
>>>>>
>>>>>> I can't follow this.  I see only "the rules select too many agents".
>>>>>>
>>>>>> You tell me your precise rules and I will try to think of code to
>>>>>> implement them.
>>>>>>
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 22 May 2013 18:16, Miguel Angel Duran Munoz <[email protected]>
>>>>>> wrote:
>>>>>>> Nick, after reducing the sample using your suggestion, I have checked
>>>>>>> the number of agents that there are per period. And the number is
>>>>>>> increasing in time. I guess this is due to the fact that agents
>>>>>>> joining the sample as time goes by and satisfying the requirement of
>>>>>>> being above the threshold are not excluded. Is there any trick to
>>>>>>> avoid including them? Thanks again.
>>>>>>>
>>>>>>>> Assuming variable names
>>>>>>>>
>>>>>>>> agent  period  score
>>>>>>>>
>>>>>>>> it seems that you want something like
>>>>>>>>
>>>>>>>> bysort agent (period) : gen first3 = _n < 4
>>>>>>>>
>>>>>>>> egen max_first3 = max(score / first3), by(agent)
>>>>>>>>
>>>>>>>> egen min_rest = min(score / !first3), by(agent)
>>>>>>>>
>>>>>>>> keep if max_first3 > 0.9 & min_rest > 0.9
>>>>>>>>
>>>>>>>> For the division trick in the -egen- call see e.g.
>>>>>>>>
>>>>>>>> http://www.stata.com/statalist/archive/2013-03/msg00917.html
>>>>>>>>
>>>>>>>> (reference included therein).
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> [email protected]
>>>>>>>>
>>>>>>>>
>>>>>>>> On 22 May 2013 15:03, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> Nick, thanks for your help. I hope you can help me with another
>>>>>>>>> question. For a similar analysis to that of my first message,
>>>>>>>>> assume I want to keep those agents that have crossed the threshold
>>>>>>>>> before a certain period and then have stayed above it for the rest
>>>>>>>>> of the sample period.
>>>>>>>>>
>>>>>>>>> To illustrate the idea, consider the following (data refer to
>>>>>>>>> consecutive periods and the threshold is, eg, 0.9):
>>>>>>>>>
>>>>>>>>> Agent 1: 1    1    1    1    1...
>>>>>>>>> Agent 2: 0.8  1    1    1    1...
>>>>>>>>> Agent 3: 0.8  0.8  0.8  1    1...
>>>>>>>>> Agent 4: 0.8  0.8  0.8  0.8  1...
>>>>>>>>>
>>>>>>>>> I want to keep the first three agents because they have crossed
>>>>>>>>> the threshold before period 4 and then have been over the threshold
>>>>>>>>> in the rest of the sample period, but I do not want to keep agent 4.
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>> Miguel.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Correct on -keep-. Sorry about that.
>>>>>>>>>>
>>>>>>>>>> The -sort- order
>>>>>>>>>>
>>>>>>>>>> bysort entity (const_a) :
>>>>>>>>>>
>>>>>>>>>> ensures that -const_a[1]- is the lowest for each agent, not the
>>>>>>>>>> first.
>>>>>>>>>> If the lowest value for each agent is above the threshold, then
>>>>>>>>>> all the observations for that agent  are above.
>>>>>>>>>> Nick
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 21 May 2013 23:16, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>> Thanks, Nick. I guess you mean -keep- instead of -drop-.
>>>>>>>>>>> Nevertheless, the command that you suggest would not guarantee
>>>>>>>>>>> that I keep the agents that have been above the threshold for
>>>>>>>>>>> the whole sample period (ie, I would be including agents that
>>>>>>>>>>> were above the threshold in the first period and then might have
>>>>>>>>>>> been above or below it).
>>>>>>>>>>>
>>>>>>>>>>>> Sounds like
>>>>>>>>>>>>
>>>>>>>>>>>> bysort entity (const_a) : drop if const_a[1] > 0.097116
>>>>>>>>>>>>
>>>>>>>>>>>> Nick
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>> On 21 May 2013 23:01, Miguel Angel Duran Munoz <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi, Statalisters. I want to focus on agents in my dataset that
>>>>>>>>>>>>> have a particular feature; specifically, for those agents, and
>>>>>>>>>>>>> for each and every period (out of 64), the value of a variable
>>>>>>>>>>>>> (const_a) is larger than a particular threshold (0.097116). I
>>>>>>>>>>>>> have done what I show below.
>>>>>>>>>>>>> Nevertheless, I have realized that some of my agents are not in
>>>>>>>>>>>>> the sample from the first period, so what I am doing would
>>>>>>>>>>>>> mistakenly eliminate them. Will anyone help to solve this
>>>>>>>>>>>>> problem? Thanks in advance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> bysort entity (date2): gen obs=_n
>>>>>>>>>>>>> drop if const_a<0.097116
>>>>>>>>>>>>> by entity: drop if obs[_N]<64
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/

