
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: Looping within a subset under a certain condition


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Looping within a subset under a certain condition
Date   Sun, 30 Sep 2012 14:31:53 +0100

The code is testing whether every case of 0 is within all the windows
defined by cases of 1, which I thought was what you wanted.

That is not what you want, it seems.

If you are happy that a case of 0 is within at least one of the
windows defined by cases of 1, then the code is different.

sort firm_id rep trandate

gen long obsno = _n

* assume not in a window; will change our mind if we find an exception
gen in_a_window = 0

* numeric ids 1 2 3 ... are just a convenience for looping
egen firm_numid = group(firm_id)
su firm_numid, meanonly

* loop over firms
forval f = 1/`r(max)' {

    * within each firm, which cases have rep == 0 (skip firms with none)
    su obsno if firm_numid == `f' & rep == 0, meanonly
    if r(N) == 0 continue
    local z1 = r(min)
    local z2 = r(max)

    * ditto, rep == 1 (a firm with no windows keeps in_a_window = 0)
    su obsno if firm_numid == `f' & rep == 1, meanonly
    if r(N) == 0 continue
    local o1 = r(min)
    local o2 = r(max)

    * look at each case of rep == 0
    forval i = `z1'/`z2' {
        local isin = 0

        * compare -trandate[`i']- with the window of each case of rep == 1
        forval o = `o1'/`o2' {
            if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
                local isin = 1
                * one matching window is enough: leave this loop
                continue, break
            }
        }

        if `isin' replace in_a_window = 1 in `i'
    }
}

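If you then want to check that _all_ cases of rep == 0 for each firm_id
are within a window (dividing by (rep == 0) makes the value missing for
cases of rep == 1, and -egen, min()- ignores missing values):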

egen all_in_window = min(in_a_window / (rep == 0)) , by(firm_id)
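A small self-contained check of the division trick on made-up data (the firm numbers and flags below are hypothetical, chosen only for illustration). Because -in_a_window- starts at 0 for every observation, including cases of rep == 1, a plain min() would be pulled down to 0 by those cases; dividing by (rep == 0) turns them into missing values, which -egen, min()- skips.

```stata
clear
* hypothetical data: in_a_window is 0 for rep == 1 cases, as in the loop above
input byte firm byte rep byte in_a_window
1 0 1
1 1 0
1 0 0
2 0 1
2 1 0
2 0 1
end

* plain min(): the rep == 1 cases (always 0) pull every firm down to 0
egen naive = min(in_a_window), by(firm)

* x / (rep == 0) is missing when rep == 1, and egen's min() ignores missings
egen all_in_window = min(in_a_window / (rep == 0)), by(firm)

list, sepby(firm)
```

Firm 2 has both of its rep == 0 cases in a window, so -all_in_window- is 1 there, while -naive- is 0 for both firms.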

Nick

On Sun, Sep 30, 2012 at 2:05 PM, Gerard Solbrig
<gsolbrig@mail.uni-mannheim.de> wrote:
> (in reference to my earlier mails, concerning your code and mine)
>
> I have given some thought to why -rep_ins- is set to 0 for all
> observations, using your code.
>
> The loop runs over all rep = 1 cases and checks whether the -trandate-
> lies within the range of each rep = 1 case.
> With multiple rep = 1 cases with very different dates, it might find
> one rep = 1 case whose range contains the current rep = 0 observation's
> -trandate-. But the loop does not stop there when it finds one.
> It keeps going and, because of the sorting by date, it inevitably finds a
> later rep = 1 case whose range does not contain the -trandate-, and it
> changes -rep_ins- back to 0.
>
> Is there a way to tell the loop: stop as soon as you find that your
> -trandate- lies in the range of any rep = 1 case, and move on to the
> next rep = 0 case? If not, a loop might not even be the right approach
> to this problem...
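One way to express "stop at the first window that contains the date" in Stata is -continue, break-, which leaves the innermost loop. The sketch below uses three hypothetical windows and a hypothetical date. Note that the flag is only ever set to 1 on a match and never reset to 0 on a miss; that alone keeps a later non-matching window from undoing an earlier match, and the -break- then only saves work.

```stata
clear
* hypothetical windows, one per observation
input str9 startstr str9 endstr
"05may2004" "30may2005"
"01jan2005" "31dec2005"
"01jan2006" "30jun2006"
end
gen wind_start = date(startstr, "DMY")
gen wind_end   = date(endstr, "DMY")
format wind_start wind_end %td

local d = td(10dec2004)    // hypothetical date to look up
local isin = 0
local checked = 0
forvalues o = 1/`=_N' {
    local ++checked
    if inrange(`d', wind_start[`o'], wind_end[`o']) {
        local isin = 1
        * one matching window is enough: leave the loop now
        continue, break
    }
}
display "in a window: `isin'   windows checked: `checked'"
```

The date falls in the first window, so the loop stops after checking one window instead of three.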
>
> Gerard
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Sunday, 30 September 2012 12:48
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Looping within a subset under a certain condition
>
> Should be
>
> sort firm rep trandate
>
> Sorry!
>
> On Sun, Sep 30, 2012 at 11:27 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> You are not showing me the complete line you typed, so I can't tell
>> you what was wrong exactly.
>>
>> More positively, here is a stab at your problem, but I haven't tested
>> the code.
>>
>> sort firm trandate rep
>>
>> gen long obsno = _n
>>
>> * assume all are in some window; will change our mind if we find an
>> * exception
>> gen all_in_window = 1
>>
>> * numeric ids 1 2 3 ... are just a convenience for looping
>> egen firm_numid = group(firm_id)
>> su firm_numid, meanonly
>>
>> * loop over firms
>> forval f = 1/`r(max)' {
>>
>> * within each firm, which cases have rep == 0
>> su obsno if firm_numid == `f' & rep == 0, meanonly
>> local z1 = r(min)
>> local z2 = r(max)
>>
>> * ditto, rep == 1
>> su obsno if firm_numid == `f' & rep == 1, meanonly
>> local o1 = r(min)
>> local o2 = r(max)
>>
>> * look at each case of rep == 0
>> forval i = `z1'/`z2' {
>>         local allin = 1
>>
>>         * we use -trandate[`i']- and compare it with the
>>         * windows for each case of rep == 1
>>         * note the crucial !    [!!!]
>>         forval o = `o1'/`o2' {
>>                 if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>>                         local allin = 0
>>                 }
>>         }
>>
>>         if `allin' == 0 replace all_in_window = 0 in `i'
>> }
>>
>> }
>>
>> Nick
>>
>> On Sun, Sep 30, 2012 at 11:17 AM, Gerard Solbrig
>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>> I understand. That's what I did in an earlier version of the loop,
>>> where I subscripted both -rep- and -trandate-, but then Stata
>>> returned:
>>>
>>> '[' invalid obs no
>>> r(198);
>>>
>>> Why is that? That's why I got rid of it in the first place. But
>>> without the subscript, the loop does not seem to finish running.
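A hedged guess at the error, since the complete failing line is not shown: the posted code writes -in [`z']-, and an -in- range takes a bare observation number, so the square brackets are one plausible source of "'[' invalid obs no". A minimal check on a throwaway dataset:

```stata
clear
set obs 3
gen byte rep_ins = 0
local z = 2

* an -in- range takes a bare observation number
replace rep_ins = 1 in `z'

* with square brackets the command is rejected; -capture- keeps the return code
capture replace rep_ins = 1 in [`z']
local rc = _rc
display "return code with brackets: `rc'"
```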
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>> Sent: Sunday, 30 September 2012 11:59
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Looping within a subset under a certain condition
>>>
>>> This can't be right, if only because you are misunderstanding what
>>> the -if- command does. Stata treats
>>>
>>> if rep == 1
>>>
>>> as if it were
>>>
>>> if rep[1] == 1
>>>
>>> See
>>>
>>> FAQ: "if command vs. if qualifier" (J. Wernow, 6/00)
>>>      I have an if command in my program that only seems to evaluate
>>>      the first observation, what's going on?
>>>
>>> http://www.stata.com/support/faqs/lang/ifqualifier.html
>>>
>>> The context of looping over observations makes no difference here.
>>> You probably intend
>>>
>>> if rep[`i'] == 1
>>>
>>> The same comment applies to
>>>
>>> if trandate ...
>>>
>>> where -trandate- _must_ be subscripted.
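A small demonstration of the point above, on three made-up observations: inside a loop over observations, the unsubscripted -if rep == 1- command looks only at rep[1], while -if rep[`i'] == 1- looks at the current observation.

```stata
clear
input byte rep
0
1
1
end

* unsubscripted: rep means rep[1] (here 0), so this never counts anything
local n_cmd = 0
forvalues i = 1/`=_N' {
    if rep == 1 {
        local ++n_cmd
    }
}

* subscripted: each pass tests the current observation
local n_sub = 0
forvalues i = 1/`=_N' {
    if rep[`i'] == 1 {
        local ++n_sub
    }
}
display "unsubscripted: `n_cmd'   subscripted: `n_sub'"
```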
>>>
>>>
>>> On Sun, Sep 30, 2012 at 10:18 AM, Gerard Solbrig
>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>> That sure is correct. Please see my reply to Pengpeng on that matter.
>>>> So far, I've focused only on getting the rep_ins indicator to work
>>>> at all, but multiple windows for one firm are an additional concern.
>>>> Ideally, the code would indicate for each rep = 0 case within which of
>>>> these windows the observation's 'trandate' lies...
>>>>
>>>> Here's the latest version of my code (without your earlier suggestion
>>>> and without handling the multiple-window problem):
>>>>
>>>> forvalues x = 1/`max' {
>>>>         summarize obs, meanonly
>>>>         local N = r(N)
>>>>         forvalues i = 1/`N' {
>>>>                 if rep == 1 {
>>>>                 local r = `i'
>>>>                 local s = `i'+1
>>>>                 forvalues z = `s'/`N' {
>>>>                         if trandate >= wind_start[`r'] & trandate <= wind_end[`r'] {
>>>>                         replace rep_ins = 1 in [`z']
>>>>                         }
>>>>                         else {
>>>>                         replace rep_ins = 0 in [`z']
>>>>                         }
>>>>                 }
>>>>         }
>>>> }
>>>> }
>>>> replace rep_ins = . if rep == 1
>>>>
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>> Sent: Sunday, 30 September 2012 11:10
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>
>>>> The other thing I wasn't clear on is your rule for combining two or
>>>> more windows for the same firm. The code example I gave just uses
>>>> the overall range of the windows, but that would include any gaps
>>>> between windows. Thus if a < b < c < d and there are windows [a,b]
>>>> and [c,d], then the combined window [a,d] includes the gap (b,c).
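To make the gap concrete, here is a sketch with two hypothetical windows for one firm and a hypothetical date that falls between them: the overall range [earliest start, latest end] says "in", while checking window by window says "not in any".

```stata
clear
* two hypothetical windows for one firm, with a gap between them
input str9 startstr str9 endstr
"01jan2004" "31mar2004"
"01jan2005" "31mar2005"
end
gen wind_start = date(startstr, "DMY")
gen wind_end   = date(endstr, "DMY")
format wind_start wind_end %td

local d = td(15jul2004)    // hypothetical date inside the gap

* overall range of the windows: [earliest start, latest end]
summarize wind_start, meanonly
local first = r(min)
summarize wind_end, meanonly
local last = r(max)
local in_overall = inrange(`d', `first', `last')

* window by window: is the date in at least one window?
count if inrange(`d', wind_start, wind_end)
local in_any = r(N) > 0

display "overall range: `in_overall'   any single window: `in_any'"
```

The -count if inrange(...)- line is a compact way to ask "any window?" without an explicit loop over windows.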
>>>>
>>>> On Sun, Sep 30, 2012 at 9:56 AM, Gerard Solbrig
>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>> My bad, sorry! Of course, the observation 05apr2004 should not be
>>>>> considered in the window, as it lies outside the range between
>>>>> 'wind_start' and 'wind_end'. Still, it seems you've understood my
>>>>> problem correctly.
>>>>>
>>>>> I'll try to incorporate your suggestion and see whether it helps in
>>>>> finding a solution. I will post an update on the matter later.
>>>>>
>>>>> Thanks so far!
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>>> Sent: Sunday, 30 September 2012 01:13
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>>
>>>>> I had another look at this. I still don't understand your problem
>>>>> exactly (e.g. why is the second obs at 5apr2004 considered in
>>>>> window), but the technique here may help.
>>>>>
>>>>> egen first_start = min(wind_start), by(firm_id)
>>>>> egen last_end = max(wind_end), by(firm_id)
>>>>>
>>>>> gen in_window = inrange(date, first_start, last_end)
>>>>>
>>>>> egen all_0_in_window = min(in_window) if rep == 0, by(firm_id)
>>>>>
>>>>> On the last line: on all <=> min, any <=> max, see
>>>>>
>>>>> FAQ: "Creating variables recording whether any or all possess some
>>>>>      char." (N. J. Cox, 2/03)
>>>>>      How do I create a variable recording whether any members of a
>>>>>      group (or all members of a group) possess some characteristic?
>>>>>      http://www.stata.com/support/faqs/data/anyall.html
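The any <=> max, all <=> min correspondence in a few lines, on hypothetical 0/1 indicators:

```stata
clear
input str5 firm_id byte in_window
"firm1" 1
"firm1" 0
"firm2" 1
"firm2" 1
end

* any member has the characteristic  <=>  the maximum of a 0/1 indicator is 1
egen any_in = max(in_window), by(firm_id)
* all members have it                <=>  the minimum is 1
egen all_in = min(in_window), by(firm_id)

list, sepby(firm_id)
```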
>>>>>
>>>>> Nick
>>>>>
>>>>> On Fri, Sep 28, 2012 at 9:45 PM, Gerard Solbrig
>>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>>>
>>>>>> I'm encountering a problem for which I seek your help.
>>>>>>
>>>>>> Let me start off with an example from my data (what I want it to
>>>>>> look like in the end), before I explain my particular problem.
>>>>>>
>>>>>> firm_id   date        rep   wind_start   wind_end    rep_ins
>>>>>>
>>>>>> firm1     01jan2000   0     .            .           0
>>>>>> firm1     05apr2004   0     .            .           1
>>>>>> firm1     01nov2004   1     05may2004    30may2005   .
>>>>>> firm1     10dec2004   0     .            .           1
>>>>>> firm1     01jan2006   0     .            .           0
>>>>>> firm2     30dec1999   1     03jul1999    27jul2000   .
>>>>>> firm2     05jan2000   1     09jul1999    02aug2000   .
>>>>>> firm2     06jun2000   0     .            .           1
>>>>>>
>>>>>> Each firm in my data has a 'firm_id'. Variable 'date' refers to an
>>>>>> event date. The 'rep' dummy indicates the type of event.
>>>>>> If it is a rep = 1 event, I set 'wind_start' and 'wind_end' to mark a
>>>>>> period around the event (-180 days, +210 days).
>>>>>>
>>>>>> Now, I would like the 'rep_ins' dummy to indicate (with rep_ins = 1)
>>>>>> whether the date of each other observation of this firm (where
>>>>>> rep = 0) lies within the range determined by 'wind_start' and
>>>>>> 'wind_end' (which are defined only when rep = 1).
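A loop-free alternative, offered as a sketch rather than as the thread's solution: put the rep == 1 windows in a separate file, pair each rep == 0 case with every window of the same firm using -joinby-, and take the maximum of the in-window indicator (any window <=> max). The data are the example rows above, with the windows recomputed as -180/+210 days; the names -hit-, -windows- and -hits- are illustrative only. Pairing every case with every window can make a large temporary dataset for firms with many events.

```stata
clear
* example rows from the post (wind_start/wind_end are recomputed below)
input str5 firm_id str9 datestr byte rep
"firm1" "01jan2000" 0
"firm1" "05apr2004" 0
"firm1" "01nov2004" 1
"firm1" "10dec2004" 0
"firm1" "01jan2006" 0
"firm2" "30dec1999" 1
"firm2" "05jan2000" 1
"firm2" "06jun2000" 0
end
gen trandate = date(datestr, "DMY")
format trandate %td
gen long obsno = _n

* one row per window: -180 to +210 days around each rep == 1 event
preserve
keep if rep == 1
gen wind_start = trandate - 180
gen wind_end   = trandate + 210
keep firm_id wind_start wind_end
tempfile windows
save `windows'
restore

* pair each rep == 0 case with every window of its own firm
preserve
keep if rep == 0
joinby firm_id using `windows'
gen byte hit = inrange(trandate, wind_start, wind_end)
* in at least one window <=> maximum of the indicator is 1
collapse (max) in_a_window = hit, by(obsno)
tempfile hits
save `hits'
restore

merge 1:1 obsno using `hits', nogenerate
sort obsno
* rep == 0 cases of firms with no rep == 1 event were dropped by -joinby-
replace in_a_window = 0 if rep == 0 & missing(in_a_window)
list firm_id trandate rep in_a_window, sepby(firm_id)
```

On these rows the computed value for 05apr2004 is 0, which matches the poster's later correction of the table.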
>>>>>>
>>>>>> I've come across looping over observations and tried to design a
>>>>>> solution for this problem based on that, but failed to do so. I
>>>>>> assume the solution also depends on sorting the data in a special way.
>>>>>>
>>>>>> Here's the first part of my .do-file:
>>>>>>
>>>>>> gen wind_start = date-180 if rep == 1
>>>>>> gen wind_end = date+210 if rep == 1
>>>>>> format wind_start %d
>>>>>> format wind_end %d
>>>>>> gsort +cusip6 +date +trandate
>>>>>> gen rep_ins = 0 if rep != 1
>>>>>>
>>>>>> I tried to come up with a solution by adding variables 'per_start'
>>>>>> and 'per_end' for all rep = 0:
>>>>>>
>>>>>> gen per_start = date-180 if rep == 0
>>>>>> gen per_end = date+180 if rep == 0
>>>>>> format per_start %d
>>>>>> format per_end %d
>>>>>>
>>>>>> These mark the period within which the rep = 1 event can lie. Maybe
>>>>>> this could contribute to finding a solution as well.
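A worked note on this reverse check (an editorial derivation, not part of the original thread): write t for the date of a rep = 1 event and d for the date of a rep = 0 case. The window condition d in [t - 180, t + 210] rearranges into a condition on t around d:

```latex
t - 180 \le d \le t + 210
\quad\Longleftrightarrow\quad
d - 210 \le t \le d + 180
```

So, under this rearrangement, the period around each rep = 0 date that matches the windows exactly would run from date - 210 to date + 180. The symmetric -180/+180 version would miss rep = 1 events lying between 181 and 210 days before the rep = 0 date.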
>>>>> *
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP