Re: st: Looping within a subset under a certain condition


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Looping within a subset under a certain condition
Date   Tue, 2 Oct 2012 01:02:27 +0100

I recommend strongly against this approach.

1. No need to unravel or revisit code that was hard work. On the
contrary, you need to build on it.

2. The problem just solved was a bit tricky; this isn't. Nor was what
you just did really much of a precedent for anything else.

Consider

sort cusip6 rep trandate
gen long obs = _n

gen rep_ins = 0
egen firm_numid = group(cusip6)
summarize firm_numid, meanonly
forvalues x = 1/`r(max)' {
         su obs if firm_numid == `x' & rep == 0, meanonly
         local z1 = r(min)
         local z2 = r(max)
         su obs if firm_numid == `x' & rep == 1, meanonly
         local o1 = r(min)
         local o2 = r(max)

         if missing(`z1', `z2', `o1', `o2') continue

         forvalues i = `z1'/`z2' {
                 local isin = 0
                 forvalues o = `o1'/`o2' {
                         if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
                                local isin = 1
                         }
                 }
                 if `isin' == 1 replace rep_ins = 1 in `i'
         }
 }

What you want now sounds like one line:

bysort firm_id (trandate) : gen sum_sh = sum(shares_dir * (rep_ins == 1))

The expression

shares_dir * (rep_ins == 1)

will be -shares_dir- or 0, depending on whether your indicator is 1 or
0. -sum()- automatically gives running sums; that is not something you
need to re-create.
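
To see the indicator trick at work, here is a minimal sketch on made-up
data. The variable names follow the one-liner above; the values are
invented purely for illustration.

```stata
* Toy data, invented for illustration only
clear
input firm_id trandate shares_dir rep_ins
1 1 100 1
1 2  50 0
1 3  25 1
2 1  10 0
2 2  40 1
end

* (rep_ins == 1) evaluates to 1 or 0, so only flagged rows add to the sum
bysort firm_id (trandate) : gen sum_sh = sum(shares_dir * (rep_ins == 1))

list, sepby(firm_id)
```

On these values -sum_sh- runs 100, 100, 125 within the first firm and
0, 40 within the second: the running sum restarts for each firm, and
unflagged rows leave it unchanged.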

Nick

P.S. Optionally you could go

by firm_id : replace sum_sh = . if rep_ins == 0

Or (in one)

bysort firm_id (rep_ins trandate) : gen sum_sh = sum(shares_dir) if rep_ins == 1
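
As a check that the two-step and one-step versions agree, here is a small
sketch on invented data. -sum_sh2- is a name made up here so that both
results can sit side by side; it relies on -sum()- accumulating only over
the observations selected by -if-, which is my understanding of how
-generate- evaluates it.

```stata
* Toy data, invented for illustration only
clear
input firm_id trandate shares_dir rep_ins
1 1 100 1
1 2  50 0
1 3  25 1
2 1  10 0
2 2  40 1
end

* Two-step version: indicator-weighted running sum, then blank unflagged rows
bysort firm_id (trandate) : gen sum_sh = sum(shares_dir * (rep_ins == 1))
replace sum_sh = . if rep_ins == 0

* One-step version: -if- restricts the running sum to flagged rows;
* rows with rep_ins == 0 are left missing
bysort firm_id (rep_ins trandate) : gen sum_sh2 = sum(shares_dir) if rep_ins == 1

* Stops with an error if the two results ever differ
assert sum_sh == sum_sh2
```

If the -assert- passes silently, the two routes give the same answer on
these data; it is worth repeating the check on a subset of the real data.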

-by:- is very powerful. For a tutorial, see

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

A .pdf is available at the Stata Journal website.

Nick



On Mon, Oct 1, 2012 at 11:15 PM, Gerard Solbrig
<gsolbrig@mail.uni-mannheim.de> wrote:
> I'm currently trying to modify the code for the next step of my analysis:
>
> The code is supposed to sum up ("running sum") variable -shares_dir- of
> every rep = 0 observation if its respective -trandate- is in range of
> -wind_start- and -wind_end- of the rep = 1 case the loop currently deals
> with. The result of the running sum is supposed to be stored in variable
> -sum_sh- for every rep = 1 case when the loop is done looking through all
> rep = 0 cases for this rep = 1 case.
> Therefore, I tried to modify the code to tackle this problem (see comments
> within code below). I'm not sure about the correct construction of the
> running sum and how/when I need to tell the loop to put the result in
> -sum_sh- when done.
>
> Here's what I came up with so far:
>
> gen sum_sh = .
> gen long obsno = _n
> sort cusip6 rep trandate
> summarize firm_numid, meanonly
> local max = r(max)
> forvalues x = 1/`max' {
>         summarize obsno if firm_numid == `x' & rep == 0, meanonly
>         local z1 = r(min)
>         local z2 = r(max)
>         summarize obsno if firm_numid == `x' & rep == 1, meanonly
>         local o1 = r(min)
>         local o2 = r(max)
>         if missing(`z1',`z2',`o1',`o2') continue
>
>         /* start by entering the rep = 1 cases first, as output should be
> stored in var sum_sh here */
>         quietly forvalues o = `o1'/`o2' {
>
>                 /* slightly changed starting assumption: now every rep = 0
> case assumed to be in a window, unless found otherwise */
>                 local isin = 1
>
>                 /* -noshcum- local macro to store result of running sum,
> zero initially */
>                 local noshcum = 0
>
>                 forvalues i = `z1'/`z2' {
>
>                         /* if Stata finds that rep = 0 -trandate- is not in
> range, just continue with the next observation */
>                         if !inrange(trandate[`i'], wind_start[`o'],
> wind_end[`o']) {
>                         local isin = 0 continue
>                         }
>                         /* now deal with the cases which are in the range */
>                         else {
>                         /* local macro to temporarily store this
> observation's -share_dir- to be included in the running sum */
>                         local nosh = shares_dir[`z']
>
>                         /* update local macro keeping the result of the
> running sum until all rep = 0 cases have been screened */
>                         local shares_cum = `shares_cum' + `nosh'
>                 }
>                 /* this might not be correct: I want the result of the
> running sum to be put in -sum_sh- for each rep = 1 observation */
>                 if `isin' == 1 replace sum_sh = `shares_cum' in `o'
>         }
> }
>
> All useful input much appreciated, as always!
> Best,
> Gerard
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Montag, 1. Oktober 2012 19:15
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Looping within a subset under a certain condition
>
> Good! Thanks for the closure.
>
> Needing to compare all the events of type A _individually_ with all the
> events of type B -- within a larger set of classes C -- can't be that
> unusual.
>
> Nick
>
> On Mon, Oct 1, 2012 at 5:42 PM, Gerard Solbrig
> <gsolbrig@mail.uni-mannheim.de> wrote:
>> Now, the code works the way it is supposed to work! I find the
>> intuition behind the code very appealing!
>>
>> Can't thank you enough for the assistance. Especially you, Nick.
>> Being new to Stata, I learned a lot from this thread and the correspondence.
>> Maybe I'll be able to make valuable contributions soon, too.
>>
>> Best,
>> Gerard
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Montag, 1. Oktober 2012 02:38
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Looping within a subset under a certain condition
>>
>> So, straight away the first firm has no cases with rep == 1. Nothing
>> doing in those circumstances.
>>
>> Also, you messed with the rest of my code without explaining why.
>>
>> I recommend as follows.  You need to be consistent on -date- and -trandate-.
>>
>> sort cusip6 rep date
>> gen long obs = _n
>> gen rep_ins = 0
>> egen firm_numid = group(cusip6)
>> summarize firm_numid, meanonly
>> forvalues x = 1/`r(max)' {
>>          su obs if firm_numid == `x' & rep == 0, meanonly
>>          local z1 = r(min)
>>          local z2 = r(max)
>>          su obs if firm_numid == `x' & rep == 1, meanonly
>>          local o1 = r(min)
>>          local o2 = r(max)
>>
>>          if missing(`z1', `z2', `o1', `o2') continue
>>
>>          forvalues i = `z1'/`z2' {
>>                  local isin = 0
>>                  forvalues o = `o1'/`o2' {
>>                          if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>>                                 local isin = 1
>>                          }
>>                  }
>>                  if `isin' == 1 replace rep_ins = 1 in `i'
>>          }
>>  }
>>
>> On Sun, Sep 30, 2012 at 9:15 PM, Gerard Solbrig
>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>> Here's what Stata says:
>>>
>>> - forvalues x = 1/`r(max)' {
>>> = forvalues x = 1/18554 {
>>> - summarize obs if firm_numid == `x' & rep == 0, meanonly
>>> = summarize obs if firm_numid == 1 & rep == 0, meanonly
>>> - local z1 = r(min)
>>> - local z2 = r(max)
>>> - summarize obs if firm_numid == `x' & rep == 1, meanonly
>>> = summarize obs if firm_numid == 1 & rep == 1, meanonly
>>> - local o1 = r(min)
>>> - local o2 = r(max)
>>> - forvalues i = `z1'/`z2' {
>>> = forvalues i = 1/106 {
>>> - local isin = 1
>>> - forvalues o = `o1'/`o2' {
>>> = forvalues o = ./. {
>>> invalid syntax
>>>   if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>>>   local isin = 0
>>>   }
>>>   if `isin' == 1 replace rep_ins = 1 in `i'
>>>   }
>>>   }
>>> r(198);
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>> Sent: Sonntag, 30. September 2012 21:53
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Looping within a subset under a certain condition
>>>
>>> This code refers to -date- and -trandate- at different places.
>>>
>>> gen long obs = _n
>>>
>>> was recommended earlier.
>>>
>>> Type
>>>
>>> set trace on
>>> set tracedepth 1
>>>
>>> before running the code and see which line produces the error.
>>>
>>> On Sun, Sep 30, 2012 at 7:28 PM, Gerard Solbrig
>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>> I'm sorry, but I've been trying for hours now: Stata yields me
>>>> "invalid syntax r(198);" every time I try to run this code:
>>>>
>>>> sort cusip6 rep date
>>>> gen obs = _n
>>>> gen rep_ins = 0
>>>> egen firm_numid = group(cusip6)
>>>> summarize firm_numid, meanonly
>>>> forvalues x = 1/`r(max)' {
>>>>         su obs if firm_numid == `x' & rep == 0, meanonly
>>>>         local z1 = r(min)
>>>>         local z2 = r(max)
>>>>         su obs if firm_numid == `x' & rep == 1, meanonly
>>>>         local o1 = r(min)
>>>>         local o2 = r(max)
>>>>         forvalues i = `z1'/`z2' {
>>>>                 local isin = 1
>>>>                 forvalues o = `o1'/`o2' {
>>>>                         if inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>>>>                         local isin = 0
>>>>                         }
>>>>                 if `isin' == 1 replace rep_ins = 1 in `i'
>>>>                 }
>>>>         }
>>>> }
>>>>
>>>> Despite countless tries and modifications, I cannot find the mistake
>>>> in the syntax. I simply don't know what is supposed to be wrong here.
>>>> I know this code should be working the way I need it...
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>
>

