Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Gerard Solbrig" <gsolbrig@mail.uni-mannheim.de> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | FW: st: Looping within a subset under a certain condition |
Date | Sun, 30 Sep 2012 14:09:24 +0200 |
OK, after some testing: my code does not work correctly at all, as it only sets rep_ins = 1 for trandates after the rep = 1 date.

-----Original Message-----
From: Gerard Solbrig [mailto:gsolbrig@mail.uni-mannheim.de]
Sent: Sunday, 30 September 2012 13:21
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: st: Looping within a subset under a certain condition

I have tested your suggested code and adapted it a little, to this:

sort firm_id rep trandate
gen long obs = _n
gen rep_ins = 1
egen firm_numid = group(firm_id)
summarize firm_numid, meanonly
forval f = 1/`r(max)' {
    summarize obs if firm_numid == `f' & rep == 0, meanonly
    local z1 = r(min)
    local z2 = r(max)
    summarize obs if firm_numid == `f' & rep == 1, meanonly
    local o1 = r(min)
    local o2 = r(max)
    forval i = `z1'/`z2' {
        local allin = 1
        forval o = `o1'/`o2' {
            if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
                local allin = 0
            }
        }
        if `allin' == 0 replace rep_ins = 0 in `i'
    }
}
replace rep_ins = . if rep == 1

Somehow, the code changes every -rep_ins- to 0, even for observations that clearly lie within the -wind_start- to -wind_end- range of a rep == 1 case. I am not sure why it does that, as I fully understand the intuition behind your code.

I managed to get my own code to work. As of now, it looks like this:

gsort +firm_id -rep +date
by firm_id: gen obs = _n
gen group_obs = _n
qui bysort firm_id: gen obs_N = _N
qui bysort firm_id (group_obs): replace group_obs = group_obs[1]
by group_obs, sort: gen group = _n == 1
replace group = sum(group)
summarize group, meanonly
local max = r(max)
gsort +firm_id -rep +date
forvalues x = 1/`max' {
    summarize obs, meanonly
    local N = r(N)
    forvalues i = 1/`N' {
        if rep[`i'] == 1 {
            local r = `i'
            local s = `i'+1
            forvalues z = `s'/`N' {
                if trandate[`z'] >= wind_start[`r'] & trandate[`z'] <= wind_end[`r'] {
                    replace rep_ins = 1 in `z'
                }
                else {
                    replace rep_ins = 0 in `z'
                }
            }
        }
    }
}
replace rep_ins = . if rep == 1

However, your code seems to run faster and is more intuitive than what I came up with here. Any idea what to tweak in the top code to make it work?

In addition, a more trivial question: how can I stop Stata from showing me '1 real change made' for every change? Would a simple -quietly- command placed before the -forval- loop prevent that?

Many thanks so far! After all, I'm a Stata newbie, and I greatly appreciate your patience and helpfulness!

Gerard

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Sunday, 30 September 2012 12:48
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Looping within a subset under a certain condition

Should be

sort firm rep trandate

Sorry!

On Sun, Sep 30, 2012 at 11:27 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> You are not showing me the complete line you typed, so I can't tell
> you exactly what was wrong.
>
> More positively, here is a stab at your problem, but I haven't tested the code.
>
> sort firm trandate rep
>
> gen long obsno = _n
>
> * assume all are in some window; will change our mind if we find an exception
> gen all_in_a_window = 1
>
> * numeric ids 1 2 3 ... are just a convenience for looping
> egen firm_numid = group(firm_id)
> su firm_numid, meanonly
>
> * loop over firms
> forval f = 1/`r(max)' {
>
>     * within each firm, which cases have rep == 0
>     su obsno if firm_numid == `f' & rep == 0, meanonly
>     local z1 = r(min)
>     local z2 = r(max)
>
>     * ditto, rep == 1
>     su obsno if firm_numid == `f' & rep == 1, meanonly
>     local o1 = r(min)
>     local o2 = r(max)
>
>     * look at each case of rep == 0
>     forval i = `z1'/`z2' {
>         local allin = 1
>
>         * we use -trandate[`i']- and compare it with the windows for each case of rep == 1
>         * note the crucial ! [!!!]
>         forval o = `o1'/`o2' {
>             if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>                 local allin = 0
>             }
>         }
>
>         if `allin' == 0 replace all_in_a_window = 0 in `i'
>     }
>
> }
>
> Nick
>
> On Sun, Sep 30, 2012 at 11:17 AM, Gerard Solbrig
> <gsolbrig@mail.uni-mannheim.de> wrote:
>> I understand. That's what I did in an earlier version of the loop,
>> where I subscripted both -rep- and -trandate-, but then Stata returned:
>>
>> '[' invalid obs no
>> r(198);
>>
>> Why is that? That's why I got rid of the subscripts in the first place.
>> But without them, the loop does not seem to finish running.
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Sunday, 30 September 2012 11:59
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Looping within a subset under a certain condition
>>
>> This can't be right, if only because you are misunderstanding what
>> the -if- command does. Stata treats
>>
>> if rep == 1
>>
>> as if it were
>>
>> if rep[1] == 1
>>
>> See
>>
>> FAQ: if command vs. if qualifier (J. Wernow, 6/00)
>> I have an if command in my program that only seems to evaluate
>> the first observation, what's going on?
>>
>> http://www.stata.com/support/faqs/lang/ifqualifier.html
>>
>> The context of looping over observations makes no difference here.
>> You probably intend
>>
>> if rep[`i'] == 1
>>
>> Similar comment w.r.t.
>>
>> if trandate ...
>>
>> where -trandate- _must_ be subscripted.
>>
>>
>> On Sun, Sep 30, 2012 at 10:18 AM, Gerard Solbrig
>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>> That is certainly correct. Please see my reply to Pengpeng on that matter.
>>> So far, I've only focused on getting the rep_ins indicator to work
>>> at all, but multiple windows for one firm are an additional concern.
>>> Ideally, the code would indicate, for each rep = 0 case, within which of
>>> these windows the observation's 'trandate' lies...
>>>
>>> Here's the latest version of my code (without your earlier
>>> suggestion and without handling the multiple-window problem):
>>>
>>> forvalues x = 1/`max' {
>>>     summarize obs, meanonly
>>>     local N = r(N)
>>>     forvalues i = 1/`N' {
>>>         if rep == 1 {
>>>             local r = `i'
>>>             local s = `i'+1
>>>             forvalues z = `s'/`N' {
>>>                 if trandate >= wind_start[`r'] & trandate <= wind_end[`r'] {
>>>                     replace rep_ins = 1 in [`z']
>>>                 }
>>>                 else {
>>>                     replace rep_ins = 0 in [`z']
>>>                 }
>>>             }
>>>         }
>>>     }
>>> }
>>> replace rep_ins = . if rep == 1
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>> Sent: Sunday, 30 September 2012 11:10
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Looping within a subset under a certain condition
>>>
>>> The other thing I wasn't clear on was your rule for combining two or
>>> more windows for the same firm. The code example I gave just uses
>>> the overall range of the windows, but that would include any gaps
>>> between windows. Thus if a < b < c < d and there are windows [a, b]
>>> and [c, d], then the combined window [a, d] includes the gap between b and c.
>>>
>>> On Sun, Sep 30, 2012 at 9:56 AM, Gerard Solbrig
>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>> My bad, sorry! Of course, the observation 5apr2004 should not be
>>>> considered in the window, as it lies outside the range between
>>>> 'wind_start' and 'wind_end'. Nonetheless, it seems you've understood
>>>> my problem correctly.
>>>>
>>>> I'll try to incorporate your suggestion into a solution and see
>>>> whether it helps. I will post an update on the matter later.
>>>>
>>>> Thanks so far!
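Nick's remark about gaps between windows can be made concrete by testing each window separately instead of the combined range. The sketch below is one loop-free way to do that with -joinby-, a swapped-in technique rather than anything proposed in the thread: it pairs every rep == 0 case with every window of the same firm, flags membership per window, and keeps the maximum of those flags (any <=> max). For comparison it also computes the combined [earliest start, latest end] check from the -egen- suggestion further down. It assumes the example data from the original post plus a hypothetical firm3 with two non-overlapping windows; firm3, obs, in_this and in_combined are illustrative names, not part of the thread.

```stata
* Example data from the original post, plus a hypothetical firm3 whose
* two windows do not overlap
clear
input str5 firm_id str9 datestr byte rep
"firm1" "01jan2000" 0
"firm1" "05apr2004" 0
"firm1" "01nov2004" 1
"firm1" "10dec2004" 0
"firm1" "01jan2006" 0
"firm2" "30dec1999" 1
"firm2" "05jan2000" 1
"firm2" "06jun2000" 0
"firm3" "01jan2004" 1
"firm3" "01jan2005" 0
"firm3" "01jan2006" 1
end
gen date = date(datestr, "DMY")
format date %td
drop datestr
gen wind_start = date - 180 if rep == 1
gen wind_end   = date + 210 if rep == 1
format wind_start wind_end %td
gen long obs = _n    // identifier for each original observation

tempfile windows flags

* One row per window (i.e., per rep == 1 event)
preserve
keep if rep == 1
keep firm_id wind_start wind_end
save `windows'
restore

* Pair each rep == 0 case with every window of the same firm
preserve
keep if rep == 0
keep obs firm_id date
joinby firm_id using `windows'
gen byte in_this = inrange(date, wind_start, wind_end)
* any window <=> max; the combined range uses the min start and max end
collapse (max) rep_ins = in_this last_end = wind_end (min) first_start = wind_start (first) date, by(obs)
gen byte in_combined = inrange(date, first_start, last_end)
keep obs rep_ins in_combined
save `flags'
restore

merge 1:1 obs using `flags', nogenerate
* -joinby- drops rep == 0 cases of firms that have no window at all
replace rep_ins = 0 if rep == 0 & missing(rep_ins)
list firm_id date rep rep_ins in_combined, sepby(firm_id)
```

For firm3, 01jan2005 falls in the gap between its two windows, so rep_ins is 0 while in_combined is 1. Note that -joinby- creates one row per (case, window) pair, so memory use grows with the number of windows per firm.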
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>> Sent: Sunday, 30 September 2012 01:13
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>
>>>> I had another look at this. I still don't understand your problem
>>>> exactly (e.g., why the second obs, at 5apr2004, is considered to be
>>>> in the window), but the technique here may help.
>>>>
>>>> egen first_start = min(wind_start), by(firm_id)
>>>> egen last_end = max(wind_end), by(firm_id)
>>>>
>>>> gen in_window = inrange(date, first_start, last_end)
>>>>
>>>> egen all_0_in_window = min(in_window) if rep == 0, by(firm_id)
>>>>
>>>> On the last line: on all <=> min, any <=> max, see
>>>>
>>>> FAQ: Creating variables recording whether any or all possess some
>>>> char. (N. J. Cox, 2/03)
>>>> How do I create a variable recording whether any members of a group
>>>> (or all members of a group) possess some characteristic?
>>>> http://www.stata.com/support/faqs/data/anyall.html
>>>>
>>>> Nick
>>>>
>>>> On Fri, Sep 28, 2012 at 9:45 PM, Gerard Solbrig
>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>>
>>>>> I'm encountering a problem for which I seek your help.
>>>>>
>>>>> Let me start off with an example from my data (what I want it to
>>>>> look like in the end), before I explain my particular problem.
>>>>>
>>>>> firm_id  date       rep  wind_start  wind_end   rep_ins
>>>>>
>>>>> firm1    01jan2000  0    .           .          0
>>>>> firm1    05apr2004  0    .           .          1
>>>>> firm1    01nov2004  1    05may2004   30may2005  .
>>>>> firm1    10dec2004  0    .           .          1
>>>>> firm1    01jan2006  0    .           .          0
>>>>> firm2    30dec1999  1    03jul1999   27jul2000  .
>>>>> firm2    05jan2000  1    09jul1999   02aug2000  .
>>>>> firm2    06jun2000  0    .           .          1
>>>>>
>>>>> Each firm in my data has a 'firm_id'.
>>>>> Variable 'date' refers to an event date. The 'rep' dummy indicates
>>>>> the type of event. I set 'wind_start' and 'wind_end' as the period
>>>>> around the event (-180 days, +210 days) if it is a rep = 1 type event.
>>>>>
>>>>> Now, I would like the 'rep_ins' dummy to indicate (i.e., rep_ins = 1)
>>>>> whether the date of all other observations of this firm (where
>>>>> rep = 0) lies within the range determined by 'wind_start' and
>>>>> 'wind_end' (which is conditional upon the 'rep' dummy).
>>>>>
>>>>> I've come across looping over observations and tried to design a
>>>>> solution for this problem based on that, but failed to do so. I
>>>>> assume the solution also depends on sorting the data in a special way.
>>>>>
>>>>> Here's the first part of my .do-file:
>>>>>
>>>>> gen wind_start = date-180 if rep == 1
>>>>> gen wind_end = date+210 if rep == 1
>>>>> format wind_start %d
>>>>> format wind_end %d
>>>>> gsort +cusip6 +date +trandate
>>>>> gen rep_ins = 0 if rep != 1
>>>>>
>>>>> I tried to come up with a solution by adding variables 'per_start'
>>>>> and 'per_end' for all rep = 0:
>>>>>
>>>>> gen per_start = date-180 if rep == 0
>>>>> gen per_end = date+180 if rep == 0
>>>>> format per_start %d
>>>>> format per_end %d
>>>>>
>>>>> This marks the period within which the rep = 1 event can lie. Maybe
>>>>> this could contribute to finding a solution as well.
>>>>

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
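Returning to the two questions at the top of the thread, here is a minimal sketch of one possible tweak to the adapted loop. It has not been checked against Gerard's real data, which the thread does not show. One possible reason the adapted loop flags too few cases is that it sets rep_ins = 0 unless the date lies inside every rep == 1 window of the firm (the "allin" logic). The sketch instead assumes rep_ins should be 1 when the date lies inside any window of the same firm, starts every rep == 0 case at 0, and switches it to 1 at the first window that contains it. It also skips firms lacking rep == 0 or rep == 1 cases, and creates the numeric firm id before -obs- so that -obs- matches the final sort order. It uses the example data from the original post, with the event date variable date standing in for trandate; the local nfirms is an illustrative name. On the second question: a -quietly- prefix on the loop suppresses the '(1 real change made)' messages from every -replace- inside it.

```stata
* Example data from the original post; -date- stands in for -trandate-
clear
input str5 firm_id str9 datestr byte rep
"firm1" "01jan2000" 0
"firm1" "05apr2004" 0
"firm1" "01nov2004" 1
"firm1" "10dec2004" 0
"firm1" "01jan2006" 0
"firm2" "30dec1999" 1
"firm2" "05jan2000" 1
"firm2" "06jun2000" 0
end
gen date = date(datestr, "DMY")
format date %td
drop datestr
gen wind_start = date - 180 if rep == 1
gen wind_end   = date + 210 if rep == 1
format wind_start wind_end %td

* Numeric firm ids 1, 2, ... for looping (created before -obs-)
egen firm_numid = group(firm_id)

* Within each firm: rep == 0 cases first, then rep == 1 cases
sort firm_numid rep date
gen long obs = _n

* Start from "in no window"; switch to 1 once any window contains the date
gen byte rep_ins = 0 if rep == 0

summarize firm_numid, meanonly
local nfirms = r(max)

* -quietly- suppresses the "(1 real change made)" messages from -replace-
quietly forvalues f = 1/`nfirms' {
    summarize obs if firm_numid == `f' & rep == 0, meanonly
    local n0 = r(N)
    local z1 = r(min)
    local z2 = r(max)
    summarize obs if firm_numid == `f' & rep == 1, meanonly
    local n1 = r(N)
    local o1 = r(min)
    local o2 = r(max)

    * Skip firms without rep == 0 cases or without any window
    if `n0' == 0 | `n1' == 0 {
        continue
    }

    forvalues i = `z1'/`z2' {
        forvalues o = `o1'/`o2' {
            if inrange(date[`i'], wind_start[`o'], wind_end[`o']) {
                replace rep_ins = 1 in `i'
                continue, break    // one containing window is enough
            }
        }
    }
}

list firm_id date rep rep_ins, sepby(firm_id)
```

With the example data, 10dec2004 (firm1) and 06jun2000 (firm2) get rep_ins = 1, while 05apr2004 gets 0, in line with the later correction in the thread. If the "every window" rule is what is actually wanted, the original allin logic is the right one and the cause of the all-zero result lies elsewhere.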