Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


FW: st: Looping within a subset under a certain condition


From   "Gerard Solbrig" <gsolbrig@mail.uni-mannheim.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   FW: st: Looping within a subset under a certain condition
Date   Sun, 30 Sep 2012 14:09:24 +0200

Ok, after some testing: my code does not work correctly at all, as it only
sets rep_ins = 1 for trandates after the rep = 1 date.


-----Original Message-----
From: Gerard Solbrig [mailto:gsolbrig@mail.uni-mannheim.de] 
Sent: Sunday, 30 September 2012 13:21
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: st: Looping within a subset under a certain condition

I have tested your suggested code and adapted it a little bit to this:

sort firm_id rep trandate
gen long obs = _n
gen rep_ins = 1
egen firm_numid = group(firm_id)
summarize firm_numid, meanonly
forval f =  1/`r(max)' {
	summarize obs if firm_numid == `f' & rep == 0, meanonly 
	local z1 = r(min) 
	local z2 = r(max)
	summarize obs if firm_numid == `f' & rep == 1, meanonly 
	local o1 = r(min) 
	local o2 = r(max)
	forval i = `z1'/`z2' {
		local allin = 1
		forval o = `o1'/`o2' {
			if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
				local allin = 0
			}
		}
		if `allin' == 0 replace rep_ins = 0 in `i'
	}
}
replace rep_ins = . if rep == 1

Somehow, the code changes all -rep_ins- to 0, even for observations which
clearly are in the range -wind_start- -wind_end- of a rep == 1 case. I am
not sure why it does that, as I totally understand your intuition behind
your code.

I managed to get my code to work, which looks like this as of now:

gsort +firm_id -rep +date
by firm_id: gen obs = _n
gen group_obs = _n
qui bysort firm_id: gen obs_N = _N
qui bysort firm_id (group_obs): replace group_obs = group_obs[1]
by group_obs, sort: gen group = _n == 1
replace group = sum(group)
summarize group, meanonly
local max = r(max)
gsort +firm_id -rep +date
forvalues x = 1/`max' {
	summarize obs, meanonly
	local N = r(N)
	forvalues i = 1/`N' {
		if rep[`i'] == 1 {
		local r = `i'
		local s = `i'+1
		forvalues z = `s'/`N' {
			if trandate[`z'] >= wind_start[`r'] & trandate[`z'] <= wind_end[`r'] {
				replace rep_ins = 1 in `z'
			}
			else {
				replace rep_ins = 0 in `z'
			}
		}
	}
}
}
replace rep_ins = . if rep == 1

However, your code seems to work faster and is more intuitive than what I
came up with here. Any idea on what to tweak on the top code to make it
work?

In addition, a more trivial question: how can I stop Stata from showing me
'1 real change made' for every change? Would a simple -quietly- command put
before the -forval- loop prevent it from doing that?
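
For reference, a minimal sketch of two ways -quietly- can be used to suppress the 'real change made' messages. The toy data and the variable names x and flag are made up purely for illustration.

```stata
* Sketch only: toy data to show where -quietly- can go
clear
set obs 5
gen x = _n
gen byte flag = 0

* prefix a single command: no "real change made" message is shown
quietly replace flag = 1 in 2

* wrap a whole loop in a quietly block: output of every command inside is suppressed
quietly {
	forval i = 3/5 {
		replace flag = 1 in `i'
	}
}

list
```

The block form keeps the loop body unchanged, so -quietly- can be removed again easily while debugging.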

Many thanks so far! After all, I'm a Stata-newbie and I appreciate your
patience and helpfulness a lot!
Gerard



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Sunday, 30 September 2012 12:48
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Looping within a subset under a certain condition

Should be

sort firm rep trandate

Sorry!

On Sun, Sep 30, 2012 at 11:27 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> You are not showing me the complete line you typed, so I can't tell 
> you what was wrong exactly.
>
> More positively, here is a stab at your problem, but I haven't tested the
> code.
>
> sort firm trandate rep
>
> gen long obsno = _n
>
> * assume all are in some window; will change our mind if we find an exception
> gen all_in_a_window = 1
>
> * numeric ids 1 2 3 ... are just a convenience for looping
> egen firm_numid = group(firm_id)
> su firm_numid, meanonly
>
> * loop over firms
> forval f =  1/`r(max)' {
>
> * within each firm, which cases have rep == 0
> su obsno if firm_numid == `f' & rep == 0, meanonly
> local z1 = r(min)
> local z2 = r(max)
>
> * ditto, rep == 1
> su obsno if firm_numid == `f' & rep == 1, meanonly
> local o1 = r(min)
> local o2 = r(max)
>
> * look at each case of rep == 0
> forval i = `z1'/`z2' {
>         local allin = 1
>
>                 * we use -trandate[`i']- and compare it with the windows for each case of rep == 1
>                 * note the crucial !    [!!!]
>         forval o = `o1'/`o2' {
>                 if !inrange(trandate[`i'], wind_start[`o'], wind_end[`o']) {
>                         local allin = 0
>                 }
>         }
>
>         if `allin' == 0 replace all_in_a_window = 0 in `i'
> }
>
> }
>
> Nick
>
> On Sun, Sep 30, 2012 at 11:17 AM, Gerard Solbrig 
> <gsolbrig@mail.uni-mannheim.de> wrote:
>> I understand. That's what I did in an earlier version of the loop, 
>> where I subscripted both, -rep- and -trandate- in my loop, but then Stata
>> returned:
>>
>> '[' invalid obs no
>> r(198);
>>
>> Why is that? That's why I got rid of it in the first place. But 
>> without the subscript, the loop does not seem to finish running.
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Sunday, 30 September 2012 11:59
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Looping within a subset under a certain condition
>>
>> This can't be right, if only because you are misunderstanding what the
>> -if- command does. Stata treats
>>
>> if rep == 1
>>
>> as if it were
>>
>> if rep[1] == 1
>>
>> See
>>
>> FAQ     . . . . . . . . . . . . . . .  if command vs. if qualifier
>>         . . . . . . . . . . . . . . . . . . . . . . . .  J. Wernow
>>         6/00    I have an if command in my program that only seems
>>                 to evaluate the first observation, what's going on?
>>                 http://www.stata.com/support/faqs/lang/ifqualifier.html
>>
>> The context of looping over observations makes no difference here. 
>> You probably intend
>>
>> if rep[`i'] == 1
>>
>> Similar comment w.r.t.
>>
>> if trandate ...
>>
>> where -trandate- _must_ be subscripted.
>>
>>
>> On Sun, Sep 30, 2012 at 10:18 AM, Gerard Solbrig 
>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>> That sure is correct. Please see my reply to Pengpeng on that matter.
>>> So far, I've only focused on getting the rep_ins indicator to work 
>>> at all, but multiple windows for one firm is an additional concern.
>>> Ideally, a code would indicate for each rep = 0 case within which of 
>>> these windows the observation's 'trandate' lies...
>>>
>>> Here's the last version of my code (without inclusion of your 
>>> earlier suggestion and the multiple window problem):
>>>
>>> forvalues x = 1/`max' {
>>>         summarize obs, meanonly
>>>         local N = r(N)
>>>         forvalues i = 1/`N' {
>>>                 if rep == 1 {
>>>                 local r = `i'
>>>                 local s = `i'+1
>>>                 forvalues z = `s'/`N' {
>>>                         if trandate >= wind_start[`r'] & trandate <= wind_end[`r'] {
>>>                         replace rep_ins = 1 in [`z']
>>>                         }
>>>                         else {
>>>                         replace rep_ins = 0 in [`z']
>>>                         }
>>>                 }
>>>         }
>>> }
>>> }
>>> replace rep_ins = . if rep == 1
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>> Sent: Sunday, 30 September 2012 11:10
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: Looping within a subset under a certain condition
>>>
>>> The other thing I wasn't clear on is your rules for combining two or 
>>> more windows for the same firm. The code example I gave just uses 
>>> the overall range of the windows, but that would include any gaps 
>>> between windows. Thus if a < b < c < d and there are windows [a,b] 
>>> and [c,d] then the combined window [a, d] includes a gap [b, c].
>>>
>>> On Sun, Sep 30, 2012 at 9:56 AM, Gerard Solbrig 
>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>> My bad, sorry! Of course, the observation 5apr2004 should not be 
>>>> considered in the window, as it lies outside of the range between 
>>>> 'wind_start' and 'wind_end'. Despite that, it seems you've understood my
>>>> problem correctly.
>>>>
>>>> I'll try to incorporate your suggestion and see whether it helps in
>>>> finding a solution. I will post an update on the matter later.
>>>>
>>>> Thanks so far!
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>>>> Sent: Sunday, 30 September 2012 01:13
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: Re: st: Looping within a subset under a certain condition
>>>>
>>>> I had another look at this. I still don't understand your problem 
>>>> exactly (e.g. why is the second obs at 5apr2004 considered in 
>>>> window), but the technique here may help.
>>>>
>>>> egen first_start = min(wind_start), by(firm_id)
>>>> egen last_end = max(wind_end), by(firm_id)
>>>>
>>>> gen in_window = inrange(date, first_start, last_end)
>>>>
>>>> egen all_0_in_window = min(in_window) if rep == 0, by(firm_id)
>>>>
>>>> On the last line: on all <=> min, any <=> max, see
>>>>
>>>> FAQ     . . Creating variables recording whether any or all possess some char.
>>>>         . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>>>         2/03    How do I create a variable recording whether any
>>>>                 members of a group (or all members of a group)
>>>>                 possess some characteristic?
>>>>                 http://www.stata.com/support/faqs/data/anyall.html
>>>>
>>>> Nick
>>>>
>>>> On Fri, Sep 28, 2012 at 9:45 PM, Gerard Solbrig 
>>>> <gsolbrig@mail.uni-mannheim.de> wrote:
>>>>>
>>>>> I'm encountering a problem for which I seek your help.
>>>>>
>>>>> Let me start off with an example from my data (what I want it to 
>>>>> look like in the end), before I explain my particular problem.
>>>>>
>>>>> firm_id   date        rep   wind_start   wind_end    rep_ins
>>>>>
>>>>> firm1     01jan2000   0     .            .           0
>>>>> firm1     05apr2004   0     .            .           1
>>>>> firm1     01nov2004   1     05may2004    30may2005   .
>>>>> firm1     10dec2004   0     .            .           1
>>>>> firm1     01jan2006   0     .            .           0
>>>>> firm2     30dec1999   1     03jul1999    27jul2000   .
>>>>> firm2     05jan2000   1     09jul1999    02aug2000   .
>>>>> firm2     06jun2000   0     .            .           1
>>>>>
>>>>> Each firm in my data has a 'firm_id'. Variable 'date' refers to an 
>>>>> event date. The 'rep' dummy indicates the type of event.
>>>>> I set 'wind_start' and 'wind_end' as period around the event 
>>>>> (-180days,+210days), in case it's a rep = 1 type event.
>>>>>
>>>>> Now, I would like the 'rep_ins' dummy to indicate (i.e., rep_ins = 1)
>>>>> whether the date of all other observations of this firm (where rep = 0)
>>>>> lies within the range determined by 'wind_start' and 'wind_end'
>>>>> (which is conditional upon the 'rep' dummy).
>>>>>
>>>>> I've come across looping over observations and tried to design a 
>>>>> solution for this problem based on that, but failed to do so. I 
>>>>> assume the solution also depends on sorting the data in a special way.
>>>>>
>>>>> Here's the first part of my .do-file:
>>>>>
>>>>> gen wind_start = date-180 if rep == 1
>>>>> gen wind_end = date+210 if rep == 1
>>>>> format wind_start %d
>>>>> format wind_end %d
>>>>> gsort +cusip6 +date +trandate
>>>>> gen rep_ins = 0 if rep != 1
>>>>>
>>>>> I tried to come up with a solution by adding variables 'per_start'
>>>>> and 'per_end' for all rep = 0:
>>>>>
>>>>> gen per_start = date-180 if rep == 0
>>>>> gen per_end = date+180 if rep == 0
>>>>> format per_start %d
>>>>> format per_end %d
>>>>>
>>>>> To mark the period within which the rep = 1 event can lie. Maybe 
>>>>> this could contribute to finding a solution as well.
>>>> *
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP