# Re: st: RE: Group building according to given report dates and company

 From "Hua Pan" To statalist@hsphsun2.harvard.edu Subject Re: st: RE: Group building according to given report dates and company Date Sat, 21 Mar 2009 20:19:30 +0100

```Dear Michael,Dear Statalisters

Thank you very much indeed for your quick response and wonderful answer, Michael. Your second code is exactly what I basically want. It works perfect when the days between two report dates are fix. You really helped me a lot.

Just one more question about modification of the code. I thought about it again and again since last night but still can’t find the key. That's also why I’m not able to write you the mail earlier, sorry.

At first I want to describe dataset B with more details.  In B there are at most 11 report dates for each company, during the period, form April.2004 to Oct. 2006, the time between one report date and the date next to it should be about 3 month, but it really varies for each observation. I don’t need to ensure that each company has total 11 periods, because the observation is each company-quarter. So the problem is, some companies have missing report dates, and in the middle of the period, e.g. the report date for Jan2005,Apr2006, Jul2006,oct2006 are not available for following sample :

rep1     rep2     rep3   rep4    rep5   rep6    rep7   rep8 rep9rep10rep11
15apr04	20jul04	14oct04	14apr05 26jul05	27oc05 09feb06    .     .    .	.
My question is,
1)	when I use this loop
forvalues i = 1/11 {
replace period = `i' if date >= rep`i'+2 & date < rep`i' + 90
}
It works even when there are missing values. The only problem is, -60- is not correct, it varies each time. I wish to calculate the cumulative return before the next report day. How can I solve this problem?

2)When I use this loop:
forvalues i = 1/11 {
replace period = `i' if date >= rep`i'+2 & date < rep`++i'
}
I can calculate the cumulative return before the next report date. But there are two problems, an error appears because of `++i’:  i=12 can’t be found. And it works only when there are complete 11 periods without missing value. When there missing values, periode 3 would be calculated from 16oct04 to 13apr05 (about 6 months), period 7 from 11feb06 to 30dec06

I tried something like:
forvalues i = 1/11 {
replace period = `i' if date >= rep`i'+2 & date < rep`++i' & rep`++i' <rep`i'+120
}

I set 120, because I just want to sure that time space is about 3 months (90 days), not 6 months (180 days)or more. But it failed to work. The 3.Period is still calculated from 18oct2004 to 13apr2005, almost 6 months. I have really no idea why it happend. Do you have a better idea, so that error wouldn’t appear and it works well even when there are missing data? Any suggestion would be great appreciated. Thank you again, Michael.

Best Regards

Hua

-------- Original-Nachricht --------
> Datum: Fri, 20 Mar 2009 15:18:50 -0400
> Von: "Michael I. Lichter" <mlichter@buffalo.edu>
> An: statalist@hsphsun2.harvard.edu
> Betreff: Re: st: RE: Group building according to given report dates and company

> Hua,
>
> One of your problems is that each ID (permno) has multiple cases in each
> dataset. I don't know exactly how Stata handles a merge when there are
> multiple observations per ID in both datasets, but it apparently doesn't
> do what you want.
>
> How to proceed depends in part on whether there is a fixed set of report
> dates for all IDs. That is, the first report is on 1/1/04, the second on
> 2/1/04, etc., for all firms.  If so, you can do what Nick suggested
> without merging B into A.
>
> -----
> use A, clear
> gen period = 1 if date >= d(01jan2004) & date < d(01feb2004)
> gen period = 2 if date >= d(01feb2004) & date < d(28feb2004)
> ...
> collapse (sum) ret, by(permno period)
> ----
>
> You can generate the periods in a loop if the reporting periods are a
> fixed amount of time apart (30 days, 1 month, 1 quarter, etc.).
>
> If the report dates vary across IDs, then you might consider something
> like this:
>
> -----
> use B,clear
> by permno, sort: gen per = _n   // needed for reshape
> reshape wide retdats, i(permno) j(per)
> sort permno
> merge permno using A
> assert _merge == 3   // just to be sure
> gen byte period = 0
> forvalues i = 1/11 {   // you said there were 11 periods
>     replace period = `i' if date >= retdats`i' & date < retdats`i' +
> 30   // you can fix the dates as needed
> }
> assert period > 0 & period < .   // just to be sure
> collapse (sum) ret, by(permno period)
> -----
>
> I haven't run this code, but I think it's basically what you want and
> mostly syntactically correct. You probably also want to add a check to
> ensure that you have all 11 (or whatever the number is) dates for each
> ID and modify the program accordingly if that assumption is incorrect.
>
> Michael
>
> Hua Pan wrote:
> > Dear Nick,
> >
> > thank you very much for your suggestion. Just as you adviced, I also
> want to make a subdivision into time periods, not smaller datasets. But I just
> wonder how can I do that.Maybe you have an good idea?
> > My dataset after merge A and B look like this:
> >
> > permno	date	         repdats	    ret
> > 10002	04012004	04292004	.018296152
> > 10002	04022004	07272004	.010668194
> > 10002	04052004	10272004	-.000555568
> > 10002	04062004	01242005	-.018343519
> > 10002	04072004	04222005	.018686293
> > 10002	04082004	07202005	-.016120011
> > 10002	04122004	10122005	-.028248586
> > 10002	04132004	01202006	-.012790768
> > 10002	04142004	04212006	-.007656017
> > 10002	04152004	07212006	-.001780456
> > 10002	04162004	10202006	.029131973
> > -----------------------------------------------------
> > 10002	04192004	10202006	-.030040355
> > 10002	04202004	10202006	.000595493
> > 10002	04212004	10202006	.006547655
> > ...     ....               ...
> > 10002   12312006        10202006
> >
> > It looks very strange, cause I have only 11 report dates, but around 700
> dates of return for one permno. So the last repdats 10202006 repeats for
> the rest of observation.
> >
> > Best Regards
> >
> > Hua
> >
> >
> >
> >
> > -------- Original-Nachricht --------
> >
> >> Datum: Fri, 20 Mar 2009 15:13:33 -0000
> >> Von: "Nick Cox" <n.j.cox@durham.ac.uk>
> >> An: statalist@hsphsun2.harvard.edu
> >> Betreff: st: RE: Group building according to given report dates and
> company
> >>
> >
> >
> >> There is a lot of detail that you are expecting members to upload here.
> I
> >> have not tried, but I have some broad strategic advice, exactly as
> before.
> >>
> >> There is no _obvious_ need here for subdividing into smaller datasets
> >> followed by recombination.
> >>
> >> What is most evident is that you have a subdivision into time periods.
> So,
> >> create a variable that is 1, 2, etc. for your distinct time periods and
> >> then use that new variable -by:- as in previous problems.
> >>
> >> One of the easy details about -by:- is that you can define blocks of
> >> observations on as many criteria as you wish.
> >>
> >> Nick
> >> n.j.cox@durham.ac.uk
> >>
> >> Hua Pan
> >>
> >> Now I have a big problem and have thought about it over and over again
> and
> >> can’t find the way to solve it. Maybe someone here has can help me.
> >>
> >> I have two datasets, say A and B. A has permno (identify nr.), daily
> >> return and date. B has permno, repdate (report date).
> >> A
> >> permno	   date             ret
> >> 10001	01.Apr.2004	 .01793105
> >> 10001	02.Apr.2004	-.046070479
> >> 10001	05.Apr.2004	 .022727251
> >> 10001	06.Apr.2004	-.005555551
> >> 10001	07.Apr.2004	 .016759828
> >> …    	…		    …
> >> 10001   31.Dec.2008         …
> >>
> >> 10002	01.Apr.2004	.018296152
> >> 10002  	02.Apr.2004	.010668194
> >> …	     …              …
> >> 10002	31.Dec.2008	     …
> >>
> >> 10003   01.Apr.2004          …
> >> …	    …		      …
> >> 93105       …                 …
> >>
> >> B:
> >> Permno    repdate
> >> 10001  	  21.Apr.2004
> >> 10001     20.July.2004
> >> 10001     26.Oct.2004
> >> 10001     03.Feb.2005
> >>           04.May.2005
> >> …             …
> >> 10002     23.Apr.2004
> >> 10002     22.July.2004
> >> …              …
> >> 93105         …
> >> …
> >> I wish to calculate cumulative daily return for each company and for
> each
> >> period, from the next two days after the report day to the day before
> next
> >> report day e.g. for permno 10001, I want to calculate the cumulative
> return
> >>  for periods: (21.Apr.2004 +2= 23.Apr.2004, 20.July.2004
> -1=19.Juli.2004),
> >> (22.July.2004, 25.Oct.2004), (28.Oct.2004, 02.Feb.2005)……for permno
> >> 10002, from 23.Apr.2004+2+1=26.Apr.2004 (because 25.Apr.2004 is Sunday,
> so I
> >> have to plus one day), to 21.July.2004……
> >> (cumulative return for each day within period are to be calculated,
> almost
> >> 60 for each period)
> >>
> >> At first I tried to get dataset A and B together with merge, but not
> >> successful.
> >> use "C:\A.dta", clear
> >> sort permno date
> >> save "C:\A.dta", replace
> >> use "C:\B.dta", clear
> >> sort permno repdate
> >> merge permno using "C:\A.dta"
> >> keep if _merge == 3
> >> drop _merge
> >> save "C:\AB.dta”
> >>
> >> Many of date and returns of A are deleted. Maybe I should try joinby,
> or
> >> just change A as Master Data? How can I put them orderly together?
> >>
> >> Then I wish that I can build several groups according to the permno and
> >> report date:
> >> group1: permno 10001 with daily return from 23.Apr.2004 to 19.Juli.2004
> >> group2: permno 10001 with daily return from 22.July.2004 to 25.Oct.2004
> >> group3: permno 10001 with daily return from 28.Oct.2004 to 02.Feb.2005
> >> ….
> >> group n: permno 10002 with daily return from 26.Apr.2004 to
> 21.July.2004
> >> …..
> >> group m: permno 93105 with…..
> >>
> >> After I drop the Observations that don’t belong to any group, I can
> >> calculate cumulative return for each group, (thank Austin again for
> this code)
> >>
> >> bys group: g cumul=1+ret if _n==1
> >> replace cumul=(1+ret)*l.cumul if mi(cumul)
> >> replace cumul=cumul-1
> >>
> >> Finally I drop the group with missing value and calculate the mean
> >> cumulative return for those that have the same group number
> >> by group, sort: generate nr=_n
> >> by nr, sort: egen mean_cumul=mean(cumul)
> >>
> >> So the question is how can I put two dataset together and then put the
> >> observations into groups according to permno and report date and pay
> attention
> >> that the period begins and ends with work day. When I solve this
> problem,
> >> then I can calculate the mean cumulative return.
> >> Any suggestion would be greatly appreciated. Thank you very much.
> >>
> >> *
> >> *   For searches and help try:
> >> *   http://www.stata.com/help.cgi?search
> >> *   http://www.stata.com/support/statalist/faq
> >> *   http://www.ats.ucla.edu/stat/stata/
> >>
> >
> >
>
> --
> Michael I. Lichter, Ph.D.
> Research Assistant Professor & NRSA Fellow
> UB Department of Family Medicine / Primary Care Research Institute
> UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
> Office: CC 125 / Phone: 716-898-4751 / E-Mail: mlichter@buffalo.edu
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
Psssst! Schon vom neuen GMX MultiMessenger gehört? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```