Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Calculate rolling mean using last observation in a group


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Calculate rolling mean using last observation in a group
Date   Tue, 12 Aug 2008 13:51:17 +0100

If I understand your question correctly, it yields to a minor variation
on the technique documented in 

SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in
intervals
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N.
J. Cox
        Q3/07   SJ 7(3):440--443                                 (no
commands)
        tip for counting or summarizing irregularly spaced
        events in intervals

and explained in my posting replying to your previous question: 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist
.0808/date/article-20.html> 

local N = _N 
gsort Company_No - Date 
gen avex = . 
qui forval i = 1/`N' { 
	egen tag = tag(Investor_id) ///
	if Company_no == Company_no[`i'] & ///
	inrange(Date, Date[`i'] - 90, Date[`i'])

	su x if tag, meanonly 

	replace avex = r(mean) in `i'

	drop tag 
}

The obvious variations on the previous example are using -summarize,
meanonly- to get the mean and specifying the previous 90 days within an
-inrange()- condition. (This code includes the current date; modify
according to taste.) 

The not-so-obvious variation is using -egen, tag()- to tag the _last_
relevant observation for each distinct Investor_id. 

-egen, tag()- tags just one of any repetitions of each distinct value of
the variable specified. It's not documented, but the observation tagged
is in fact the _first_ such occurrence in the dataset in the current
sort order. Hence use -gsort- first to reverse date order in each panel,
so that tagging catches the last such occurrence in time. 

Nick
n.j.cox@durham.ac.uk 

Marietta Jones

I have a new problem related to the dataset I presented in my original
email (see below).

For each observation in the dataset, I would like to calculate the
mean of variable x for the same company_no on a rolling prior 90 day
basis.
The extra complexity arises by the fact that there may be more than
one observations of variable x for the same investor_id for a given
company_no.
I would only like to include the last observation per investor_id for
a given company_no in the calculation of the mean.

>> I have the following dataset:
>>
>> Company_no     Date          Investor_id
>> 1                     03/01/2000          1
>> 1                     04/09/2000          1
>> 1                     12/12/2001          2
>> 2                     13/12/2000          4
>> 2                     07/08/2001          7
>> 3                     09/08/2000          4
>> 3                     19/03/2001          4
>> 3                     02/05/2001          5
>> 3                     03/12/2001          6
>>
>>
>> For each observation, I would like to calculate the number of
distinct
>> investors owning shares in a given company in the previous 12 months.
>> The new variable (No_investors_past_12_months) will look as follows:
>>
>> Company_no       Date        Investor_id
No_investors_past_12_months
>> 1                     03/01/2000         1                     1
>> 1                     04/09/2000         1                     1
>> 1                     12/12/2001         2                     1
>> 2                     13/12/2000         4                     1
>> 2                     07/08/2001         7                     2
>> 3                     09/08/2000         4                     1
>> 3                     19/03/2001         4                     1
>> 3                     02/05/2001         5                     2
>> 3                     03/12/2001         6                     3
>>
>> For example, for company 2 on 07/08/2001 there are two investors who
>> have owned shares of this company within the past 12 months.
>>
>> The idea is that the same investor should be counted only once and we
>> should count all observations over the past 12 months, including the
>> current observation date. If there are no observations over the past
>> 12 months, the value of the new variable should equal 1.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index