Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: RE: Calculate rolling mean using last observation in a group


From   "Marietta Jones" <marjones75@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Calculate rolling mean using last observation in a group
Date   Tue, 12 Aug 2008 15:23:16 +0100

Hi Nick,

Thank you very much for the suggested solution.
I tried writing my own do file based on your suggestions to my
previous post, but I couldn't get the code to work.
My main problem was that while I knew how to tag the first observation
of a group, I did not know how to successfully tag the last
observation. gsort appears to offer a good solution that I was not
aware of.

Thanks,
Marietta



On Tue, Aug 12, 2008 at 1:51 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> If I understand your question correctly, it yields to a minor variation
> on the technique documented in
>
> SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in
> intervals
>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N.
> J. Cox
>        Q3/07   SJ 7(3):440--443                                 (no
> commands)
>        tip for counting or summarizing irregularly spaced
>        events in intervals
>
> and explained in my posting replying to your previous question:
>
> <http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist
> .0808/date/article-20.html>
>
> local N = _N
> gsort Company_No - Date
> gen avex = .
> qui forval i = 1/`N' {
>        egen tag = tag(Investor_id) ///
>        if Company_no == Company_no[`i'] & ///
>        inrange(Date, Date[`i'] - 90, Date[`i'])
>
>        su x if tag, meanonly
>
>        replace avex = r(mean) in `i'
>
>        drop tag
> }
>
> The obvious variations on the previous example are using -summarize,
> meanonly- to get the mean and specifying the previous 90 days within an
> -inrange()- condition. (This code includes the current date; modify
> according to taste.)
>
> The not-so-obvious variation is using -egen, tag()- to tag the _last_
> relevant observation for each distinct Investor_id.
>
> -egen, tag()- tags just one of any repetitions of each distinct value of
> the variable specified. It's not documented, but the observation tagged
> is in fact the _first_ such occurrence in the dataset in the current
> sort order. Hence use -gsort- first to reverse date order in each panel,
> so that tagging catches the last such occurrence in time.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Marietta Jones
>
> I have a new problem related to the dataset I presented in my original
> email (see below).
>
> For each observation in the dataset, I would like to calculate the
> mean of variable x for the same company_no on a rolling prior 90 day
> basis.
> The extra complexity arises by the fact that there may be more than
> one observations of variable x for the same investor_id for a given
> company_no.
> I would only like to include the last observation per investor_id for
> a given company_no in the calculation of the mean.
>
>>> I have the following dataset:
>>>
>>> Company_no     Date          Investor_id
>>> 1                     03/01/2000          1
>>> 1                     04/09/2000          1
>>> 1                     12/12/2001          2
>>> 2                     13/12/2000          4
>>> 2                     07/08/2001          7
>>> 3                     09/08/2000          4
>>> 3                     19/03/2001          4
>>> 3                     02/05/2001          5
>>> 3                     03/12/2001          6
>>>
>>>
>>> For each observation, I would like to calculate the number of
> distinct
>>> investors owning shares in a given company in the previous 12 months.
>>> The new variable (No_investors_past_12_months) will look as follows:
>>>
>>> Company_no       Date        Investor_id
> No_investors_past_12_months
>>> 1                     03/01/2000         1                     1
>>> 1                     04/09/2000         1                     1
>>> 1                     12/12/2001         2                     1
>>> 2                     13/12/2000         4                     1
>>> 2                     07/08/2001         7                     2
>>> 3                     09/08/2000         4                     1
>>> 3                     19/03/2001         4                     1
>>> 3                     02/05/2001         5                     2
>>> 3                     03/12/2001         6                     3
>>>
>>> For example, for company 2 on 07/08/2001 there are two investors who
>>> have owned shares of this company within the past 12 months.
>>>
>>> The idea is that the same investor should be counted only once and we
>>> should count all observations over the past 12 months, including the
>>> current observation date. If there are no observations over the past
>>> 12 months, the value of the new variable should equal 1.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index