
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: algorithmic question : running sum and computations

From	Francesco <[email protected]>
To	[email protected]
Subject	Re: st: algorithmic question : running sum and computations
Date	Fri, 17 Aug 2012 13:58:45 +0200

Many, many thanks Nick and Scott for your kind and very precise
answers! Spells are indeed what I needed ;-)


On 17 August 2012 13:43, Nick Cox <[email protected]> wrote:
> Using your data as a sandpit
>
> .  clear
>
> .  input id    date str1 product quantity
>
>             id       date    product   quantity
>   1.  1       1           A           10
>   2.  1       2           A           -10
>   3.  1       1           B            100
>   4.  1       2           B            -50
>   5.  1       4           C            15
>   6.  1       8           C            100
>   7.  1       9           C            -115
>   8.  1      10          C            10
>   9.  1      11          C            -10
>  10.  end
>
> it seems that we are interested in the length of time it takes for
> cumulative quantity to return to 0. -sum()- is there for cumulative
> sums:
>
> .  bysort id product (date) : gen cumq = sum(q)
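A small sketch for checking the running sum by eye. It assumes the example data above are in memory; the name cumq_check is made up for illustration. It spells out the full variable name (the command above relies on q being an unambiguous abbreviation of quantity) and lists the result one id-product block at a time. Note that -sum()- treats missing values as 0.

```stata
* running sum of quantity within each id-product pair, in date order
* (cumq_check is a hypothetical name; it should equal cumq)
bysort id product (date) : gen cumq_check = sum(quantity)

* show each id-product block separately
list id product date quantity cumq_check, sepby(id product)
```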
>
> In one jargon, we are interested in "spells" defined by the fact that
> they end in 0s for cumulative quantity. In Stata it is easiest to work
> with initial conditions defining spells, so we negate the date
> variable to reverse time:
>
> .  gen negdate = -date
>
> As dates can be repeated for the same individual, treating data as
> panel data requires another fiction, that panels are defined by
> individuals and products:
>
> .  egen panelid = group(id product)
>
> Now we can -tsset- the data:
>
> .  tsset panelid negdate
>        panel variable:  panelid (unbalanced)
>         time variable:  negdate, -11 to -1, but with a gap
>                 delta:  1 unit
>
> -tsspell- from SSC, which you must install, is a tool for handling
> spells. It requires -tsset- data; the great benefit of that is that it
> handles panels automatically. (In fact almost all the credit belongs
> to StataCorp.) Here the criterion is that a spell is defined by
> starting with -cumq == 0-
>
> .  tsspell, fcond(cumq == 0)
>
> -tsspell- creates three variables with names by default _spell _seq
> _end. _end is especially useful: it is an indicator variable for end
> of spells (beginning of spells when time is reversed). You can read
> more in the help for -tsspell-.
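For anyone following along: -tsspell- is user-written, so it needs a one-time install from SSC before the call above will run. A minimal sketch (the -which- line is only an optional check that Stata can find the command afterwards):

```stata
* one-time install of the user-written command from SSC
ssc install tsspell

* optional: confirm that Stata can now find it
which tsspell
```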
>
> .  sort id product date
>
> .  l id product date cumq _*
>
>      +---------------------------------------------------+
>      | id   product   date   cumq   _spell   _seq   _end |
>      |---------------------------------------------------|
>   1. |  1         A      1     10        1      2      1 |
>   2. |  1         A      2      0        1      1      0 |
>   3. |  1         B      1    100        0      0      0 |
>   4. |  1         B      2     50        0      0      0 |
>   5. |  1         C      4     15        2      3      1 |
>      |---------------------------------------------------|
>   6. |  1         C      8    115        2      2      0 |
>   7. |  1         C      9      0        2      1      0 |
>   8. |  1         C     10     10        1      2      1 |
>   9. |  1         C     11      0        1      1      0 |
>      +---------------------------------------------------+
>
> You want the mean length of completed spells. Completed spells are
> tagged by _end == 1 or cumq == 0:
>
> .  egen meanlength = mean(_seq/ _end), by(id)
>
> This is my favourite division trick: _seq / _end is _seq if _end is 1
> and missing if _end is 0; missings are ignored by -egen-'s -mean()-
> function, so you get the mean length for each individual. It is
> repeated for each observation for each individual so you could go
>
> . egen tag = tag(id)
> . l id meanlength if tag
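One thing worth checking: _seq counts observations in a spell, not elapsed days. In this example the two happen to give the same mean, (2+3+2)/3 and (1+5+1)/3, but with irregularly spaced dates they need not. Below is a minimal sketch of a days-based version that does not use -tsspell-. It assumes integer quantities (so the running sum hits exactly 0) and no repeated dates within an id-product pair. The names cumq2, start, trip, days and meandays are made up for illustration.

```stata
* running sum within each id-product pair, in date order
bysort id product (date) : gen double cumq2 = sum(quantity)

* a round trip starts at the first observation of a pair,
* or right after the running sum has returned to 0
by id product : gen byte start = (_n == 1) | (cumq2[_n-1] == 0)
by id product : gen trip = sum(start)

* elapsed days, recorded once per trip and only if the trip closes at 0
bysort id product trip (date) : gen days = date[_N] - date[1] ///
    if _n == _N & cumq2[_N] == 0

* mean over completed round trips; missings are ignored by mean()
egen meandays = mean(days), by(id)
```

With the example data this leaves days nonmissing in three observations (1 for A, 5 and 1 for C), and product B drops out because its running sum never returns to 0.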
>
> I wrote a tutorial on spells.
>
>  SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q2/07   SJ 7(2):249--265                                 (no commands)
>         shows how to handle spells with complete control over
>         spell specification
>
> which is accessible at
> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
>
> Its principles underlie -tsspell-, but -tsspell- is not even
> mentioned, for which there is a mundane explanation. Explaining some
> basics as clearly and carefully as I could produced a paper that was
> already long and detailed, and adding detail on -tsspell- would just
> have made that worse.
>
> For more on spells, see Rowling (1997, 1998, 1999, etc.).
>
> Nick
>
> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <[email protected]> wrote:
>> Dear Statalist,
>>
>> I am stuck with a little algorithmic problem and I cannot find a
>> simple (or elegant) solution...
>>
>> I have a panel dataset as (date in days) :
>>
>> ID    DATE    PRODUCT QUANTITY
>> 1       1           A           10
>> 1       2           A           -10
>>
>> 1       1           B            100
>> 1       2           B            -50
>>
>> 1       4           C            15
>> 1       8           C            100
>> 1       9           C            -115
>>
>> 1      10          C            10
>> 1      11          C            -10
>>
>>
>>
>> and I would like to know the average time (in days) it takes an
>> individual to complete a full round trip (that is, the net change in
>> quantity is zero).
>> For example, for the first id we can see that we have
>>
>> ID   PRODUCT   delta_DATE   delta_QUANTITY
>> 1    A         1=2-1        0=10-10
>> 1    C         5=9-4        0=15+100-115
>> 1    C         1=11-10      0=10-10
>>
>> so on average individual 1 takes (1+5+1)/3 = 2.33 days to complete a
>> full round trip. I can discard product B because there is no round
>> trip: 100-50 is not equal to zero.
>>
>> My question is therefore: do you have an idea of how to obtain this
>> simply in Stata? I have to average across thousands of individuals... :)
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
