Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: algorithmic question : running sum and computations


From   Francesco <k7br@gmx.fr>
To   njcoxstata@gmail.com
Subject   Re: st: algorithmic question : running sum and computations
Date   Fri, 17 Aug 2012 14:45:43 +0200

Actually Nick there is only a slight problem : dates could be repeated
for the same individual AND the same product  : for example there
could be several round trips during the same day for the same
product... In that case I would consider that there are as many
delta_Date equal to zero as different round trips during the day for a
particular product... My apologies I did not think of this particular
and important case...

Could the trick  egen panelid = group(id product) be adapted in that case ?

Many thanks
Best Regards

On 17 August 2012 13:58, Francesco <k7br@gmx.fr> wrote:
> Many, Many thanks Nick and Scott for your kind and very precise
> answers! Spells is indeed what I needed ;-)
>
>
> On 17 August 2012 13:43, Nick Cox <njcoxstata@gmail.com> wrote:
>> Using your data as a sandpit
>>
>> .  clear
>>
>> .  input id    date str1 product quantity
>>
>>             id       date    product   quantity
>>   1.  1       1           A           10
>>   2.  1       2           A           -10
>>   3.  1       1           B            100
>>   4.  1       2           B            -50
>>   5.  1       4           C            15
>>   6.  1       8           C            100
>>   7.  1       9           C            -115
>>   8.  1      10          C            10
>>   9.  1      11          C            -10
>>  10.  end
>>
>> it seems that we are interested in the length of time it takes for
>> cumulative quantity to return to 0. -sum()- is there for cumulative
>> sums:
>>
>> .  bysort id product (date) : gen cumq = sum(q)
>>
>> In one jargon, we are interested in "spells" defined by the fact that
>> they end in 0s for cumulative quantity. In Stata it is easiest to work
>> with initial conditions defining spells, so we negate the date
>> variable to reverse time:
>>
>> .  gen negdate = -date
>>
>> As dates can be repeated for the same individual, treating data as
>> panel data requires another fiction, that panels are defined by
>> individuals and products:
>>
>> .  egen panelid = group(id product)
>>
>> Now we can -tsset- the data:
>>
>> .  tsset panelid negdate
>>        panel variable:  panelid (unbalanced)
>>         time variable:  negdate, -11 to -1, but with a gap
>>                 delta:  1 unit
>>
>> -tsspell- from SSC, which you must install, is a tool for handling
>> spells. It requires -tsset- data; the great benefit of that is that it
>> handles panels automatically. (In fact almost all the credit belongs
>> to StataCorp.) Here the criterion is that a spell is defined by
>> starting with -cumq == 0-
>>
>> .  tsspell, fcond(cumq == 0)
>>
>> -tsspell- creates three variables with names by default _spell _seq
>> _end. _end is especially useful: it is an indicator variable for end
>> of spells (beginning of spells when time is reversed). You can read
>> more in the help for -tsspell-.
>>
>> .  sort id product date
>>
>> .  l id product date cumq _*
>>
>>      +---------------------------------------------------+
>>      | id   product   date   cumq   _spell   _seq   _end |
>>      |---------------------------------------------------|
>>   1. |  1         A      1     10        1      2      1 |
>>   2. |  1         A      2      0        1      1      0 |
>>   3. |  1         B      1    100        0      0      0 |
>>   4. |  1         B      2     50        0      0      0 |
>>   5. |  1         C      4     15        2      3      1 |
>>      |---------------------------------------------------|
>>   6. |  1         C      8    115        2      2      0 |
>>   7. |  1         C      9      0        2      1      0 |
>>   8. |  1         C     10     10        1      2      1 |
>>   9. |  1         C     11      0        1      1      0 |
>>      +---------------------------------------------------+
>>
>> You want the mean length of completed spells. Completed spells are
>> tagged by _end == 1 or  cumq == 0
>>
>> .  egen meanlength = mean(_seq/ _end), by(id)
>>
>> This is my favourite division trick: _seq / _end is _seq if _end is 1
>> and missing if _end is 0; missings are ignored by -egen-'s -mean()-
>> function, so you get the mean length for each individual. It is
>> repeated for each observation for each individual so you could go
>>
>> . egen tag = tag(id)
>> . l id meanlength if tag
>>
>> I wrote a tutorial on spells.
>>
>>  SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>         Q2/07   SJ 7(2):249--265                                 (no commands)
>>         shows how to handle spells with complete control over
>>         spell specification
>>
>> which is accessible at
>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
>>
>> Its principles underlie -tsspell-, but -tsspell- is not even
>> mentioned, for which there is a mundane explanation. Explaining some
>> basics as clearly and carefully as I could produced a paper that was
>> already long and detailed, and adding detail on -tsspell- would just
>> have made that worse.
>>
>> For more on spells, see Rowling (1997, 1998, 1999, etc.).
>>
>> Nick
>>
>> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <cariboupad@gmx.fr> wrote:
>>> Dear Statalist,
>>>
>>> I am stuck with a little algorithmic problem and I cannot find an
>>> simple (or elegant) solution...
>>>
>>> I have a panel dataset as (date in days) :
>>>
>>> ID    DATE    PRODUCT QUANTITY
>>> 1       1           A           10
>>> 1       2           A           -10
>>>
>>> 1       1           B            100
>>> 1       2           B            -50
>>>
>>> 1       4           C            15
>>> 1       8           C            100
>>> 1       9           C            -115
>>>
>>> 1      10          C            10
>>> 1      11          C            -10
>>>
>>>
>>>
>>> and I would like to know the average time (in days) it takes for an
>>> individual in order to complete a full round trip (the variation in
>>> quantity is zero)
>>> For example, for the first id we can see that there we have
>>>
>>> ID PRODUCT delta_DATE delta_QUANTITY
>>> 1         A               1=2-1                  0=10-10
>>> 1         C               5=4-9                  0=15+100-115
>>> 1         C               1=11-10               0=10-10
>>>
>>> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full
>>> round trip. Indeed I can discard product B because there is no round
>>> trip, that is 100-50 is not equal to zero.
>>>
>>> My question is therefore ... do you have an idea obtain this simply in
>>> Stata ? I have to average across thousands of individuals... :)
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index