Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: algorithmic question : running sum and computations

From	Francesco <[email protected]>
To	Nick Cox <[email protected]>
Subject	Re: st: algorithmic question : running sum and computations
Date	Fri, 17 Aug 2012 15:33:15 +0200

You are perfectly right, Nick.

Unfortunately I cannot rank the observations by time order during a
day ... I only have the daily stamp of the observation (May 3rd for
example), not the hour and minute... therefore I cannot know if in
reality there was one spell or two (or more) according to your
example...
So maybe the best solution would be probably  to ask a slight
different question : for a given individual, what is his mean spell
lenght of time, conditional on days WITHOUT a full round trip. That is
not considering the days you suggest in your example below... which
can easily be identified thanks to -bysort id product date, egen
total_day=total(quantity) and filtered out with the condition
total_day==0... I guess... :-(

Have a nice day and thanks again,
Best,

On 17 August 2012 14:55, Nick Cox <[email protected]> wrote:
> I don't have easy advice on this. As I understand it sorting on
>
> id product (date)
>
> can't distinguish between
>
> id 1  product A date 42  quantity 12
> id 1  product A date 42  quantity -12
> id 1  product A date 42  quantity 21
> id 1  product A date 42  quantity -21
>
> and
>
> id 1  product A date 42  quantity 12
> id 1  product A date 42  quantity -21
> id 1  product A date 42  quantity 21
> id 1  product A date 42  quantity -12
>
> In the first case you have two spells to 0, and in the second one
> spell to 0. Your example shows that spells need not be two
> observations long, so I don't know what to suggest.
>
> Nick
>
> On Fri, Aug 17, 2012 at 1:45 PM, Francesco <[email protected]> wrote:
>> Actually Nick there is only a slight problem : dates could be repeated
>> for the same individual AND the same product  : for example there
>> could be several round trips during the same day for the same
>> product... In that case I would consider that there are as many
>> delta_Date equal to zero as different round trips during the day for a
>> particular product... My apologies I did not think of this particular
>> and important case...
>>
>> Could the trick  egen panelid = group(id product) be adapted in that case ?
>>
>> Many thanks
>> Best Regards
>>
>> On 17 August 2012 13:58, Francesco <[email protected]> wrote:
>>> Many, Many thanks Nick and Scott for your kind and very precise
>>> answers! Spells is indeed what I needed ;-)
>>>
>>>
>>> On 17 August 2012 13:43, Nick Cox <[email protected]> wrote:
>>>> Using your data as a sandpit
>>>>
>>>> .  clear
>>>>
>>>> .  input id    date str1 product quantity
>>>>
>>>>             id       date    product   quantity
>>>>   1.  1       1           A           10
>>>>   2.  1       2           A           -10
>>>>   3.  1       1           B            100
>>>>   4.  1       2           B            -50
>>>>   5.  1       4           C            15
>>>>   6.  1       8           C            100
>>>>   7.  1       9           C            -115
>>>>   8.  1      10          C            10
>>>>   9.  1      11          C            -10
>>>>  10.  end
>>>>
>>>> it seems that we are interested in the length of time it takes for
>>>> cumulative quantity to return to 0. -sum()- is there for cumulative
>>>> sums:
>>>>
>>>> .  bysort id product (date) : gen cumq = sum(q)
>>>>
>>>> In one jargon, we are interested in "spells" defined by the fact that
>>>> they end in 0s for cumulative quantity. In Stata it is easiest to work
>>>> with initial conditions defining spells, so we negate the date
>>>> variable to reverse time:
>>>>
>>>> .  gen negdate = -date
>>>>
>>>> As dates can be repeated for the same individual, treating data as
>>>> panel data requires another fiction, that panels are defined by
>>>> individuals and products:
>>>>
>>>> .  egen panelid = group(id product)
>>>>
>>>> Now we can -tsset- the data:
>>>>
>>>> .  tsset panelid negdate
>>>>        panel variable:  panelid (unbalanced)
>>>>         time variable:  negdate, -11 to -1, but with a gap
>>>>                 delta:  1 unit
>>>>
>>>> -tsspell- from SSC, which you must install, is a tool for handling
>>>> spells. It requires -tsset- data; the great benefit of that is that it
>>>> handles panels automatically. (In fact almost all the credit belongs
>>>> to StataCorp.) Here the criterion is that a spell is defined by
>>>> starting with -cumq == 0-
>>>>
>>>> .  tsspell, fcond(cumq == 0)
>>>>
>>>> -tsspell- creates three variables with names by default _spell _seq
>>>> _end. _end is especially useful: it is an indicator variable for end
>>>> of spells (beginning of spells when time is reversed). You can read
>>>> more in the help for -tsspell-.
>>>>
>>>> .  sort id product date
>>>>
>>>> .  l id product date cumq _*
>>>>
>>>>      +---------------------------------------------------+
>>>>      | id   product   date   cumq   _spell   _seq   _end |
>>>>      |---------------------------------------------------|
>>>>   1. |  1         A      1     10        1      2      1 |
>>>>   2. |  1         A      2      0        1      1      0 |
>>>>   3. |  1         B      1    100        0      0      0 |
>>>>   4. |  1         B      2     50        0      0      0 |
>>>>   5. |  1         C      4     15        2      3      1 |
>>>>      |---------------------------------------------------|
>>>>   6. |  1         C      8    115        2      2      0 |
>>>>   7. |  1         C      9      0        2      1      0 |
>>>>   8. |  1         C     10     10        1      2      1 |
>>>>   9. |  1         C     11      0        1      1      0 |
>>>>      +---------------------------------------------------+
>>>>
>>>> You want the mean length of completed spells. Completed spells are
>>>> tagged by _end == 1 or  cumq == 0
>>>>
>>>> .  egen meanlength = mean(_seq/ _end), by(id)
>>>>
>>>> This is my favourite division trick: _seq / _end is _seq if _end is 1
>>>> and missing if _end is 0; missings are ignored by -egen-'s -mean()-
>>>> function, so you get the mean length for each individual. It is
>>>> repeated for each observation for each individual so you could go
>>>>
>>>> . egen tag = tag(id)
>>>> . l id meanlength if tag
>>>>
>>>> I wrote a tutorial on spells.
>>>>
>>>>  SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>>>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>>>         Q2/07   SJ 7(2):249--265                                 (no commands)
>>>>         shows how to handle spells with complete control over
>>>>         spell specification
>>>>
>>>> which is accessible at
>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029
>>>>
>>>> Its principles underlie -tsspell-, but -tsspell- is not even
>>>> mentioned, for which there is a mundane explanation. Explaining some
>>>> basics as clearly and carefully as I could produced a paper that was
>>>> already long and detailed, and adding detail on -tsspell- would just
>>>> have made that worse.
>>>>
>>>> For more on spells, see Rowling (1997, 1998, 1999, etc.).
>>>>
>>>> Nick
>>>>
>>>> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <[email protected]> wrote:
>>>>> Dear Statalist,
>>>>>
>>>>> I am stuck with a little algorithmic problem and I cannot find an
>>>>> simple (or elegant) solution...
>>>>>
>>>>> I have a panel dataset as (date in days) :
>>>>>
>>>>> ID    DATE    PRODUCT QUANTITY
>>>>> 1       1           A           10
>>>>> 1       2           A           -10
>>>>>
>>>>> 1       1           B            100
>>>>> 1       2           B            -50
>>>>>
>>>>> 1       4           C            15
>>>>> 1       8           C            100
>>>>> 1       9           C            -115
>>>>>
>>>>> 1      10          C            10
>>>>> 1      11          C            -10
>>>>>
>>>>>
>>>>>
>>>>> and I would like to know the average time (in days) it takes for an
>>>>> individual in order to complete a full round trip (the variation in
>>>>> quantity is zero)
>>>>> For example, for the first id we can see that there we have
>>>>>
>>>>> ID PRODUCT delta_DATE delta_QUANTITY
>>>>> 1         A               1=2-1                  0=10-10
>>>>> 1         C               5=4-9                  0=15+100-115
>>>>> 1         C               1=11-10               0=10-10
>>>>>
>>>>> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full
>>>>> round trip. Indeed I can discard product B because there is no round
>>>>> trip, that is 100-50 is not equal to zero.
>>>>>
>>>>> My question is therefore ... do you have an idea obtain this simply in
>>>>> Stata ? I have to average across thousands of individuals... :)
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: algorithmic question : running sum and computations
  - From: Nick Cox <[email protected]>

References:
- st: algorithmic question : running sum and computations
  - From: Francesco <[email protected]>
- Re: st: algorithmic question : running sum and computations
  - From: Nick Cox <[email protected]>
- Re: st: algorithmic question : running sum and computations
  - From: Francesco <[email protected]>
- Re: st: algorithmic question : running sum and computations
  - From: Francesco <[email protected]>
- Re: st: algorithmic question : running sum and computations
  - From: Nick Cox <[email protected]>

Prev by Date: Re: st: Adding Reference Bars to Bar Graph
Next by Date: Re: st: algorithmic question : running sum and computations
Previous by thread: Re: st: algorithmic question : running sum and computations
Next by thread: Re: st: algorithmic question : running sum and computations
Index(es):
- Date
- Thread