Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Francesco <k7br@gmx.fr> |

To |
Nick Cox <njcoxstata@gmail.com> |

Subject |
Re: st: algorithmic question : running sum and computations |

Date |
Fri, 17 Aug 2012 15:33:15 +0200 |

You are perfectly right, Nick. Unfortunately I cannot rank the observations by time order during a day ... I only have the daily stamp of the observation (May 3rd for example), not the hour and minute... therefore I cannot know if in reality there was one spell or two (or more) according to your example... So maybe the best solution would be probably to ask a slight different question : for a given individual, what is his mean spell lenght of time, conditional on days WITHOUT a full round trip. That is not considering the days you suggest in your example below... which can easily be identified thanks to -bysort id product date, egen total_day=total(quantity) and filtered out with the condition total_day==0... I guess... :-( Have a nice day and thanks again, Best, On 17 August 2012 14:55, Nick Cox <njcoxstata@gmail.com> wrote: > I don't have easy advice on this. As I understand it sorting on > > id product (date) > > can't distinguish between > > id 1 product A date 42 quantity 12 > id 1 product A date 42 quantity -12 > id 1 product A date 42 quantity 21 > id 1 product A date 42 quantity -21 > > and > > id 1 product A date 42 quantity 12 > id 1 product A date 42 quantity -21 > id 1 product A date 42 quantity 21 > id 1 product A date 42 quantity -12 > > In the first case you have two spells to 0, and in the second one > spell to 0. Your example shows that spells need not be two > observations long, so I don't know what to suggest. > > Nick > > On Fri, Aug 17, 2012 at 1:45 PM, Francesco <k7br@gmx.fr> wrote: >> Actually Nick there is only a slight problem : dates could be repeated >> for the same individual AND the same product : for example there >> could be several round trips during the same day for the same >> product... In that case I would consider that there are as many >> delta_Date equal to zero as different round trips during the day for a >> particular product... My apologies I did not think of this particular >> and important case... >> >> Could the trick egen panelid = group(id product) be adapted in that case ? >> >> Many thanks >> Best Regards >> >> On 17 August 2012 13:58, Francesco <k7br@gmx.fr> wrote: >>> Many, Many thanks Nick and Scott for your kind and very precise >>> answers! Spells is indeed what I needed ;-) >>> >>> >>> On 17 August 2012 13:43, Nick Cox <njcoxstata@gmail.com> wrote: >>>> Using your data as a sandpit >>>> >>>> . clear >>>> >>>> . input id date str1 product quantity >>>> >>>> id date product quantity >>>> 1. 1 1 A 10 >>>> 2. 1 2 A -10 >>>> 3. 1 1 B 100 >>>> 4. 1 2 B -50 >>>> 5. 1 4 C 15 >>>> 6. 1 8 C 100 >>>> 7. 1 9 C -115 >>>> 8. 1 10 C 10 >>>> 9. 1 11 C -10 >>>> 10. end >>>> >>>> it seems that we are interested in the length of time it takes for >>>> cumulative quantity to return to 0. -sum()- is there for cumulative >>>> sums: >>>> >>>> . bysort id product (date) : gen cumq = sum(q) >>>> >>>> In one jargon, we are interested in "spells" defined by the fact that >>>> they end in 0s for cumulative quantity. In Stata it is easiest to work >>>> with initial conditions defining spells, so we negate the date >>>> variable to reverse time: >>>> >>>> . gen negdate = -date >>>> >>>> As dates can be repeated for the same individual, treating data as >>>> panel data requires another fiction, that panels are defined by >>>> individuals and products: >>>> >>>> . egen panelid = group(id product) >>>> >>>> Now we can -tsset- the data: >>>> >>>> . tsset panelid negdate >>>> panel variable: panelid (unbalanced) >>>> time variable: negdate, -11 to -1, but with a gap >>>> delta: 1 unit >>>> >>>> -tsspell- from SSC, which you must install, is a tool for handling >>>> spells. It requires -tsset- data; the great benefit of that is that it >>>> handles panels automatically. (In fact almost all the credit belongs >>>> to StataCorp.) Here the criterion is that a spell is defined by >>>> starting with -cumq == 0- >>>> >>>> . tsspell, fcond(cumq == 0) >>>> >>>> -tsspell- creates three variables with names by default _spell _seq >>>> _end. _end is especially useful: it is an indicator variable for end >>>> of spells (beginning of spells when time is reversed). You can read >>>> more in the help for -tsspell-. >>>> >>>> . sort id product date >>>> >>>> . l id product date cumq _* >>>> >>>> +---------------------------------------------------+ >>>> | id product date cumq _spell _seq _end | >>>> |---------------------------------------------------| >>>> 1. | 1 A 1 10 1 2 1 | >>>> 2. | 1 A 2 0 1 1 0 | >>>> 3. | 1 B 1 100 0 0 0 | >>>> 4. | 1 B 2 50 0 0 0 | >>>> 5. | 1 C 4 15 2 3 1 | >>>> |---------------------------------------------------| >>>> 6. | 1 C 8 115 2 2 0 | >>>> 7. | 1 C 9 0 2 1 0 | >>>> 8. | 1 C 10 10 1 2 1 | >>>> 9. | 1 C 11 0 1 1 0 | >>>> +---------------------------------------------------+ >>>> >>>> You want the mean length of completed spells. Completed spells are >>>> tagged by _end == 1 or cumq == 0 >>>> >>>> . egen meanlength = mean(_seq/ _end), by(id) >>>> >>>> This is my favourite division trick: _seq / _end is _seq if _end is 1 >>>> and missing if _end is 0; missings are ignored by -egen-'s -mean()- >>>> function, so you get the mean length for each individual. It is >>>> repeated for each observation for each individual so you could go >>>> >>>> . egen tag = tag(id) >>>> . l id meanlength if tag >>>> >>>> I wrote a tutorial on spells. >>>> >>>> SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells >>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox >>>> Q2/07 SJ 7(2):249--265 (no commands) >>>> shows how to handle spells with complete control over >>>> spell specification >>>> >>>> which is accessible at >>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029 >>>> >>>> Its principles underlie -tsspell-, but -tsspell- is not even >>>> mentioned, for which there is a mundane explanation. Explaining some >>>> basics as clearly and carefully as I could produced a paper that was >>>> already long and detailed, and adding detail on -tsspell- would just >>>> have made that worse. >>>> >>>> For more on spells, see Rowling (1997, 1998, 1999, etc.). >>>> >>>> Nick >>>> >>>> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <cariboupad@gmx.fr> wrote: >>>>> Dear Statalist, >>>>> >>>>> I am stuck with a little algorithmic problem and I cannot find an >>>>> simple (or elegant) solution... >>>>> >>>>> I have a panel dataset as (date in days) : >>>>> >>>>> ID DATE PRODUCT QUANTITY >>>>> 1 1 A 10 >>>>> 1 2 A -10 >>>>> >>>>> 1 1 B 100 >>>>> 1 2 B -50 >>>>> >>>>> 1 4 C 15 >>>>> 1 8 C 100 >>>>> 1 9 C -115 >>>>> >>>>> 1 10 C 10 >>>>> 1 11 C -10 >>>>> >>>>> >>>>> >>>>> and I would like to know the average time (in days) it takes for an >>>>> individual in order to complete a full round trip (the variation in >>>>> quantity is zero) >>>>> For example, for the first id we can see that there we have >>>>> >>>>> ID PRODUCT delta_DATE delta_QUANTITY >>>>> 1 A 1=2-1 0=10-10 >>>>> 1 C 5=4-9 0=15+100-115 >>>>> 1 C 1=11-10 0=10-10 >>>>> >>>>> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full >>>>> round trip. Indeed I can discard product B because there is no round >>>>> trip, that is 100-50 is not equal to zero. >>>>> >>>>> My question is therefore ... do you have an idea obtain this simply in >>>>> Stata ? I have to average across thousands of individuals... :) * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

**References**:**st: algorithmic question : running sum and computations***From:*Francesco <cariboupad@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

- Prev by Date:
**Re: st: Adding Reference Bars to Bar Graph** - Next by Date:
**Re: st: algorithmic question : running sum and computations** - Previous by thread:
**Re: st: algorithmic question : running sum and computations** - Next by thread:
**Re: st: algorithmic question : running sum and computations** - Index(es):