Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Francesco <k7br@gmx.fr> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: algorithmic question : running sum and computations |

Date |
Fri, 17 Aug 2012 13:58:45 +0200 |

Many, Many thanks Nick and Scott for your kind and very precise answers! Spells is indeed what I needed ;-) On 17 August 2012 13:43, Nick Cox <njcoxstata@gmail.com> wrote: > Using your data as a sandpit > > . clear > > . input id date str1 product quantity > > id date product quantity > 1. 1 1 A 10 > 2. 1 2 A -10 > 3. 1 1 B 100 > 4. 1 2 B -50 > 5. 1 4 C 15 > 6. 1 8 C 100 > 7. 1 9 C -115 > 8. 1 10 C 10 > 9. 1 11 C -10 > 10. end > > it seems that we are interested in the length of time it takes for > cumulative quantity to return to 0. -sum()- is there for cumulative > sums: > > . bysort id product (date) : gen cumq = sum(q) > > In one jargon, we are interested in "spells" defined by the fact that > they end in 0s for cumulative quantity. In Stata it is easiest to work > with initial conditions defining spells, so we negate the date > variable to reverse time: > > . gen negdate = -date > > As dates can be repeated for the same individual, treating data as > panel data requires another fiction, that panels are defined by > individuals and products: > > . egen panelid = group(id product) > > Now we can -tsset- the data: > > . tsset panelid negdate > panel variable: panelid (unbalanced) > time variable: negdate, -11 to -1, but with a gap > delta: 1 unit > > -tsspell- from SSC, which you must install, is a tool for handling > spells. It requires -tsset- data; the great benefit of that is that it > handles panels automatically. (In fact almost all the credit belongs > to StataCorp.) Here the criterion is that a spell is defined by > starting with -cumq == 0- > > . tsspell, fcond(cumq == 0) > > -tsspell- creates three variables with names by default _spell _seq > _end. _end is especially useful: it is an indicator variable for end > of spells (beginning of spells when time is reversed). You can read > more in the help for -tsspell-. > > . sort id product date > > . l id product date cumq _* > > +---------------------------------------------------+ > | id product date cumq _spell _seq _end | > |---------------------------------------------------| > 1. | 1 A 1 10 1 2 1 | > 2. | 1 A 2 0 1 1 0 | > 3. | 1 B 1 100 0 0 0 | > 4. | 1 B 2 50 0 0 0 | > 5. | 1 C 4 15 2 3 1 | > |---------------------------------------------------| > 6. | 1 C 8 115 2 2 0 | > 7. | 1 C 9 0 2 1 0 | > 8. | 1 C 10 10 1 2 1 | > 9. | 1 C 11 0 1 1 0 | > +---------------------------------------------------+ > > You want the mean length of completed spells. Completed spells are > tagged by _end == 1 or cumq == 0 > > . egen meanlength = mean(_seq/ _end), by(id) > > This is my favourite division trick: _seq / _end is _seq if _end is 1 > and missing if _end is 0; missings are ignored by -egen-'s -mean()- > function, so you get the mean length for each individual. It is > repeated for each observation for each individual so you could go > > . egen tag = tag(id) > . l id meanlength if tag > > I wrote a tutorial on spells. > > SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox > Q2/07 SJ 7(2):249--265 (no commands) > shows how to handle spells with complete control over > spell specification > > which is accessible at > http://www.stata-journal.com/sjpdf.html?articlenum=dm0029 > > Its principles underlie -tsspell-, but -tsspell- is not even > mentioned, for which there is a mundane explanation. Explaining some > basics as clearly and carefully as I could produced a paper that was > already long and detailed, and adding detail on -tsspell- would just > have made that worse. > > For more on spells, see Rowling (1997, 1998, 1999, etc.). > > Nick > > On Fri, Aug 17, 2012 at 11:30 AM, Francesco <cariboupad@gmx.fr> wrote: >> Dear Statalist, >> >> I am stuck with a little algorithmic problem and I cannot find an >> simple (or elegant) solution... >> >> I have a panel dataset as (date in days) : >> >> ID DATE PRODUCT QUANTITY >> 1 1 A 10 >> 1 2 A -10 >> >> 1 1 B 100 >> 1 2 B -50 >> >> 1 4 C 15 >> 1 8 C 100 >> 1 9 C -115 >> >> 1 10 C 10 >> 1 11 C -10 >> >> >> >> and I would like to know the average time (in days) it takes for an >> individual in order to complete a full round trip (the variation in >> quantity is zero) >> For example, for the first id we can see that there we have >> >> ID PRODUCT delta_DATE delta_QUANTITY >> 1 A 1=2-1 0=10-10 >> 1 C 5=4-9 0=15+100-115 >> 1 C 1=11-10 0=10-10 >> >> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full >> round trip. Indeed I can discard product B because there is no round >> trip, that is 100-50 is not equal to zero. >> >> My question is therefore ... do you have an idea obtain this simply in >> Stata ? I have to average across thousands of individuals... :) > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**References**:**st: algorithmic question : running sum and computations***From:*Francesco <cariboupad@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

- Prev by Date:
**Re: st: algorithmic question : running sum and computations** - Next by Date:
**Re: Re: st: combining datasets** - Previous by thread:
**Re: st: algorithmic question : running sum and computations** - Next by thread:
**Re: st: algorithmic question : running sum and computations** - Index(es):