Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Nick Cox <njcoxstata@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: algorithmic question : running sum and computations |

Date |
Fri, 17 Aug 2012 12:43:56 +0100 |

Using your data as a sandpit . clear . input id date str1 product quantity id date product quantity 1. 1 1 A 10 2. 1 2 A -10 3. 1 1 B 100 4. 1 2 B -50 5. 1 4 C 15 6. 1 8 C 100 7. 1 9 C -115 8. 1 10 C 10 9. 1 11 C -10 10. end it seems that we are interested in the length of time it takes for cumulative quantity to return to 0. -sum()- is there for cumulative sums: . bysort id product (date) : gen cumq = sum(q) In one jargon, we are interested in "spells" defined by the fact that they end in 0s for cumulative quantity. In Stata it is easiest to work with initial conditions defining spells, so we negate the date variable to reverse time: . gen negdate = -date As dates can be repeated for the same individual, treating data as panel data requires another fiction, that panels are defined by individuals and products: . egen panelid = group(id product) Now we can -tsset- the data: . tsset panelid negdate panel variable: panelid (unbalanced) time variable: negdate, -11 to -1, but with a gap delta: 1 unit -tsspell- from SSC, which you must install, is a tool for handling spells. It requires -tsset- data; the great benefit of that is that it handles panels automatically. (In fact almost all the credit belongs to StataCorp.) Here the criterion is that a spell is defined by starting with -cumq == 0- . tsspell, fcond(cumq == 0) -tsspell- creates three variables with names by default _spell _seq _end. _end is especially useful: it is an indicator variable for end of spells (beginning of spells when time is reversed). You can read more in the help for -tsspell-. . sort id product date . l id product date cumq _* +---------------------------------------------------+ | id product date cumq _spell _seq _end | |---------------------------------------------------| 1. | 1 A 1 10 1 2 1 | 2. | 1 A 2 0 1 1 0 | 3. | 1 B 1 100 0 0 0 | 4. | 1 B 2 50 0 0 0 | 5. | 1 C 4 15 2 3 1 | |---------------------------------------------------| 6. | 1 C 8 115 2 2 0 | 7. | 1 C 9 0 2 1 0 | 8. | 1 C 10 10 1 2 1 | 9. | 1 C 11 0 1 1 0 | +---------------------------------------------------+ You want the mean length of completed spells. Completed spells are tagged by _end == 1 or cumq == 0 . egen meanlength = mean(_seq/ _end), by(id) This is my favourite division trick: _seq / _end is _seq if _end is 1 and missing if _end is 0; missings are ignored by -egen-'s -mean()- function, so you get the mean length for each individual. It is repeated for each observation for each individual so you could go . egen tag = tag(id) . l id meanlength if tag I wrote a tutorial on spells. SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox Q2/07 SJ 7(2):249--265 (no commands) shows how to handle spells with complete control over spell specification which is accessible at http://www.stata-journal.com/sjpdf.html?articlenum=dm0029 Its principles underlie -tsspell-, but -tsspell- is not even mentioned, for which there is a mundane explanation. Explaining some basics as clearly and carefully as I could produced a paper that was already long and detailed, and adding detail on -tsspell- would just have made that worse. For more on spells, see Rowling (1997, 1998, 1999, etc.). Nick On Fri, Aug 17, 2012 at 11:30 AM, Francesco <cariboupad@gmx.fr> wrote: > Dear Statalist, > > I am stuck with a little algorithmic problem and I cannot find an > simple (or elegant) solution... > > I have a panel dataset as (date in days) : > > ID DATE PRODUCT QUANTITY > 1 1 A 10 > 1 2 A -10 > > 1 1 B 100 > 1 2 B -50 > > 1 4 C 15 > 1 8 C 100 > 1 9 C -115 > > 1 10 C 10 > 1 11 C -10 > > > > and I would like to know the average time (in days) it takes for an > individual in order to complete a full round trip (the variation in > quantity is zero) > For example, for the first id we can see that there we have > > ID PRODUCT delta_DATE delta_QUANTITY > 1 A 1=2-1 0=10-10 > 1 C 5=4-9 0=15+100-115 > 1 C 1=11-10 0=10-10 > > so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full > round trip. Indeed I can discard product B because there is no round > trip, that is 100-50 is not equal to zero. > > My question is therefore ... do you have an idea obtain this simply in > Stata ? I have to average across thousands of individuals... :) * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**References**:**st: algorithmic question : running sum and computations***From:*Francesco <cariboupad@gmx.fr>

- Prev by Date:
**Re: st: algorithmic question : running sum and computations** - Next by Date:
**Re: st: algorithmic question : running sum and computations** - Previous by thread:
**Re: st: algorithmic question : running sum and computations** - Next by thread:
**Re: st: algorithmic question : running sum and computations** - Index(es):