Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Nick Cox <njcoxstata@gmail.com> |

To |
Francesco <k7br@gmx.fr> |

Subject |
Re: st: algorithmic question : running sum and computations |

Date |
Fri, 17 Aug 2012 14:38:27 +0100 |

Just -drop- such observations, at least temporarily. Nick On Fri, Aug 17, 2012 at 2:33 PM, Francesco <k7br@gmx.fr> wrote: > You are perfectly right, Nick. > > Unfortunately I cannot rank the observations by time order during a > day ... I only have the daily stamp of the observation (May 3rd for > example), not the hour and minute... therefore I cannot know if in > reality there was one spell or two (or more) according to your > example... > So maybe the best solution would be probably to ask a slight > different question : for a given individual, what is his mean spell > lenght of time, conditional on days WITHOUT a full round trip. That is > not considering the days you suggest in your example below... which > can easily be identified thanks to -bysort id product date, egen > total_day=total(quantity) and filtered out with the condition > total_day==0... I guess... :-( > > Have a nice day and thanks again, > Best, > > On 17 August 2012 14:55, Nick Cox <njcoxstata@gmail.com> wrote: >> I don't have easy advice on this. As I understand it sorting on >> >> id product (date) >> >> can't distinguish between >> >> id 1 product A date 42 quantity 12 >> id 1 product A date 42 quantity -12 >> id 1 product A date 42 quantity 21 >> id 1 product A date 42 quantity -21 >> >> and >> >> id 1 product A date 42 quantity 12 >> id 1 product A date 42 quantity -21 >> id 1 product A date 42 quantity 21 >> id 1 product A date 42 quantity -12 >> >> In the first case you have two spells to 0, and in the second one >> spell to 0. Your example shows that spells need not be two >> observations long, so I don't know what to suggest. >> >> Nick >> >> On Fri, Aug 17, 2012 at 1:45 PM, Francesco <k7br@gmx.fr> wrote: >>> Actually Nick there is only a slight problem : dates could be repeated >>> for the same individual AND the same product : for example there >>> could be several round trips during the same day for the same >>> product... In that case I would consider that there are as many >>> delta_Date equal to zero as different round trips during the day for a >>> particular product... My apologies I did not think of this particular >>> and important case... >>> >>> Could the trick egen panelid = group(id product) be adapted in that case ? >>> >>> Many thanks >>> Best Regards >>> >>> On 17 August 2012 13:58, Francesco <k7br@gmx.fr> wrote: >>>> Many, Many thanks Nick and Scott for your kind and very precise >>>> answers! Spells is indeed what I needed ;-) >>>> >>>> >>>> On 17 August 2012 13:43, Nick Cox <njcoxstata@gmail.com> wrote: >>>>> Using your data as a sandpit >>>>> >>>>> . clear >>>>> >>>>> . input id date str1 product quantity >>>>> >>>>> id date product quantity >>>>> 1. 1 1 A 10 >>>>> 2. 1 2 A -10 >>>>> 3. 1 1 B 100 >>>>> 4. 1 2 B -50 >>>>> 5. 1 4 C 15 >>>>> 6. 1 8 C 100 >>>>> 7. 1 9 C -115 >>>>> 8. 1 10 C 10 >>>>> 9. 1 11 C -10 >>>>> 10. end >>>>> >>>>> it seems that we are interested in the length of time it takes for >>>>> cumulative quantity to return to 0. -sum()- is there for cumulative >>>>> sums: >>>>> >>>>> . bysort id product (date) : gen cumq = sum(q) >>>>> >>>>> In one jargon, we are interested in "spells" defined by the fact that >>>>> they end in 0s for cumulative quantity. In Stata it is easiest to work >>>>> with initial conditions defining spells, so we negate the date >>>>> variable to reverse time: >>>>> >>>>> . gen negdate = -date >>>>> >>>>> As dates can be repeated for the same individual, treating data as >>>>> panel data requires another fiction, that panels are defined by >>>>> individuals and products: >>>>> >>>>> . egen panelid = group(id product) >>>>> >>>>> Now we can -tsset- the data: >>>>> >>>>> . tsset panelid negdate >>>>> panel variable: panelid (unbalanced) >>>>> time variable: negdate, -11 to -1, but with a gap >>>>> delta: 1 unit >>>>> >>>>> -tsspell- from SSC, which you must install, is a tool for handling >>>>> spells. It requires -tsset- data; the great benefit of that is that it >>>>> handles panels automatically. (In fact almost all the credit belongs >>>>> to StataCorp.) Here the criterion is that a spell is defined by >>>>> starting with -cumq == 0- >>>>> >>>>> . tsspell, fcond(cumq == 0) >>>>> >>>>> -tsspell- creates three variables with names by default _spell _seq >>>>> _end. _end is especially useful: it is an indicator variable for end >>>>> of spells (beginning of spells when time is reversed). You can read >>>>> more in the help for -tsspell-. >>>>> >>>>> . sort id product date >>>>> >>>>> . l id product date cumq _* >>>>> >>>>> +---------------------------------------------------+ >>>>> | id product date cumq _spell _seq _end | >>>>> |---------------------------------------------------| >>>>> 1. | 1 A 1 10 1 2 1 | >>>>> 2. | 1 A 2 0 1 1 0 | >>>>> 3. | 1 B 1 100 0 0 0 | >>>>> 4. | 1 B 2 50 0 0 0 | >>>>> 5. | 1 C 4 15 2 3 1 | >>>>> |---------------------------------------------------| >>>>> 6. | 1 C 8 115 2 2 0 | >>>>> 7. | 1 C 9 0 2 1 0 | >>>>> 8. | 1 C 10 10 1 2 1 | >>>>> 9. | 1 C 11 0 1 1 0 | >>>>> +---------------------------------------------------+ >>>>> >>>>> You want the mean length of completed spells. Completed spells are >>>>> tagged by _end == 1 or cumq == 0 >>>>> >>>>> . egen meanlength = mean(_seq/ _end), by(id) >>>>> >>>>> This is my favourite division trick: _seq / _end is _seq if _end is 1 >>>>> and missing if _end is 0; missings are ignored by -egen-'s -mean()- >>>>> function, so you get the mean length for each individual. It is >>>>> repeated for each observation for each individual so you could go >>>>> >>>>> . egen tag = tag(id) >>>>> . l id meanlength if tag >>>>> >>>>> I wrote a tutorial on spells. >>>>> >>>>> SJ-7-2 dm0029 . . . . . . . . . . . . . . Speaking Stata: Identifying spells >>>>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox >>>>> Q2/07 SJ 7(2):249--265 (no commands) >>>>> shows how to handle spells with complete control over >>>>> spell specification >>>>> >>>>> which is accessible at >>>>> http://www.stata-journal.com/sjpdf.html?articlenum=dm0029 >>>>> >>>>> Its principles underlie -tsspell-, but -tsspell- is not even >>>>> mentioned, for which there is a mundane explanation. Explaining some >>>>> basics as clearly and carefully as I could produced a paper that was >>>>> already long and detailed, and adding detail on -tsspell- would just >>>>> have made that worse. >>>>> >>>>> For more on spells, see Rowling (1997, 1998, 1999, etc.). >>>>> >>>>> Nick >>>>> >>>>> On Fri, Aug 17, 2012 at 11:30 AM, Francesco <cariboupad@gmx.fr> wrote: >>>>>> Dear Statalist, >>>>>> >>>>>> I am stuck with a little algorithmic problem and I cannot find an >>>>>> simple (or elegant) solution... >>>>>> >>>>>> I have a panel dataset as (date in days) : >>>>>> >>>>>> ID DATE PRODUCT QUANTITY >>>>>> 1 1 A 10 >>>>>> 1 2 A -10 >>>>>> >>>>>> 1 1 B 100 >>>>>> 1 2 B -50 >>>>>> >>>>>> 1 4 C 15 >>>>>> 1 8 C 100 >>>>>> 1 9 C -115 >>>>>> >>>>>> 1 10 C 10 >>>>>> 1 11 C -10 >>>>>> >>>>>> >>>>>> >>>>>> and I would like to know the average time (in days) it takes for an >>>>>> individual in order to complete a full round trip (the variation in >>>>>> quantity is zero) >>>>>> For example, for the first id we can see that there we have >>>>>> >>>>>> ID PRODUCT delta_DATE delta_QUANTITY >>>>>> 1 A 1=2-1 0=10-10 >>>>>> 1 C 5=4-9 0=15+100-115 >>>>>> 1 C 1=11-10 0=10-10 >>>>>> >>>>>> so on average individual 1 takes (1+5+1)/3=2.3 days to complete a full >>>>>> round trip. Indeed I can discard product B because there is no round >>>>>> trip, that is 100-50 is not equal to zero. >>>>>> >>>>>> My question is therefore ... do you have an idea obtain this simply in >>>>>> Stata ? I have to average across thousands of individuals... :) * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: algorithmic question : running sum and computations***From:*Francesco <cariboupad@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

**Re: st: algorithmic question : running sum and computations***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: algorithmic question : running sum and computations***From:*Francesco <k7br@gmx.fr>

- Prev by Date:
**Re: st: algorithmic question : running sum and computations** - Next by Date:
**RE: st: coefficient interpretation in OLS** - Previous by thread:
**Re: st: algorithmic question : running sum and computations** - Next by thread:
**st: Thread-Index: Ac18ah9HLfg+y+2rQ6ipAnbaSTSKMg==** - Index(es):