Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Eric Booth <eric.a.booth@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: calculating cumulative values of other observations |

Date |
Sat, 7 Apr 2012 16:36:28 -0500 |

<> Your new [if] condition is preventing the adapted version of Nick's code from creating the running sum properly. To see how the [if] condition you added is "filtering" the observations that are eligible for the running sum, create an indicator/flag for your [if] conditions and inspect the results: *** bysort ID activity (date_start): gen ind = 1 if /// (date_start>=date_end[_n-1] & !mi(date_end[_n-1])) *** You should be able to see that this stops Stata from including the observations you wanted included in the sum using 'ind'. Without seeing your new variables - you could probably break your new code into two steps like: *** bysort ID activity (date_start): gen priorhrs = /// sum(hours) - hours replace priorhrs = 0 if priorhrs>0 & /// (date_start<date_end[_n-1] & !mi(date_end[_n-1])) *** to get what you need. - Eric __ Eric A. Booth Public Policy Research Institute Texas A&M University ebooth@ppri.tamu.edu +979.845.6754 On Apr 7, 2012, at 4:12 PM, KOTa wrote: > thanks Eric, > i did adapt your code for real dataset without problems (see my > previous response) > > i am just interested how to do it using Nick's code, which works much > faster (dataset is pretty large). > i am probably missing some thing about how "sum" works or about > boundary conditions > > my rewrite of Nick was: > > bysort ID activity (date_start): gen priorhrs = sum(hours) - hours if > (date_start>=date_end[_n-1] & !mi(date_end[_n-1])) > > which does not work :/ > > regards > K. > > El día 7 de abril de 2012 14:09, Eric Booth <eric.a.booth@gmail.com> escribió: >> <> >> >> On Apr 7, 2012, at 9:42 AM, KOTa wrote: >> >>> thanks Erik, Nick >>> >>> both ways work on this sample, and i managed to adjust for a real database also. >>> >>> few followup question: >>> >>> 1. Eric, any specific reason why you convert time into date format? I >>> used it without conversion and it looks fine. >>> Also on real data your solution somehow goes into infinite loop after >>> finishing all calculations. >>> >> I don't know how you edited the code, so I don't know how you produced an infinite loop? It certainly doesn't do that with the example I gave you. Also, I created a time/date format version of "date" just out of habit, I find that it makes data management easier. If you don't need/want it, then skip it. >> >> >>> 2. Nick, the solution you proposed with "sum" was the one i tried >>> initially. The problem is that in real data I have 2 columns for date >>> (start_date end_date) and i want to compare start_date [_n] >= >>> end_date[all previous] >>> Eric's solution works longer, but i could adjust it to work with >>> different date columns. >>> your solution is works much faster in case there is one date column, >>> but i could not figure out how to do the same with 2 different >>> columns. >> >> El día 7 de abril de 2012 04:08, Nick Cox <njcoxstata@gmail.com> escribió: >>> Why not >>> bysort id activity (time): gen priorhrs = sum(time) - time >>> Nick >> >> >> Nick's -sum- solution does work (and is more straightforward than my -replace line) with a slight tweak - changing the first "time" to "date2" (if you create the time/date var from my example). Also, if you are using the same variable names from your original example you'd change "time" to "hours". Adding this to my example works: >> >> bysort ID activity (date2): gen njc = sum(hours) - hours >> >> >> Adapting mine and Nick's code to your "real" dataset is another issue completely - if you cannot figure how to adapt our tips to your data/code, then you should post a data & code example/question that is closer to reality. >> >> >> - Eric >> __ >> Eric A. Booth >> Public Policy Research Institute >> Texas A&M University >> ebooth@ppri.tamu.edu >> Office: +979.845.6754 >> >> >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: calculating cumulative values of other observations***From:*KOTa <kota.alba@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*Eric Booth <eric.a.booth@gmail.com>

**References**:**st: calculating cumulative values of other observations***From:*KOTa <kota.alba@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*Eric Booth <eric.a.booth@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*KOTa <kota.alba@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*Eric Booth <eric.a.booth@gmail.com>

**Re: st: calculating cumulative values of other observations***From:*KOTa <kota.alba@gmail.com>

- Prev by Date:
**Re: st: calculating cumulative values of other observations** - Next by Date:
**Re: st: calculating cumulative values of other observations** - Previous by thread:
**Re: st: calculating cumulative values of other observations** - Next by thread:
**Re: st: calculating cumulative values of other observations** - Index(es):