Austin Nichols <austinnichols@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: Doing something an observation-specific number of times

Tue, 28 Aug 2012 14:12:57 -0400

robert hartman <rohartman@gmail.com>:
Except I notice now you are dividing the ones by 2 as well, so
g v3=v2/2+v1*(1-v1^v2)/(1-v1)/2

On Tue, Aug 28, 2012 at 2:11 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> robert hartman <rohartman@gmail.com>:
> In your example:
> v3=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) + ((1+(.41^78))/2)
> for v1=.41 and v2=78
> the sum is v2 (all the ones) plus a geometric series
> that sums to .5*.41*(1-.41^78)/(1-.41), right?
> I.e.
> g v3=v2+v1*(1-v1^v2)/(1-v1)/2
>
> On Tue, Aug 28, 2012 at 2:03 PM, robert hartman <rohartman@gmail.com> wrote:
>> Thanks for the pointers, Maarten and Austin.
>>
>> I don't believe this is a geometric series, since the ratio of
>> consecutive terms is not constant. But I may just be missing it.
>>
>> Maarten, the data sets can get well into the tens and perhaps hundreds
>> of thousands of observations. Code like what you've provided looks
>> promising, though you are probably right that there is no
>> computational free lunch.
>>
>> On Tue, Aug 28, 2012 at 1:39 PM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>> On Tue, Aug 28, 2012 at 6:45 PM, robert hartman wrote:
>>>> Imagine that observation 1 has v1 and v2 values of .41 and 78,
>>>> respectively. <snip> For example, for observation 1, the new obs 1 v3
>>>> value=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) +
>>>> ((1+(.41^78))/2).
>>>>
>>>> I have begun to think of some klugy ways of doing this via looping or
>>>> even the expand command.
>>>
>>> Depending on the number of observations in your original dataset the
>>> -expand- route may be the easiest. If the number of observations is
>>> large then this strategy may be infeasible due to memory limitations.
>>> When it comes to efficiency, you need to make the tradeoff between the
>>> amount of time you need to write the more fancy code (and the effort
>>> you will need to understand it again after some time...) against the
>>> time you save because it runs quicker. Often the balance will be
>>> against the more fancy solutions(*).
>>>
>>> *---------------- begin example ---------------
>>> // create some example data
>>> clear
>>> input v1 v2
>>> .41 78
>>> .23 50
>>> end
>>>
>>> // we need to keep track of who is who before
>>> // expanding
>>> gen id = _n
>>>
>>> // create v2 rows per observation
>>> expand v2
>>>
>>> // create the appropriate exponent
>>> bys id : gen expo = _n
>>>
>>> // create the basic component of the computation
>>> gen double value = (1+v1^expo)/2
>>>
>>> // sum() returns a running sum
>>> by id : replace value = sum(value)
>>>
>>> // the final sum is the last of the running sum
>>> bys id (expo) : replace value = value[_N]
>>>
>>> // get rid of things that are no longer needed
>>> drop expo
>>> by id : keep if _n == 1
>>> drop id
>>>
>>> // see the result
>>> list
>>> *----------------- end example ----------------
>>> (For more on examples I sent to the Statalist see:
>>> http://www.maartenbuis.nl/example_faq )
>>>
>>> Hope this helps,
>>> Maarten
>>>
>>> (*) This of course ignores the pure joy you will get from figuring out
>>> the fancy solution, but we are not paid to enjoy ourselves!
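
For readers following the thread, the corrected closed form comes from splitting each term (1+v1^k)/2 into 1/2 plus v1^k/2. Summing the halves over k = 1..v2 gives v2/2, and summing v1^k gives the geometric series v1*(1-v1^v2)/(1-v1). The sketch below checks that closed form against a brute-force loop on Maarten's two example observations. It is an illustration only, not code posted in the thread: the variable name `check` and the tolerance 1e-12 are assumptions made here, and it assumes v1 is not equal to 1 and v2 is a positive integer.

```stata
* example data taken from Maarten's post
clear
input v1 v2
.41 78
.23 50
end

* closed form: v2/2 (the halves) plus half the geometric sum
* v1 + v1^2 + ... + v1^v2; evaluates to missing when v1 == 1
gen double v3 = v2/2 + v1*(1 - v1^v2)/(1 - v1)/2

* brute-force check (hypothetical variable name): loop up to the
* largest v2 and add term k only where k does not exceed v2
gen double check = 0
summarize v2, meanonly
forvalues k = 1/`r(max)' {
    quietly replace check = check + (1 + v1^`k')/2 if `k' <= v2
}

* the two columns should agree up to rounding error
assert reldif(v3, check) < 1e-12
list
```

Unlike the -expand- route, neither the closed form nor the loop creates extra rows, so memory use stays at the size of the original dataset. The loop still costs one pass over the data per value of k, while the closed form is a single -generate-.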

