Re: st: Doing something an observation-specific number of times


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Doing something an observation-specific number of times
Date   Tue, 28 Aug 2012 14:11:53 -0400

robert hartman <rohartman@gmail.com>:
In your example:
v3=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) + ((1+(.41^78))/2)
for v1=.41 and v2=78
the sum is v2/2 (all the ones, each halved) plus a geometric series
that sums to .5*.41*(1-.41^78)/(1-.41), right?
I.e.
g v3=v2/2+v1*(1-v1^v2)/(1-v1)/2
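
A quick way to check a closed form like this is to compare it with a
brute-force loop. Each term (1+v1^k)/2 splits into 1/2 plus v1^k/2, so
the ones contribute v2/2 and the powers contribute half of
v1*(1-v1^v2)/(1-v1). The sketch below assumes v2 is a positive integer
and v1 is not 1; the variable name check and the local kmax are made up
for illustration.

*---------------- begin example ---------------
// same example data as in Maarten's example quoted below
clear
input v1 v2
.41 78
.23 50
end

// closed form: v2/2 from the ones, plus half the geometric series
gen double v3 = v2/2 + v1*(1 - v1^v2)/(1 - v1)/2

// brute force: add term k only to observations with k <= v2
gen double check = 0
summarize v2, meanonly
local kmax = r(max)
forvalues k = 1/`kmax' {
    quietly replace check = check + (1 + v1^`k')/2 if `k' <= v2
}

// stops with an error if the two disagree beyond rounding
assert reldif(v3, check) < 1e-12
list
*----------------- end example ----------------

The loop makes max(v2) passes over the data, so it is only meant as a
check; the one-line generate is what you would run on a large dataset.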

On Tue, Aug 28, 2012 at 2:03 PM, robert hartman <rohartman@gmail.com> wrote:
> Thanks for the pointers, Maarten and Austin.
>
> I don't believe this is a geometric series, since the ratio of
> consecutive terms is not constant. But I may just be missing it.
>
> Maarten, the data sets can get well into the tens and perhaps hundreds
> of thousands. Code like what you've provided looks promising, though
> you are probably right that there is no computational free lunch.
>
> On Tue, Aug 28, 2012 at 1:39 PM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>> On Tue, Aug 28, 2012 at 6:45 PM, robert hartman wrote:
>>> Imagine that observation 1 has v1 and v2 values of .41 and 78,
>>> respectively.  <snip>  For example, for observation 1, the new obs 1 v3
>>> value=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) +
>>> ((1+(.41^78))/2).
>>>
>>> I have begun to think of some klugy ways of doing this via looping or
>>> even the expand command.
>>
>> Depending on the number of observations in your original dataset the
>> -expand- route may be the easiest. If the number of observations is
>> large, then this strategy may be infeasible due to memory limitations.
>> When it comes to efficiency, you need to weigh the amount of time you
>> need to write the fancier code (and the effort you will need to
>> understand it again after some time...) against the time you save
>> because it runs quicker. Often the balance will be against the
>> fancier solutions(*).
>>
>> *---------------- begin example ---------------
>> // create some example data
>> clear
>> input v1 v2
>> .41 78
>> .23 50
>> end
>>
>> // we need to keep track of who is who before
>> // expanding
>> gen id = _n
>>
>> // create v2 rows per observation
>> expand v2
>>
>> // create the appropriate exponent
>> bys id : gen expo = _n
>>
>> // create the basic component of the computation
>> gen double value = (1+v1^expo)/2
>>
>> // sum() returns a running sum
>> by id : replace value = sum(value)
>>
>> // the final sum is the last of the running sum
>> bys id (expo) : replace value = value[_N]
>>
>> // get rid of things that are no longer needed
>> drop expo
>> by id : keep if _n == 1
>> drop id
>>
>> // see the result
>> list
>> *----------------- end example ----------------
>> (For more on examples I sent to the Statalist see:
>>  http://www.maartenbuis.nl/example_faq )
>>
>> Hope this helps,
>> Maarten
>>
>> (*) This of course ignores the pure joy you will get from figuring out
>> the fancy solution, but we are not paid to enjoy ourselves!
>>