



Re: st: Doing something an observation-specific number of times


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Doing something an observation-specific number of times
Date   Tue, 28 Aug 2012 14:12:57 -0400

robert hartman <rohartman@gmail.com>:
Except I notice now you are dividing the ones by 2 as well, so
g v3=v2/2+v1*(1-v1^v2)/(1-v1)/2
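
One way to check that the closed form agrees with the term-by-term sum is to compute both on the two example observations posted in this thread. This is only a sketch under a few assumptions: v2 is a positive integer, v1 is not equal to 1 (the closed form divides by 1-v1), and the variable name check and the 1e-12 tolerance are my own choices, not anything from the original posts.

```stata
// example data from the thread
clear
input v1 v2
.41 78
.23 50
end

// closed form: v2/2 plus half the geometric sum v1 + v1^2 + ... + v1^v2
gen double v3 = v2/2 + v1*(1 - v1^v2)/(1 - v1)/2

// brute force: add (1 + v1^k)/2 for k = 1, ..., v2 in each observation
gen double check = 0
summarize v2, meanonly
forvalues k = 1/`r(max)' {
    // only observations whose v2 is at least k receive the k-th term
    quietly replace check = check + (1 + v1^`k')/2 if `k' <= v2
}

// stop with an error if the two versions disagree
assert reldif(v3, check) < 1e-12
list
```

The loop runs max(v2) times over the whole dataset rather than creating v2 rows per observation, so it may also serve as a slower but memory-light alternative to -expand- when the closed form cannot be used.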

On Tue, Aug 28, 2012 at 2:11 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> robert hartman <rohartman@gmail.com>:
> In your example:
> v3=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) + ((1+(.41^78))/2)
> for v1=.41 and v2=78
> the sum is v2 (all the ones) plus a geometric series
> that sums to .5*.41*(1-.41^78)/(1-.41), right?
> I.e.
> g v3=v2+v1*(1-v1^v2)/(1-v1)/2
>
> On Tue, Aug 28, 2012 at 2:03 PM, robert hartman <rohartman@gmail.com> wrote:
>> Thanks for the pointers, Maarten and Austin.
>>
>> I don't believe this is a geometric series, since the ratio of
>> consecutive terms is not constant. But I may just be missing it.
>>
>> Maarten, the data sets can get well into the tens and perhaps hundreds
>> of thousands. Code like what you've provided looks promising, though
>> you are probably right that there is no computational free lunch.
>>
>> On Tue, Aug 28, 2012 at 1:39 PM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>> On Tue, Aug 28, 2012 at 6:45 PM, robert hartman wrote:
>>>> Imagine that observation 1 has v1 and v2 values of .41 and 78,
>>>> respectively.  <snip>  For example, for observation 1, the new obs 1 v3
>>>> value=((1+(.41^1))/2) + ((1+(.41^2))/2) ...((1+(.41^77))/2) +
>>>> ((1+(.41^78))/2).
>>>>
>>>> I have begun to think of some klugy ways of doing this via looping or
>>>> even the expand command.
>>>
>>> Depending on the number of observations in your original dataset, the
>>> -expand- route may be the easiest. If the number of observations is
>>> large, then this strategy may be infeasible due to memory limitations.
>>> When it comes to efficiency, you need to weigh the amount of time you
>>> need to write the fancier code (and the effort you will need to
>>> understand it again after some time...) against the time you save
>>> because it runs quicker. Often the balance will be against the
>>> fancier solutions(*).
>>>
>>> *---------------- begin example ---------------
>>> // create some example data
>>> clear
>>> input v1 v2
>>> .41 78
>>> .23 50
>>> end
>>>
>>> // we need to keep track of who is who before
>>> // expanding
>>> gen id = _n
>>>
>>> // create v2 rows per observation
>>> expand v2
>>>
>>> // create the appropriate exponent
>>> bys id : gen expo = _n
>>>
>>> // create the basic component of the computation
>>> gen double value = (1+v1^expo)/2
>>>
>>> // sum() returns a running sum
>>> by id : replace value = sum(value)
>>>
>>> // the final sum is the last of the running sum
>>> bys id (expo) : replace value = value[_N]
>>>
>>> // get rid of things that are no longer needed
>>> drop expo
>>> by id : keep if _n == 1
>>> drop id
>>>
>>> // see the result
>>> list
>>> *----------------- end example ----------------
>>> (For more on examples I sent to the Statalist see:
>>>  http://www.maartenbuis.nl/example_faq )
>>>
>>> Hope this helps,
>>> Maarten
>>>
>>> (*) This of course ignores the pure joy you will get from figuring out
>>> the fancy solution, but we are not paid to enjoy ourselves!
>>>

