Statalist



Re: st: loop question


From   Sandu Cojocaru <scojocaru@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: loop question
Date   Tue, 3 Nov 2009 12:50:33 -0500

Bill,

Thank you very much! I haven't made the transition to mata yet, but
your solution is yet another reminder that this transition is long
overdue!

cheers,
sandu

On Tue, Nov 3, 2009 at 10:59 AM, William Gould, StataCorp LP
<wgould@stata.com> wrote:
> Sandu Cojocaru <scojocaru@gmail.com> asked,
>
>> I'm having trouble generating a variable that for each member i equals
>> sum(Cj-Ci) over all Cj>Ci where i and j are members of the same group.
>> Here's an example of the data setup - I'm trying to calculate
>> `outcome_var'.
>> For row 1 outcome_var=0, for row 3 = (200-100)+(300-100) = 300...and so on...
>>
>>        group_id       member_id         C        outcome_var
>>               1               1       300                  0
>>               1               2       200                100
>>               1               3       100                300
>>               2               1       150                 50
>>               2               2       200                  0
>>               2               3       100                150
>>               2               4        50                300
>>               3               1     and so on...
>
> This question has already been answered elegantly by Martin Weiss
> <martin.weiss1@gmx.de>.  His answer was,
>
>>       clear*
>>
>>       input byte(group_id member_id) C
>>       1 1  300
>>       1 2  200
>>       1 3  100
>>       2 1  150
>>       2 2  200
>>       2 3  100
>>       2 4  50
>>       end
>>
>>       compress
>>       list, noo sepby(group_id)
>>
>>       bys group_id (C):  /*
>>       */ gen diff=C[_n+1]-C[_n]
>>       bys group_id: gen num=_N-_n
>>       bys  group_id (num): /*
>>       */ gen outcome_var=sum(diff*num)
>>       sort group_id member_id
>>
>>       drop diff num
>>       list, noo sepby(group_id)
>
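One way to see why the running sum in the code above matches Sandu's definition (this is a reading of the code, not an explanation given in the thread): order a group's values from largest to smallest, C_(1) >= ... >= C_(N), and let S_k be the desired outcome for the member of rank k.

```latex
\begin{aligned}
S_k &= \sum_{m<k} \bigl(C_{(m)} - C_{(k)}\bigr), \qquad S_1 = 0, \\
S_k - S_{k-1} &= (k-1)\,\bigl(C_{(k-1)} - C_{(k)}\bigr).
\end{aligned}
```

For the observation of rank k, num equals k-1 and diff equals C_(k-1) - C_(k), so sum(diff*num), accumulated from rank 1 downward, yields S_k.  The top row has a missing diff, which Stata's running sum() treats as zero.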
> I'm about to give a different answer.  Sometimes one needs to create a
> variable that is a complicated combination of values in different
> observations.  There is always a way to do it in Stata, but sometimes
> the solution is elusive and one wishes one could just loop across
> the observations and make the calculation directly, even if that solution
> were inefficient.  I want to show how to do that using Mata.
> The basic recipe is
>
>    1.  Enter Mata:
>
>                . mata
>
>                : _
>
>
>    2.  Create individual Mata variables that are views onto each of
>        the relevant Stata variables.  In the above, the relevant Stata
>        variables are group_id and C, so create Mata variables of the
>        same names:
>
>               : st_view(group_id=.,    ., "group_id")
>               : st_view(C=.,           ., "C")
>               : _
>
>    3.  Go back to Stata and create the desired new variable, filled
>        with missing values.  Create a view onto that, too.  In this
>        example, the new desired variable is outcome_var:
>
>               : end
>               . gen outcome_var = .
>               . mata
>               : st_view(outcome_var=., ., "outcome_var")
>
>    4. Loop in Mata to fill in the new variable.
>
> Before showing the solution to Sandu's problem, let me show how
> this works in an easier example.
>
>
> An easy example
> ----------------
>
> We want to create new variable newx equal to x+1.  We could do this
> in Stata by typing
>
>        . gen newx = x + 1
>
> Alternatively, we could achieve the same result by typing,
>
>        . gen newx = .
>
>        . mata
>
>        : st_view(x=.,    ., "x")
>        : st_view(newx=., ., "newx")
>
>        : for (i=1; i<=st_nobs(); i++) {
>        :        newx[i] = x[i] + 1
>        : }
>
>        : end
>
> Try it.  The result after typing all that Mata code will be the same
> as -gen newx = x + 1-.
>
> Note the use of Mata function st_nobs() to obtain the number of
> observations in the dataset.
>
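The element-by-element loop above can also be written as a single assignment through the views.  This is a minimal sketch, assuming a small hypothetical dataset in which x simply counts the observations; the data are made up for illustration and are not from the original post.

```stata
clear
set obs 5
gen x = _n          // hypothetical data: x = 1, 2, ..., 5
gen newx = .

mata:
st_view(x=.,    ., "x")
st_view(newx=., ., "newx")
newx[.,.] = x :+ 1  // subscripted assignment writes through the view
end

list
```

The subscripts on the left-hand side matter: a plain newx = x :+ 1 would replace the view with an ordinary Mata matrix and leave the Stata variable untouched.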
>
> Panel data (by)
> ---------------
>
> Panel data adds complication.  Pretend we wanted to code the Mata
> equivalent to
>
>        . by group:  gen newx = x + 1
>
> I know the -by group:- prefix adds nothing to the statement, but
> at this point I want to keep the example simple.
>
> The equivalent Mata code is,
>
>        . gen newx = .
>
>        . mata
>
>        : st_view(group=., ., "group")
>        : st_view(x=.,     ., "x")
>        : st_view(newx=.,  ., "newx")
>
>        : obs = panelsetup(group, 1)
>
>        : for (g=1; g<=rows(obs); g++) {
>        :        for (i=obs[g,1]; i<=obs[g,2]; i++) {
>        :                newx[i] = x[i] + 1
>        :        }
>        : }
>
>        : end
>
> In the above code, I assume the data are already sorted by group.
>
> Note the line
>
>        : obs = panelsetup(group, 1)
>
> If we had two groups -- it wouldn't matter if they were numbered 1 and 2
> or 6*_pi and 9 -- and we had three observations in the first group
> and five in the second, matrix obs would contain
>
>        1  3
>        4  8
>
> The first row states the observation numbers corresponding to the first
> group (1 to 3); the second row states the observation numbers
> corresponding to the second (4 through 8).  The matrix has two rows
> because there are two groups.  The matrix always has 2 columns.
> See -help mata panelsetup()-.
>
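The two-group case described above can be reproduced directly.  This is a minimal sketch, assuming a hypothetical eight-observation dataset whose group variable takes the values 9 (three observations) and 6*_pi (five observations); the data are made up to match the description.

```stata
clear
set obs 8
gen double group = cond(_n <= 3, 9, 6*_pi)   // hypothetical group ids

mata:
st_view(group=., ., "group")
obs = panelsetup(group, 1)   // one row per group: first and last obs number
obs
end
```

Each row of obs gives the first and last observation of one group, which is exactly what the loop bounds obs[g,1] and obs[g,2] rely on.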
> In the loop that follows, the outer loop (g) loops across the by
> groups.  The inner loop (i) loops across the observations within
> the group.
>
>
> Putting it all together; the solution to Sandu's problem
> --------------------------------------------------------
>
> Here is the solution to Sandu's problem:
>
>        . sort group_id
>        . gen outcome_var = .
>
>        . mata
>
>        : st_view(group_id=.,     ., "group_id")
>        : st_view(C=.,            ., "C")
>        : st_view(outcome_var=.,  ., "outcome_var")
>
>        : obs = panelsetup(group_id, 1)
>
>        : for (g=1; g<=rows(obs); g++) {
>        :         for (i=obs[g,1]; i<=obs[g,2]; i++) {
>        :                 sum = 0
>        :                 for (j=obs[g,1]; j<=obs[g,2]; j++) {
>        :                         if (C[j]>C[i]) sum = sum + (C[j]-C[i])
>        :                 }
>        :                 outcome_var[i] = sum
>        :         }
>        : }
>
>        : end
>
> Note the line
>
>                   if (C[j]>C[i]) sum = sum + (C[j]-C[i])
>
> That line is coded almost exactly as Sandu stated the problem:
> He requested the sum(Cj-Ci) over all Cj>Ci where i and j are members
> of the same group.
>
> In the code above, the outer loop (g) loops over group_id.  The next
> loop (i) loops over the members of the group.  The inner loop (j)
> also loops over the members of the group so that we obtain all
> combinations of i and j.
>
> Martin's solution executes more quickly than the above solution.  I
> tried both solutions on 5,000 groups, each with 100 members.  Martin's
> solution ran in 2.28 seconds.  Mine took 35 seconds!  That's not so
> much Mata's fault as mine.  My solution is not clever; I performed
> the -if (C[j]>C[i]) sum = sum + (C[j]-C[i])- statement 50,000,000 times!
>
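A possible middle ground keeps the same recipe but replaces the innermost j loop with one vectorized expression per member.  This is a minimal sketch using the seven example rows from the thread; the names Cg and d are hypothetical helpers introduced here, and the variant has not been timed against either solution.

```stata
clear
input byte(group_id member_id) C
1 1 300
1 2 200
1 3 100
2 1 150
2 2 200
2 3 100
2 4 50
end

sort group_id member_id
gen outcome_var = .

mata:
st_view(group_id=.,    ., "group_id")
st_view(C=.,           ., "C")
st_view(outcome_var=., ., "outcome_var")

obs = panelsetup(group_id, 1)

for (g=1; g<=rows(obs); g++) {
        Cg = panelsubmatrix(C, g, obs)        // C values of group g
        for (i=1; i<=rows(Cg); i++) {
                d = Cg :- Cg[i]               // C[j]-C[i] for every j in the group
                outcome_var[obs[g,1]+i-1] = sum(d :* (d :> 0))  // keep only C[j]>C[i]
        }
}
end

list, noobs sepby(group_id)
```

The condition C[j]>C[i] becomes the 0/1 mask (d :> 0), so the statement still reads close to the way Sandu stated the problem.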
> So what?  My solution was not clever and neither did it depend on me
> being clever.  I wonder which one of us had a solution to this problem
> sooner?  I just plugged into the recipe:
>
>    1.  Enter Mata.
>
>    2.  Create individual Mata variables that are views onto each of
>        the relevant Stata variables.
>
>    3.  Go back to Stata and create the desired new variable, filled
>        with missing values.  Create a view onto that, too.
>
>    4. Loop in Mata to fill in the new variable.
>
> The only new code I wrote was for (4), and that read
>
>        : for (g=1; g<=rows(obs); g++) {
>        :         for (i=obs[g,1]; i<=obs[g,2]; i++) {
>        :                 sum = 0
>        :                 for (j=obs[g,1]; j<=obs[g,2]; j++) {
>        :                         if (C[j]>C[i]) sum = sum + (C[j]-C[i])
>        :                 }
>        :                 outcome_var[i] = sum
>        :         }
>        : }
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
