
From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: loop question
Date: Tue, 03 Nov 2009 09:59:13 -0600

Sandu Cojocaru <scojocaru@gmail.com> asked,

> I'm having trouble generating a variable that for each member i equals
> sum(Cj-Ci) over all Cj>Ci where i and j are members of the same group.
> Here's an example of the data setup - I'm trying to calculate
> `outcome_var'.
> For row 1 outcome_var=0, for row 3 = (200-100)+(300-100) = 300...and so on...
>
> group_id   member_id     C    outcome_var
>     1          1        300        0
>     1          2        200      100
>     1          3        100      300
>     2          1        150       50
>     2          2        200        0
>     2          3        100      150
>     2          4         50      300
>     3          1    and so on...

This question has already been answered elegantly by Martin Weiss
<martin.weiss1@gmx.de>.  His answer was,

> clear*
>
> input byte(group_id member_id) C
> 1 1 300
> 1 2 200
> 1 3 100
> 2 1 150
> 2 2 200
> 2 3 100
> 2 4 50
> end
>
> compress
> list, noo sepby(group_id)
>
> bys group_id (C): /*
> */ gen diff=C[_n+1]-C[_n]
> bys group_id: gen num=_N-_n
> bys group_id (num): /*
> */ gen outcome_var=sum(diff*num)
> sort group_id member_id
>
> drop diff num
> list, noo sepby(group_id)

I'm about to give a different answer.

Sometimes one needs to create a variable that is a complicated combination
of values in different observations.  There is always a way to do it in
Stata, but sometimes the solution is elusive and one wishes one could just
loop across the observations and make the calculation directly, even if
that solution is inefficient.

I want to show how to do that using Mata.  The basic recipe is

    1.  Enter Mata:

            . mata
            : _

    2.  Create individual Mata variables that are a view onto each of the
        relevant Stata variables.  In the above, the relevant Stata
        variables are group_id and C, so create Mata variables of the
        same name:

            : st_view(group_id=., ., "group_id")
            : st_view(C=., ., "C")
            : _

    3.  Go back to Stata and create the desired new variable, filled with
        missing values.  Create a view onto that, too.  In this example,
        the new desired variable is outcome_var:

            : end
            . gen outcome_var = .
            . mata
            : st_view(outcome_var=., ., "outcome_var")

    4.  Loop in Mata to fill in the new variable.

Before showing the solution to Sandu's problem, let me show how this works
in an easier example.


An easy example
----------------

We want to create new variable newx equal to x+1.  We could do this in
Stata by typing

        . gen newx = x + 1

Alternatively, we could achieve the same result by typing,

        . gen newx = .
        . mata
        : st_view(x=., ., "x")
        : st_view(newx=., ., "newx")
        : for (i=1; i<=st_nobs(); i++) {
        :         newx[i] = x[i] + 1
        : }
        : end

Try it.  The result after typing all that Mata code will be the same as
-gen newx = x + 1-.

Note the use of Mata function st_nobs() to obtain the number of
observations in the dataset.


Panel data (by)
---------------

Panel data adds complication.  Pretend we wanted to code the Mata
equivalent to

        . by group: gen newx = x + 1

I know the -by group:- prefix adds nothing to the statement, but at this
point I want to keep the example simple.  The equivalent Mata code is,

        . gen newx = .
        . mata
        : st_view(group=., ., "group")
        : st_view(x=., ., "x")
        : st_view(newx=., ., "newx")
        : obs = panelsetup(group, 1)
        : for (g=1; g<=rows(obs); g++) {
        :         for (i=obs[g,1]; i<=obs[g,2]; i++) {
        :                 newx[i] = x[i] + 1
        :         }
        : }
        : end

In the above code, I assume the data are already sorted by group.

Note the line

        : obs = panelsetup(group, 1)

If we had two groups -- it wouldn't matter if they were numbered 1 and 2
or 6*_pi and 9 -- and we had three observations in the first group and
five in the second, matrix obs would contain

        1   3
        4   8

The first row states the observation numbers corresponding to the first
group (1 to 3); the second row states the observation numbers
corresponding to the second (4 through 8).  The matrix has two rows
because there are two groups.  The matrix always has 2 columns.  See
-help mata panelsetup()-.

In the loop that follows, the outer loop (g) loops across the by groups.
The inner loop (i) loops across the observations within the group.

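To see concretely what panelsetup() returns, here is a minimal sketch that
builds the group vector by hand instead of through a view.  The vector
group below is made up for illustration (three observations in group 1,
five in group 2); it is not taken from Sandu's data.

```stata
mata:
// hypothetical data: 3 observations in group 1, 5 in group 2
group = (1 \ 1 \ 1 \ 2 \ 2 \ 2 \ 2 \ 2)

// one row per group: first and last observation number of that group
obs = panelsetup(group, 1)
obs

// walk the groups using the same bounds the nested loop relies on
for (g=1; g<=rows(obs); g++) {
        printf("group %g: observations %g to %g\n", g, obs[g,1], obs[g,2])
}
end
```

The displayed obs should be the 2 x 2 matrix described above, with rows
(1, 3) and (4, 8).  Because panelsetup() only looks for runs of equal
values, the identifiers themselves can be any numbers as long as the data
are sorted by group.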

Putting it all together; the solution to Sandu's problem
--------------------------------------------------------

Here is the solution to Sandu's problem:

        . sort group_id
        . gen outcome_var = .
        . mata:
        : st_view(group_id=., ., "group_id")
        : st_view(C=., ., "C")
        : st_view(outcome_var=., ., "outcome_var")
        : obs = panelsetup(group_id, 1)
        : for (g=1; g<=rows(obs); g++) {
        :         for (i=obs[g,1]; i<=obs[g,2]; i++) {
        :                 sum = 0
        :                 for (j=obs[g,1]; j<=obs[g,2]; j++) {
        :                         if (C[j]>C[i]) sum = sum + (C[j]-C[i])
        :                 }
        :                 outcome_var[i] = sum
        :         }
        : }
        : end

Note the line

        if (C[j]>C[i]) sum = sum + (C[j]-C[i])

That line is coded almost exactly as Sandu stated the problem:  He
requested the sum(Cj-Ci) over all Cj>Ci where i and j are members of the
same group.

In the code above, the outer loop (g) loops over group_id.  The next loop
(i) loops over the members of the group.  The inner loop (j) also loops
over the members of the group so that we obtain all combinations of i
and j.

Martin's solution executes more quickly than the above solution.  I tried
both solutions on 5,000 groups, each with 100 members.  Martin's solution
ran in 2.28 seconds.  Mine took 35 seconds!

That's not so much Mata's fault as mine.  My solution is not clever; I
performed the -if (C[j]>C[i]) sum = sum + (C[j]-C[i])- statement
50,000,000 times!

So what?  My solution was not clever, and neither did it depend on me
being clever.  I wonder which one of us had a solution to this problem
sooner?  I just plugged into the recipe:

    1.  Enter Mata.

    2.  Create individual Mata variables that are a view onto each of the
        relevant Stata variables.

    3.  Go back to Stata and create the desired new variable, filled with
        missing values.  Create a view onto that, too.

    4.  Loop in Mata to fill in the new variable.

The only new code I wrote was for (4), and that read

        : for (g=1; g<=rows(obs); g++) {
        :         for (i=obs[g,1]; i<=obs[g,2]; i++) {
        :                 sum = 0
        :                 for (j=obs[g,1]; j<=obs[g,2]; j++) {
        :                         if (C[j]>C[i]) sum = sum + (C[j]-C[i])
        :                 }
        :                 outcome_var[i] = sum
        :         }
        : }

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
