Re: st: looping over observations -mata-?

From: ariley@stata.com (Alan Riley)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: looping over observations -mata-?
Date: Wed, 12 Oct 2005 11:17:11 -0500

```
Ricardo Ovaldia (ovaldia@yahoo.com) wonders if Mata will help him
solve the following problem:
> I have household data with one observation per family
> member. Every household has one or both parents and
> anywhere from one to seven children. All households have
> children but no grandparents or other relatives. Here
> are a few typical observations and relevant variables:
>
> . cl  familyid subjid relation
>
>      familyid    subjid   relation
>   1.     1001         1          f
>   2.     1001         2          m
>   3.     1001         3          c
>   4.     1001         4          c
>   5.     1002         1          m
>   6.     1002         2          c
>   7.     1002         3          c
>   8.     1003         1          m
>   9.     1003         2          f
>  10.     1003         3          c
>
> where for -relation-: f=father, m=mother and c=child
>
> I want to create two new variables which hold, for the
> children, their parents' -subjid- as follows:
>
>      familyid    subjid   relation  fatherid  motherid
>   1.     1001         1          f         .         .
>   2.     1001         2          m         .         .
>   3.     1001         3          c         1         2
>   4.     1001         4          c         1         2
>   5.     1002         1          m         .         .
>   6.     1002         2          c         .         2
>   7.     1002         3          c         .         2
>   8.     1003         1          m         .         .
>   9.     1003         2          f         .         .
>  10.     1003         3          c         2         1
>
> I wrote a program to do this, but it is very slow because
> it loops over observations.
> I think that if I recode this using -mata- it would be
> faster, but I am not sure where to begin. Any assistance
> or suggestions will be greatly appreciated.

Ricardo is correct that rewriting his loop in Mata would make it
faster.  However, that is still not the optimal solution.  Mata is
most useful for tasks such as manipulating files and performing
left-hand-side indexing, i.e., when you want to achieve something like

. generate y[somevar] = x

which is not possible in Stata but is possible in Mata using matrix
views onto the Stata dataset.

However, Ricardo can achieve his results with just a few Stata commands
and creative sorting:

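* the father's subjid on his own row, missing on all other rows
generate fatherid = subjid if relation=="f"
* missing values sort last, so the father (if any) comes first in each family
sort familyid fatherid
by familyid: replace fatherid = fatherid[1] if relation=="c"
* keep the id only on the children's rows
replace fatherid = . if relation != "c"

* the same steps for the mother
generate motherid = subjid if relation=="m"
sort familyid motherid
by familyid: replace motherid = motherid[1] if relation=="c"
replace motherid = . if relation != "c"

* restore the original order
sort familyid subjid
list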

Note that the final -list- shows values of motherid in observations
6 and 7 that differ from those in Ricardo's example.  However, I
believe the code above is right to set motherid to 1 in those two
observations, because subject 1 is the mother in family 1002.

Alan
(ariley@stata.com)
```
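Ricardo asked how the loop might look in Mata. The block below is a minimal sketch of that route, not code from the thread: it rebuilds his ten example observations, sorts by familyid and subjid so that each family occupies a contiguous run of rows, and then walks the rows once, handling each family when it reaches that family's last row. It assumes -relation- is a one-character string coded f, m, or c; the Mata names F, S, R, FA, MO, start, and isend are illustrative only. Because FA and MO are views created with st_view(), assigning to FA[j] and MO[j] writes straight into fatherid and motherid, which is the left-hand-side indexing described above.

```stata
* Sketch only: a Mata loop over observations using views (names are illustrative)
clear
input familyid subjid str1 relation
1001 1 f
1001 2 m
1001 3 c
1001 4 c
1002 1 m
1002 2 c
1002 3 c
1003 1 m
1003 2 f
1003 3 c
end

* each family must occupy a contiguous block of rows
sort familyid subjid
generate double fatherid = .
generate double motherid = .

mata:
// views are live links to the data: FA[j] = ... changes fatherid itself
st_view(F = ., ., "familyid")
st_view(S = ., ., "subjid")
st_sview(R = "", ., "relation")
st_view(FA = ., ., "fatherid")
st_view(MO = ., ., "motherid")

n = rows(F)
start = 1                          // first row of the current family
for (i = 1; i <= n; i++) {
    // is row i the last row of its family?
    isend = (i == n)
    if (!isend) isend = (F[i + 1] != F[i])
    if (isend) {
        // rows start..i form one family: find the parents' subjids
        fa = .
        mo = .
        for (j = start; j <= i; j++) {
            if (R[j] == "f") fa = S[j]
            if (R[j] == "m") mo = S[j]
        }
        // copy them onto the children's rows only
        for (j = start; j <= i; j++) {
            if (R[j] == "c") {
                FA[j] = fa
                MO[j] = mo
            }
        }
        start = i + 1
    }
}
end

list, sepby(familyid)
```

The loop is linear in the number of observations, but it is still an explicit loop; the handful of -sort- and -by- commands in the reply above reach the same result without one.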
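Another loop-free variant, not taken from the thread, replaces the sort-and-copy steps with the -egen- function max() applied to a cond() expression: within each family, cond() is the parent's subjid on that parent's row and missing elsewhere, and max() ignores the missing values. This is a sketch under the assumption that each family has at most one row coded f and at most one coded m; if a family had two f rows, max() would silently keep the larger subjid. On Ricardo's example data it gives the same results as the sort-based code, including motherid = 1 in observations 6 and 7.

```stata
* Sketch only: egen max() over cond() instead of sorting on the new variables
clear
input familyid subjid str1 relation
1001 1 f
1001 2 m
1001 3 c
1001 4 c
1002 1 m
1002 2 c
1002 3 c
1003 1 m
1003 2 f
1003 3 c
end

* parent's subjid spread to every row of the family (missing if no such parent)
bysort familyid: egen fatherid = max(cond(relation == "f", subjid, .))
bysort familyid: egen motherid = max(cond(relation == "m", subjid, .))

* keep the ids only on the children's rows
replace fatherid = . if relation != "c"
replace motherid = . if relation != "c"

sort familyid subjid
list, sepby(familyid)
```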