Re: st: looping over observations -mata-?


From   ariley@stata.com (Alan Riley)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: looping over observations -mata-?
Date   Wed, 12 Oct 2005 11:17:11 -0500

Ricardo Ovaldia (ovaldia@yahoo.com) wonders if Mata will help him
solve the following problem:
> I have household data with one observation per family
> member. All households have one or both parents and
> anywhere from one to seven children. All households have
> children but no grandparents or other relatives. Here
> are a few typical observations and relevant variables:
> 
> . cl  familyid subjid relation
> 
>      familyid    subjid   relation
>   1.     1001         1          f
>   2.     1001         2          m
>   3.     1001         3          c
>   4.     1001         4          c
>   5.     1002         1          m
>   6.     1002         2          c
>   7.     1002         3          c
>   8.     1003         1          m
>   9.     1003         2          f
>  10.     1003         3          c
> 
> where for -relation-: f=father, m=mother and c=child
> 
> I want to create two new variables which hold, for the
> children, their parents' -subjid- as follows:
> 
>      familyid    subjid   relation  fatherid  motherid
>   1.     1001         1          f         .         .
>   2.     1001         2          m         .         .
>   3.     1001         3          c         1         2
>   4.     1001         4          c         1         2
>   5.     1002         1          m         .         .
>   6.     1002         2          c         .         2
>   7.     1002         3          c         .         2
>   8.     1003         1          m         .         .
>   9.     1003         2          f         .         .
>  10.     1003         3          c         2         1
> 
> I wrote a program to do this but it is very slow because
> it loops over observations. 
> I think that if I recode this using -mata- it would be
> faster, but I am not sure where to begin. Any assistance
> or suggestions will be greatly appreciated.

Ricardo is correct that if he rewrites his loop in Mata, it
will be faster.  However, this is still not the optimal solution.
Mata is useful for many data management tasks, such as reading
and manipulating files and performing left-hand-side indexing (i.e.
when you want to achieve something like

    . generate y[somevar] = x

which isn't possible in Stata but is possible in Mata using matrix views
onto the Stata dataset).
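
As a rough illustration of that left-hand-side indexing, here is a minimal
sketch using a Mata view. The toy data are made up for the example, and it
assumes that somevar holds valid observation numbers (here a permutation of
1 to 5); x, y, and somevar simply mirror the names in the pseudo-command above.

```stata
clear
set obs 5
generate x = 10 * _n
* hypothetical index variable: observation i writes to observation 6 - i
generate somevar = 6 - _n
generate y = .

mata:
// y is a view, so assigning to its elements changes the Stata variable
st_view(y = ., ., "y")
// x and the index are read as ordinary copies
x   = st_data(., "x")
idx = st_data(., "somevar")
// the value of x in observation i is stored in y in observation somevar[i]
y[idx] = x
end

list
```

Because y is a view rather than a copy, nothing has to be written back to the
dataset after the assignment.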

However, Ricardo can achieve his results with just a few Stata commands
and creative sorting:


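    * Record the father's subjid on his own observation; missing elsewhere
    generate fatherid = subjid if relation=="f"
    * Missing values sort last, so a father (if any) is first in his family
    sort familyid fatherid
    by familyid: replace fatherid = fatherid[1] if relation=="c"
    * Keep the ID on the children's observations only
    replace fatherid = . if relation != "c"

    * Same steps for the mother
    generate motherid = subjid if relation=="m"
    sort familyid motherid
    by familyid: replace motherid = motherid[1] if relation=="c"
    replace motherid = . if relation != "c"

    * Restore the original order and show the result
    sort familyid subjid
    list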

Note that the last -list- shows different values for motherid
in observations 6 and 7 from what Ricardo showed in his example.
In family 1002 the mother has subjid 1 (observation 5), so I believe
motherid should be 1 in those two observations, as produced by the
code above.
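
For anyone who wants to check those values, the sketch below enters Ricardo's
ten example observations with -input- and confirms the expected IDs with
-assert-. It uses a different technique from the sorting approach above,
namely -egen- with max() and cond(), and it assumes each family has at most
one father and at most one mother, as Ricardo described.

```stata
clear
input familyid subjid str1 relation
1001 1 f
1001 2 m
1001 3 c
1001 4 c
1002 1 m
1002 2 c
1002 3 c
1003 1 m
1003 2 f
1003 3 c
end

* Within each family, pick up the parent's subjid (missing if no such parent)
bysort familyid: egen fatherid = max(cond(relation == "f", subjid, .))
bysort familyid: egen motherid = max(cond(relation == "m", subjid, .))

* Keep the IDs on the children's observations only
replace fatherid = . if relation != "c"
replace motherid = . if relation != "c"

* Check the children in each family, then the parents
assert fatherid == 1 & motherid == 2 if familyid == 1001 & relation == "c"
assert missing(fatherid) & motherid == 1 if familyid == 1002 & relation == "c"
assert fatherid == 2 & motherid == 1 if familyid == 1003 & relation == "c"
assert missing(fatherid) & missing(motherid) if relation != "c"

sort familyid subjid
list, sepby(familyid)
```

Each -assert- is silent when the condition holds and stops with an error
otherwise, so a clean run means the values match.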


Alan
(ariley@stata.com)