Re: RE: st: Converting a SAS datastep to Stata


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: RE: st: Converting a SAS datastep to Stata
Date   Thu, 16 Dec 2010 16:10:36 -0600

I wrote, 

WG> [...]that is what I would do, probably.  With Mata, I can go 
WG> through the observations one at a time just as SAS does.

Daniel Feenberg <feenberg@nber.org> replied, 

DF> Do you mean a "for" loop over observations?
DF> [...]
DF> Wouldn't that structure be subject to the complaint you voiced
DF> about explicitly looping over observations? [...] If that 
DF> doesn't apply to Mata (perhaps because Mata is pseudo-compiled)
DF> it would be very attractive.

The stricture does not apply to Mata. More correctly, I never 
recommend explicitly looping over observations if you can avoid 
it, and that applies to Mata, and that applies to languages other 
than Stata and Mata, too, if the language provides an alternative 
method. 

Mata is faster than Stata, and explicitly looping over the observations 
in Mata often produces acceptable performance.

If you were going to use Mata and explicitly loop over observations, 
I would recommend against using views.
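
As a minimal sketch of what looping without a view might look like: copy
the variables into a regular Mata matrix with st_data(), loop over its
rows, and write the result back with st_store().  The variable names x
and y and the calculation itself are hypothetical, chosen only for
illustration, and y is assumed to exist already as a numeric variable.

```mata
mata:
// Copy the data into a regular matrix (not a view); x and y are
// hypothetical variable names.
X = st_data(., ("x", "y"))

// Explicit loop over observations; column 1 is x, column 2 is y.
for (i=1; i<=rows(X); i++) {
    X[i, 2] = (X[i, 1] > 0 ? log(X[i, 1]) : .)
}

// Write the updated column back to the Stata variable y.
st_store(., "y", X[., 2])
end
```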

In this case, however, I can think of a way to write the procedure 
without looping over the data:

    1.  Put the data in year order, so all 1973 are together, all 1974
        are together, etc.  Do that in Stata.

    2.  In Mata, construct a view onto the data.

    3.  Use function [M-5] panelsetup() to obtain the beginning and 
        ending indices of each year.

    4.  For each value of year,

            a.  Extract from the view matrix the submatrix for the year 
                using range subscripts [|#,# \ #,#|]; see [M-2] subscripts.
                Store the result in a regular matrix.

            b.  Pass said matrix to the year-specific Mata subroutine you 
                write to make the calculation.

            c.  In the year-specific subroutine, do not loop through the 
                observations; instead use the appropriate colon operators; 
                see [M-2] op_colon.

    5.  Now slam in one swoop the newly replaced values of variables 
        back into the view using the same range subscripts [|#,#\#,#|]
        you used when extracting the submatrix.  This time, the 
        range subscripts will appear to the left of the equal-sign 
        assignment operator.
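
The steps above might be sketched roughly as follows.  Everything named
here is an assumption made for illustration: the variables year, x, and
y, the function demean_year(), and the calculation it performs (y is set
to x minus the within-year mean of x).  The real year-specific
calculation would replace the body of that function.

```mata
sort year                   // step 1: put the data in year order
generate double y = .       // hypothetical result variable

mata:
// Hypothetical year-specific subroutine: no loop over observations,
// only colon operators.  Column 2 is x, column 3 is y.
real matrix demean_year(real matrix X)
{
    real matrix R
    R = X
    R[., 3] = X[., 2] :- mean(X[., 2])
    return(R)
}

st_view(V=., ., ("year", "x", "y"))    // step 2: view onto the data
info = panelsetup(V, 1)                // step 3: first and last row per year

for (i=1; i<=rows(info); i++) {        // step 4: loop over years only
    // 4a: extract the year's rows into a regular matrix
    X = V[|info[i,1], 1 \ info[i,2], cols(V)|]
    // 4b, 4c: year-specific calculation
    X = demean_year(X)
    // step 5: write back through the view, same range subscripts
    V[|info[i,1], 1 \ info[i,2], cols(V)|] = X
}
end
```

The loop runs once per year rather than once per observation, which is
where the speed of this approach would come from.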

There are other approaches you could use, but what I outlined would 
be very fast.  

All of that said, you may very well get adequate performance using Mata 
and looping over the observations.  It is not that what I just suggested 
would take longer to code than the explicit looping solution, it is merely 
that it assumes more familiarity with Mata and its advanced features.
When breaking into Mata for the first time, it is usually best to stay 
with approaches with which you are familiar.  One of the good features 
of Stata is that those approaches usually work well.


-- Bill
wgould@stata.com