

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: Converting a SAS datastep to Stata
Date: Fri, 17 Dec 2010 13:32:24 -0500

Bill, Dan, et al.:

I initially thought of panelsetup() as well--and I definitely think some of the processing should be done in Mata. But I think the separate programs for separate years should be organized differently: you define a Mata function for each year, and then a single program invokes the appropriate Mata function for each year and restricts the data as well.

The only issue is translating the SAS code into Mata, but this is almost trivial: the code in each if block goes into a separate Mata function, and there are only a few translations to take care of.

Where you would code

    _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));

you now code

    _amt5pc = rowmin((c24533,rowmin((c24532,rowmin((c62700,c24517))))));

that is, code rowmin((a,b)) for min(a,b). A global search and replace can double all the parentheses.

Where you would code

    _amt5pc = max(0,_amt5pc);

you now code

    _amt5pc = rowmax((J(rows(_amt5pc),1,0),_amt5pc));

that is, make a vector of zeros for any 0. It is probably easier to define z as a vector of zeros up front and replace all 0's with z's. The semicolons are optional.
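To make the two translations concrete, here is a minimal sketch on made-up columns. The names a, b, c, nested, and flat are placeholders of mine, not variables from the SAS code. It also shows that a single rowmin() call on a wider matrix gives the same answer as the nested form, should you prefer to tidy up after the mechanical search and replace. Run it from a do-file.

```stata
mata:
// toy columns standing in for the tax variables (values are made up)
a = (3 \ -2 \ 7)
b = (5 \  1 \ 4)
c = (4 \  0 \ 9)

// SAS min(a,b) becomes rowmin() over the two columns joined side by side
rowmin((a, b))

// the nested form produced by search and replace ...
nested = rowmin((a, rowmin((b, c))))
// ... agrees with one call on a three-column matrix
flat = rowmin((a, b, c))
nested == flat

// SAS max(0,a): compare against a column of zeros of matching length
z = J(rows(a), 1, 0)
rowmax((z, a))
end
```

The inner parentheses are what join the columns into one matrix; rowmin() and rowmax() then work across each row, which is why the parentheses must be doubled.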
Cut and paste this whole example into the Command window:

    *set up some fake data for example
    clear all
    sysuse auto
    keep in 1/10
    ren price c24533
    ren mpg c24532
    ren turn c62700
    ren trunk c24517
    ren headroom e24583
    ren weight e24515
    ren length c24516
    g yr=2000+floor(_n/3)
    g id=_n
    sort yr id
    compress
    list id yr c*, sepby(yr) noo
    *now run example--note how everything in { } is very close to original SAS code
    mata:
    mata set matastrict off
    void FLPDYR2003() {
     external c24533,c24532,c24517,c24516,c62700,e24583,e24515
     external id,_amt5pc,_amt8pc,_amt10pc,_amt20pc,_amt25pc
     _amt5pc = rowmin((c24533,rowmin((c24532,rowmin((c62700,c24517))))));
     z=J(rows(_amt5pc),1,0);
     _amt5pc = rowmax((z,_amt5pc));
     c62747 = .05*_amt5pc;
     _line49 = rowmax((z,rowmin((c24532,rowmin((c24517,c62700))))-_amt5pc));
     _line50 = rowsum(e24583,0);
     _amt8pc = rowmin((_line49,_line50));
     c62749 = .08*_amt8pc;
     _amt10pc = _line49 - _amt8pc;
     c62750 = .1*_amt10pc;
     _line55 = c24533 - _amt5pc;
     _line56 = rowmin((c24517,c62700)) - rowmin((c24532,rowmin((c24517,c62700))));
     _amt15pc = rowmin((_line55,_line56));
     c62755 = .15*_amt15pc;
     _amt20pc = _line56 - _amt15pc;
     c62760 = .2*_amt20pc;
     _amt25pc = rowmin((c62700,rowmin((c24517+e24515,c24516))))-rowmin((c62700,c24517));
     c62770 = .25*_amt25pc;
     _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
    }
    end
    prog FLPDYR
     syntax anything [if]
     conf num `anything'
     if !inrange(`anything',2000,2009) {
      di as err "Year out of range"
      error 198
     }
     putmata id c* e* `if', view replace
     mata: FLPDYR`anything'()
     getmata id _amt5pc _amt8pc _amt10pc _amt20pc _amt25pc, update id(id)
    end
    FLPDYR 2003 if yr==2003
    list yr c* _amt*, sepby(yr) noo

The Mata could be a little less sloppy, but passing the year as an argument separately from the if qualifier is intentional--I can see where you might want to use data from 2003 but tax law from 2002 or what have you.
On Thu, Dec 16, 2010 at 5:10 PM, William Gould, StataCorp LP <wgould@stata.com> wrote:
> I wrote,
>
> WG> [...]that is what I would do, probably.  With Mata, I can go
> WG> through the observations one at a time just as SAS does.
>
> Daniel Feenberg <feenberg@nber.org> replied,
>
> DF> Do you mean a "for" loop over observations?
> DF> [...]
> DF> Wouldn't that structure be subject to the complaint you voiced
> DF> about explicitly looping over observations? [...] If that
> DF> doesn't apply to Mata (perhaps because Mata is pseudo-compiled)
> DF> it would be very attractive.
>
> The stricture does not apply to Mata.  More correctly, I never
> recommend explicitly looping over observations if you can avoid
> it, and that applies to Mata, and that applies to languages other
> than Stata and Mata, too, if the language provides an alternative
> method.
>
> In the case of Mata, it is faster than Stata, and explicitly looping over
> the observations often produces acceptable performance.
>
> If you were going to use Mata and explicitly loop over observations,
> I would recommend against using views.
>
> In this case, however, I can think of a way to write the procedure
> without looping over the data:
>
>     1.  Put the data in year order, so all 1973 are together, all 1974
>         are together, etc.  Do that in Stata.
>
>     2.  In Mata, construct a view onto the data.
>
>     3.  Use function [M-5] panelsetup() to obtain the beginning and
>         ending indices of each year.
>
>     4.  For each value of year,
>
>         a.  Extract from view matrix submatrix for the year using
>             range subscripts [|#,# \ #,#|]; see [M-2] subscripts.
>             Store the result in a regular matrix.
>
>         b.  Pass said matrix to the year-specific Mata subroutine you
>             write to make the calculation.
>
>         c.  In the year-specific subroutine, do not loop through the
>             observations; instead use the appropriate colon operators;
>             see [M-2] op_colon.
>
>     5.  Now slam in one swoop the newly replaced values of variables
>         back into the View using the same range subscripts [|#,#\#,#|]
>         you used when extracting the submatrix.  This time, the
>         range subscripts will appear to the left of the equal-sign
>         assignment operator.
>
> There are other approaches you could use, but what I outlined would
> be very fast.
>
> All of that said, you may very well get adequate performance using Mata
> and looping over the observations.  It is not that what I just suggested
> would take longer to code than the explicit looping solution, it is merely
> that it assumes more familiarity with Mata and its advanced features.
> When breaking into Mata for the first time, it is usually best to stay
> with approaches with which you are familiar.  One of the good features
> about Stata is that those approaches usually work well.
>
>
> -- Bill
> wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
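For readers who want to see the numbered steps in Bill's message spelled out, here is a rough sketch of one way they might look. It is my own illustration under stated assumptions, not Bill's code: calc2003(), the result variable, and the use of price and mpg from the auto data are all made up for the example, and the "tax rule" is just a row minimum. Run it from a do-file.

```stata
* step 1: fake data, sorted so each year's observations are contiguous
clear all
sysuse auto
keep in 1/10
g yr = 2000 + floor(_n/3)
sort yr
g double result = .

mata:
// hypothetical year-specific routine: takes a regular matrix and
// works on all rows at once, with no loop over observations (step 4c)
real colvector calc2003(real matrix X) {
    return(rowmin((X[., 1], X[., 2])))
}

// step 2: views onto the data
st_view(yr = ., ., "yr")
st_view(X = ., ., ("price", "mpg"))
st_view(res = ., ., "result")

// step 3: first and last row of each year
info = panelsetup(yr, 1)

// steps 4 and 5: loop over years, not observations
for (i = 1; i <= rows(info); i++) {
    r0 = info[i, 1]
    r1 = info[i, 2]
    if (yr[r0] == 2003) {
        // 4a: copy the year's rows out of the view into a regular matrix
        Xi = X[|r0, 1 \ r1, cols(X)|]
        // 4b and 5: compute, then write back with the same range subscripts
        res[|r0, 1 \ r1, 1|] = calc2003(Xi)
    }
}
end
list yr price mpg result, sepby(yr) noo
```

With more than one year-specific routine, the if test inside the loop would become a dispatch on the year; only 2003 is handled here to keep the sketch short.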

