Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: RE: st: Converting a SAS datastep to Stata

From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: Converting a SAS datastep to Stata
Date: Thu, 16 Dec 2010 09:59:27 -0600

```Concerning SAS code that he is translating to Stata, Daniel Feenberg
<feenberg@nber.org> wrote,

> Repeating the if qualifier means repeating a calculation, which is an
> inefficiency, but it also means repeating the code, which is ugly and
> distracting. That is why I asked about the possibility of a block level if
> qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.

The SAS code he is translating reads,

if FLPDYR eq 2003 then do;
_amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
_amt5pc = max(0,_amt5pc);
c62747 = .05*_amt5pc;
_line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
_line50 = sum(e24583,0);
_amt8pc = min(_line49,_line50);
c62749 = .08*_amt8pc;
_amt10pc = _line49 - _amt8pc;
c62750 =  .1*_amt10pc;
_line55 = c24533 - _amt5pc;
_line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
_amt15pc = min(_line55,_line56);
c62755 =  .15*_amt15pc;
_amt20pc = _line56 - _amt15pc;
c62760 =  .2*_amt20pc;
_amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
c62770 =  .25*_amt25pc;
_tamt2  = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
end;

The above is the code for one of the years, and Daniel has a lot more
code for each of the other years.

The problem is that Stata puts if qualifiers at the end of lines, whereas
SAS puts them out front.  In this case, the resulting SAS code is easier
to read.
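
* A hypothetical sketch, not Daniel's actual code: the literal Stata
* translation of the first lines of the 2003 block repeats the same -if-
* qualifier on every line.  The data are made up for illustration, and the
* target variables are generated first because -replace- requires that a
* variable already exist.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 200 300 400
2004 100 200 300 400
end
generate double _amt5pc = .
generate double c62747 = .
replace _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) if FLPDYR==2003
replace _amt5pc = max(0,_amt5pc) if FLPDYR==2003
replace c62747 = .05*_amt5pc if FLPDYR==2003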

Solution 1
----------

Translate the code with easy-to-apply global edits:

local r = "replace"

local if = "if FLPDYR==2003"
`r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `if'
`r' _amt5pc = max(0,_amt5pc) `if'
`r' c62747 = .05*_amt5pc `if'
`r' _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc) `if'
...
`r' _tamt2  = c62747 + c62749 + c62750 + c62755 + c62760 + c62770 `if'

local if = "if FLPDYR==2004"
`r' ... `if'
...

What I did was add `r' to the front of each of the original SAS lines
and replace the semicolon at the end with `if'.

This does not address the inefficiency (the reinterpretation of the `if'
line by line by line), but it does address the problem of "ugly and
distracting".
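
* A minimal, runnable sketch of Solution 1 on made-up data, covering only the
* first four lines of the 2003 block.  It assumes the target variables are
* created beforehand, because -replace- requires that a variable already
* exist.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 200 300 400
2004 100 200 300 400
end
foreach v in _amt5pc c62747 _line49 {
generate double `v' = .
}
local r "replace"
local if "if FLPDYR==2003"
`r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `if'
`r' _amt5pc = max(0,_amt5pc) `if'
`r' c62747 = .05*_amt5pc `if'
`r' _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc) `if'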

By the way, concerning efficiency, while I agree that reevaluating the if line
by line by line is inefficient, that does not imply that the above Stata code
runs more slowly than the original SAS code.  Stata keeps the data in memory,
and all the rest of Stata has been optimized for that. SAS reads data from
disk, and all the rest of SAS has been optimized for that.  I do not know
which package will be faster in this case.  All I know for sure is that, as
dataset size grows, the SAS code will slow down less than will the Stata code
I just suggested.
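
* A common Stata idiom, offered here as an assumption rather than as part of
* Solution 1: evaluate the year condition once into a 0/1 indicator variable
* and put that in the `if' macro.  The qualifier is still checked line by
* line, so this only helps when the condition is costlier than a single
* comparison.  It has not been timed here.  The data are made up.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 200 300 400
2004 100 200 300 400
end
generate double _amt5pc = .
generate double c62747 = .
generate byte touse = (FLPDYR==2003)
local r "replace"
local if "if touse"
`r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `if'
`r' _amt5pc = max(0,_amt5pc) `if'
`r' c62747 = .05*_amt5pc `if'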

Solution 2
----------

Let's get rid of the `if' on the end.  The solution below might be more
efficient, but I don't guarantee it.  I'm about to substitute disk I/O for
re-evaluation of the if, and thus make Stata more closely mimic how SAS
operates.  I don't guarantee this solution is faster for two reasons: as
previously stated, Stata is very fast at re-evaluating if qualifiers, and I
will end up performing more I/O than SAS does when executing its code.

I have other reasons for suggesting this solution, which reasons will become
obvious in the telling.

The solution is,

local r = "replace"
forvalues yr=1980(1)2008 {
save hold, replace            // set aside all remaining observations
keep if FLPDYR == `yr'        // work on one year only

if `yr'==1980 {
...
}
else if `yr'==1981 {
...
}
...
else if `yr'==2003 {
`r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
`r' _amt5pc = max(0,_amt5pc)
`r' c62747 = .05*_amt5pc
...
}
else ...

save result, replace emptyok  // this year's processed observations
use hold, clear
drop if FLPDYR==`yr'
append using result           // put the processed year back
}
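
* A runnable sketch of the split-process-append loop on made-up data.  It uses
* -tempfile- names instead of hold.dta and result.dta, so nothing is left on
* disk.  The two "tax" rates are hypothetical placeholders, not real rules.
clear
input FLPDYR income
2003 100
2004 200
2003 300
end
generate double tax = .
local r "replace"
tempfile hold result
forvalues yr = 2003/2004 {
save `hold', replace
keep if FLPDYR == `yr'
if `yr' == 2003 {
`r' tax = .10*income
}
else if `yr' == 2004 {
`r' tax = .20*income
}
save `result', replace emptyok
use `hold', clear
drop if FLPDYR == `yr'
append using `result'
}
sort FLPDYR income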

Here's an even more readable version of this solution:

local r = "replace"
forvalues yr=1980(1)2008 {
save hold, replace
keep if FLPDYR == `yr'
taxyear`yr'
save result, replace emptyok
use hold, clear
drop if FLPDYR==`yr'
append using result
}

Note the line -taxyear`yr'-.  If `yr'==2003, then that will execute
the subroutine -taxyear2003-.  Cute, huh?

Then I write subroutines for each of the tax years (they must be defined
before the loop runs), such as

program taxyear2003
local r = "replace"
`r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
`r' _amt5pc = max(0,_amt5pc)
`r' c62747 = .05*_amt5pc
...
end
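
* A runnable sketch of the program-per-year version on made-up data.  The
* two programs and their rates are hypothetical placeholders.  The programs
* are defined before the loop that calls them, and -capture program drop-
* lets the do-file be rerun in the same session.
capture program drop taxyear2003
program taxyear2003
local r "replace"
`r' tax = .10*income
end
capture program drop taxyear2004
program taxyear2004
local r "replace"
`r' tax = .20*income
end

clear
input FLPDYR income
2003 100
2004 200
2003 300
end
generate double tax = .
tempfile hold result
forvalues yr = 2003/2004 {
save `hold', replace
keep if FLPDYR == `yr'
taxyear`yr'
save `result', replace emptyok
use `hold', clear
drop if FLPDYR == `yr'
append using `result'
}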

The result is perhaps even more readable than the original SAS code, and it
does not require changing the original SAS code much.

Other solutions
---------------

Daniel could use Mata.  That would address both the readability and efficiency
issues.  If I were writing this code for the first time, that is what I would
do, probably.  With Mata, I can go through the observations one at a time just
as SAS does.
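
* A minimal Mata sketch on made-up data.  It uses only the first three lines
* of the 2003 computation and assumes the output variables already exist.
* It walks the observations one at a time, as a SAS data step does, and
* tests the year once per observation instead of once per line.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 200 300 400
2004 50 60 70 80
end
generate double _amt5pc = .
generate double c62747 = .
mata:
// views share storage with the dataset, so assigning to Y writes back
st_view(X=., ., "FLPDYR c24533 c24532 c62700 c24517")
st_view(Y=., ., "_amt5pc c62747")
for (i = 1; i <= rows(X); i++) {
if (X[i, 1] == 2003) {
amt5pc = max((0, min((X[i, 2], X[i, 3], X[i, 4], X[i, 5]))))
Y[i, 1] = amt5pc
Y[i, 2] = .05 * amt5pc
}
}
end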

But if I had code already written in SAS, I would use solution 2, version 2.
The changes required by that solution are minimal, and I would spend less
time debugging, and convincing myself that I get the same answers as
before, than if I started all over again.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```