Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: Converting a SAS datastep to Stata
Date: Thu, 16 Dec 2010 09:59:27 -0600

Concerning SAS code that he is translating to Stata, Daniel Feenberg
<feenberg@nber.org> wrote,

> Repeating the if qualifier means repeating a calculation, which is an
> inefficiency, but it also means repeating the code, which is ugly and
> distracting. That is why I asked about the possibility of a block level if
> qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.

The SAS code reads,

    if FLPDYR eq 2003 then do;
        _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
        _amt5pc = max(0,_amt5pc);
        c62747 = .05*_amt5pc;
        _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
        _line50 = sum(e24583,0);
        _amt8pc = min(_line49,_line50);
        c62749 = .08*_amt8pc;
        _amt10pc = _line49 - _amt8pc;
        c62750 = .1*_amt10pc;
        _line55 = c24533 - _amt5pc;
        _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
        _amt15pc = min(_line55,_line56);
        c62755 = .15*_amt15pc;
        _amt20pc = _line56 - _amt15pc;
        c62760 = .2*_amt20pc;
        _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
        c62770 = .25*_amt25pc;
        _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
    end;

The above is the code for one of the years, and Daniel has a lot more code for each of the other years.

The problem is that Stata puts if qualifiers on the end of lines whereas SAS puts them out front. In this case, the resulting SAS code is easier to read, and to write.

Solution 1
----------

My first solution addresses the readability issue and allows Daniel to translate the code with easy-to-apply global edits:

    local r  = "replace"
    local if = "if FLPDYR==2003"
    `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `if'
    `r' _amt5pc = max(0,_amt5pc) `if'
    `r' c62747 = .05*_amt5pc `if'
    `r' _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc) `if'
    ...
    `r' _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770 `if'

    local if = "if FLPDYR==2004"
    `r' ... `if'
    ...

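For readers who want to try the macro substitution on its own, here is a minimal sketch. The dataset and the variable names year, x, and y are made up for illustration and have nothing to do with Daniel's tax data. It should be run as a single do-file, because local macros do not survive between separately executed chunks.

```stata
* Toy data; year, x, and y are hypothetical names
clear
input year x
2003 10
2003 -5
2004 7
end
generate y = .

* `r' expands to the command name, `if' to the trailing qualifier
local r  "replace"
local if "if year==2003"

`r' y = max(0, x)  `if'
`r' y = 2*y        `if'

* Redefine the qualifier once for the next block of lines
local if "if year==2004"
`r' y = x + 1      `if'

list
```

After this runs, y is 20, 0, and 8 in the three observations: each line touched only the rows selected by the current definition of `if'.
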
What I did was add `r' to the front of each of the original SAS lines, and replace the semicolon at the end with `if'. This solution does not address Daniel's comment about code efficiency (the reinterpretation of the `if' line by line by line), but does address the problem with "ugly and distracting".

By the way, concerning efficiency, while I agree that reevaluating the if line by line by line is inefficient, that does not imply that the above Stata code runs more slowly than the original SAS code. Stata keeps the data in memory, and all the rest of Stata has been optimized for that. SAS reads data from disk, and all the rest of SAS has been optimized for that. I do not know which package will be faster in this case. All I know for sure is that, as dataset size grows, the SAS code will slow down less than will the Stata code I just suggested.

Solution 2
----------

Let's get rid of the `if' on the end. The solution below might be more efficient, but I don't guarantee it. I'm about to substitute disk I/O for re-evaluation of the if, and thus make Stata more closely mimic how SAS operates. I don't guarantee this solution is faster because, as previously stated, Stata is very fast at re-evaluating if statements, and because I will end up substituting more I/O than SAS performs in this case executing its code. I have other reasons for suggesting this solution, which will become obvious in the telling.

The solution is,

    local r = "replace"
    forvalues yr=1980(1)2008 {
        save hold, replace
        keep if FLPDYR == `yr'
        if `yr'==1980 {
            ...
        }
        else if `yr'==1981 {
            ...
        }
        ...
        else if `yr'==2003 {
            `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
            `r' _amt5pc = max(0,_amt5pc)
            `r' c62747 = .05*_amt5pc
            ...
        }
        else ...
        save result, emptyok replace
        use hold
        drop if FLPDYR==`yr'
        append using result
    }

Here's an even more readable version of this solution:

    local r = "replace"
    forvalues yr=1980(1)2008 {
        save hold, replace
        keep if FLPDYR == `yr'
        taxyear`yr'
        save result, emptyok replace
        use hold
        drop if FLPDYR==`yr'
        append using result
    }

Note the line -taxyear`yr'-. If `yr'==2003, then that will execute the subroutine -taxyear2003-. Cute, huh? Then I write subroutines for each of the tax years, such as

    program taxyear2003
        local r = "replace"
        `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
        `r' _amt5pc = max(0,_amt5pc)
        `r' c62747 = .05*_amt5pc
        ...
    end

What I like about this solution is that the resulting code is very readable -- perhaps even more readable than the original SAS code -- and it does not require changing the original SAS code much.

Other solutions
---------------

Daniel could use Mata. That would address both the readability and efficiency issues. If I were writing this code for the first time, that is what I would do, probably. With Mata, I can go through the observations one at a time just as SAS does.

But if I had code already written in SAS, I would use solution 2, version 2. The changes required by that solution are minimal, and I would spend less time debugging and convincing myself that I have the same answers as before than if I started all over again.

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
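The subroutine-per-year loop of Solution 2 can also be tried in isolation. The following is only a sketch under stated assumptions: the programs rule2003 and rule2004, the variables year, x, and y, and the three-observation dataset are all hypothetical. It uses tempfiles rather than files named hold and result, which is a substitution made here so that the example leaves nothing on disk.

```stata
clear all                      // also drops any earlier program definitions

* Toy data; year, x, and y are hypothetical names
input year x
2003 10
2004 7
2003 -5
end
generate y = .

* One hypothetical program per year. Local macros are private to a
* program, so anything used inside must be defined inside.
program rule2003
    replace y = max(0, x)
end

program rule2004
    replace y = x + 1
end

tempfile hold result
forvalues yr = 2003/2004 {
    save `hold', replace       // park the full dataset
    keep if year == `yr'       // work on one year only
    rule`yr'                   // expands to rule2003, then rule2004
    save `result', replace
    use `hold', clear
    drop if year == `yr'       // remove the unprocessed rows for this year
    append using `result'      // and put the processed ones back
}

sort year x
list
```

Sorted by year and x, the final y values are 0, 10, and 8. If some year in the loop range had no observations, the save of the result file would need the emptyok option, as in the code in the message.
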

Follow-Ups: RE: RE: st: Converting a SAS datastep to Stata, from Nick Cox <n.j.cox@durham.ac.uk>