

From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: RE: st: Converting a SAS datastep to Stata
Date: Thu, 16 Dec 2010 17:12:25 +0000

Bill wrote here, among much good stuff,

        program taxyear2003
                local r = `replace'
                `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
                `r' _amt5pc = max(0,_amt5pc)
                `r' c62747 = .05*_amt5pc
                ...
        end

He didn't mean that. The first line is better as

        local r "replace"

What he wrote instead is legal, but at that point in the program no
local macro -replace- is defined, so local r will be born empty, and
the rest of the program will fail.

        local r = "replace"

would work but using an = sign here is a habit to avoid whenever no
evaluation of the expression defining the macro is needed.

He also wrote twice

        local `r' = "replace"

and once

        local R = "replace"

and those lines are typos for what is above. They should all start

        local r

Nick
n.j.cox@durham.ac.uk

William Gould, StataCorp LP

Concerning SAS code that he is translating to Stata,
Daniel Feenberg <feenberg@nber.org> wrote,

> Repeating the if qualifier means repeating a calculation, which is an
> inefficiency, but it also means repeating the code, which is ugly and
> distracting. That is why I asked about the possibility of a block level if
> qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.

Daniel made the above comments concerning code he is translating from
SAS to Stata.
The SAS code reads,

        if FLPDYR eq 2003 then do;
           _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
           _amt5pc = max(0,_amt5pc);
           c62747 = .05*_amt5pc;
           _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
           _line50 = sum(e24583,0);
           _amt8pc = min(_line49,_line50);
           c62749 = .08*_amt8pc;
           _amt10pc = _line49 - _amt8pc;
           c62750 = .1*_amt10pc;
           _line55 = c24533 - _amt5pc;
           _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
           _amt15pc = min(_line55,_line56);
           c62755 = .15*_amt15pc;
           _amt20pc = _line56 - _amt15pc;
           c62760 = .2*_amt20pc;
           _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
           c62770 = .25*_amt25pc;
           _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
        end;

The above is the code for one of the years, and Daniel has a lot more
code for each of the other years.

The problem is that Stata puts if qualifiers at the end of lines whereas
SAS puts them out front. In this case, the resulting SAS code is easier
to read, and to write.


Solution 1
----------

My first solution addresses the readability issue and allows Daniel
to translate the code with easy-to-apply global edits:

        local R = "replace"
        local if = "if FLPDYR==2003"
        `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517))) `if'
        `r' _amt5pc = max(0,_amt5pc) `if'
        `r' c62747 = .05*_amt5pc `if'
        `r' _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc) `if'
        ...
        `r' _tamt2 = c62747 + c62749 + c62750 + c62755 + c62760 + c62770 `if'

        local if = "if FLPDYR==2004"
        `r' ... `if'
        ...

What I did was add `r' to the front of each of the original SAS lines,
and replace the semicolon at the end with `if'.

This solution does not address Daniel's comment about code efficiency
(the reinterpretation of the `if' line by line by line), but does
address the problem with "ugly and distracting".

By the way, concerning efficiency, while I agree that reevaluating the
if line by line by line is inefficient, that does not imply that the
above Stata code runs more slowly than the original SAS code.
Stata keeps the data in memory, and all the rest of Stata has been
optimized for that. SAS reads data from disk, and all the rest of SAS
has been optimized for that. I do not know which package will be faster
in this case. All I know for sure is that, as dataset size grows, the
SAS code will slow down less than will the Stata code I just suggested.


Solution 2
----------

Let's get rid of the `if' on the end. The solution below might be more
efficient, but I don't guarantee it. I'm about to substitute disk I/O
for re-evaluation of the if, and thus make Stata more closely mimic how
SAS operates. I don't guarantee this solution is faster because, as
previously stated, Stata is very fast at re-evaluating if statements,
and because I will end up substituting more I/O than SAS performs in
this case executing its code.

I have other reasons for suggesting this solution, which reasons will
become obvious in the telling.

The solution is,

        local `r' = "replace"
        forvalues yr=1980(1)2008 {
                save hold
                keep if FLPDYR == `yr'
                if `yr'==1980 {
                        ...
                }
                else if `yr'==1981 {
                        ...
                }
                ...
                else if `yr'==2003 {
                        `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
                        `r' _amt5pc = max(0,_amt5pc)
                        `r' c62747 = .05*_amt5pc
                        ...
                }
                else ...
                save result, emptyok
                use hold
                drop if FLPDYR==`yr'
                append using hold
        }

Here's an even more readable version of this solution:

        local `r' = "replace"
        forvalues yr=1980(1)2008 {
                save hold
                keep if FLPDYR == `yr'
                taxyear`yr'
                save result, emptyok
                use hold
                drop if FLPDYR==`yr'
                append using hold
        }

Note the line -taxyear`yr'-. If `yr'==2003, then that will execute
the subroutine -taxyear2003-. Cute, huh?

Then I write subroutines for each of the tax years, such as

        program taxyear2003
                local r = `replace'
                `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
                `r' _amt5pc = max(0,_amt5pc)
                `r' c62747 = .05*_amt5pc
                ...
        end

What I like about this solution is that the resulting code is very
readable -- perhaps even more readable than the original SAS code --
and it does not require changing the original SAS code much.


Other solutions
---------------

Daniel could use Mata. That would address both the readability and
efficiency issues. If I were writing this code for the first time,
that is what I would do, probably. With Mata, I can go through the
observations one at a time just as SAS does.

But if I had code already written in SAS, I would use solution 2,
version 2. The changes required by that solution are minimal and I will
spend less time debugging and convincing myself that I had the same
answers as previously, than if I started all over again.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
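
For anyone who wants to try the corrected macro definition together with
the subroutine-per-year loop, here is a minimal sketch, not the posted
code. It rests on several assumptions that are not in the thread: the
three-observation toy dataset and its values are made up; -taxyear2004-
is a hypothetical placeholder, since the rules for other years were
elided; tempfiles stand in for the files -hold- and -result-; -replace-
and -clear- are added so the loop can run more than once; and the
processed rows are appended from the result file, which is my reading of
the intent, where the loop as posted appends -hold-.

```stata
* Toy data: values are invented; variable names follow the thread.
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 80 120 90
2003  50 60  70 40
2004 100 80 120 90
end
generate _amt5pc = .
generate c62747 = .

* Copying text into a macro needs no evaluation, so no = sign.
local r "replace"
assert "`r'" == "replace"

* No local macro -replace- is defined here, so the right-hand side
* expands to nothing and -empty- is born empty (legal, but not intended).
local empty = `replace'
assert "`empty'" == ""

capture program drop taxyear2003
program taxyear2003
        local r "replace"    // local to this program
        `r' _amt5pc = min(c24533,min(c24532,min(c62700,c24517)))
        `r' _amt5pc = max(0,_amt5pc)
        `r' c62747 = .05*_amt5pc
end

* Hypothetical placeholder: the 2004 rules are not given in the thread.
capture program drop taxyear2004
program taxyear2004
        display as text "taxyear2004: placeholder, nothing computed"
end

tempfile hold result
forvalues yr = 2003/2004 {
        save `hold', replace            // full dataset as it stands
        keep if FLPDYR == `yr'
        taxyear`yr'                     // runs taxyear2003, then taxyear2004
        save `result', replace emptyok  // processed rows for this year
        use `hold', clear
        drop if FLPDYR == `yr'
        append using `result'           // assumption: result, not hold
}

sort FLPDYR c24533
list FLPDYR _amt5pc c62747
assert _amt5pc == min(c24533, c24532, c62700, c24517) if FLPDYR == 2003
assert missing(_amt5pc) if FLPDYR == 2004
```

The two -assert- lines at the end check only that the 2003 rows were
updated and the 2004 rows were left alone; they say nothing about speed
relative to the `if' qualifier approach.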

**References**: **Re: RE: st: Converting a SAS datastep to Stata**, *From:* wgould@stata.com (William Gould, StataCorp LP)
