Re: st: Converting a SAS datastep to Stata

From   Joseph Coveney
Subject   Re: st: Converting a SAS datastep to Stata
Date   Thu, 16 Dec 2010 08:38:39 +0900

Daniel Feenberg wrote:


Here is the SAS code for capital gains under the alternative minimum tax 
for a single year:

   if FLPDYR eq 2003 then do;
      _amt5pc = min(c24533,min(c24532,min(c62700,c24517)));
      _amt5pc = max(0,_amt5pc);
      c62747 = .05*_amt5pc;
      _line49 = max(0,min(c24532,min(c24517,c62700)) - _amt5pc);
      _line50 = sum(e24583,0);
      _amt8pc = min(_line49,_line50);
      c62749 = .08*_amt8pc;
      _amt10pc = _line49 - _amt8pc;
      c62750 =  .1*_amt10pc;
      _line55 = c24533 - _amt5pc;
      _line56 = min(c24517,c62700) - min(c24532,min(c24517,c62700));
      _amt15pc = min(_line55,_line56);
      c62755 =  .15*_amt15pc;
      _amt20pc = _line56 - _amt15pc;
      c62760 =  .2*_amt20pc;
      _amt25pc = min(c62700,min(c24517+e24515,c24516))-min(c62700,c24517);
      c62770 =  .25*_amt25pc;
      _tamt2  = c62747 + c62749 + c62750 + c62755 + c62760 + c62770;
   end;


Repeating the if qualifier means repeating a calculation, which is an 
inefficiency, but it also means repeating the code, which is ugly and 
distracting. That is why I asked about the possibility of a block level if 
qualifier. If it doesn't exist, I'll put it in W Gould's suggestion box.

One thing I could do is allow more complex assignment statements, with 
fewer of the intermediate values that are used to clarify purpose and show 
the correspondence to the tax form. That could reduce the number of 
statements by half but is otherwise undesirable.


I'll second Austin's suggestion to move this to Mata.  This will be trivial in
Mata, using its ability to create updatable views and subviews onto the dataset.

With SAS, the DATA step doesn't have all of the data in memory; it scrolls 
through the input file one logical record at a time, places its contents into 
the program data vector, creates the variables or replaces values in them (I 
can't tell which you're doing from the excerpt), saves the record to an
output file, and proceeds to the next logical record in the input file.  (Thus
the perennial concern among SAS users about I/O.)  The IF block checks the 
conditions upon reading in the logical record.  If the IF condition isn't met, 
the DATA step goes to the next logical record in the input file without 
creating/changing the data.

In Stata, in contrast, you have all of the data in memory, and 
creating/replacing data is "vectorized", and so you'll not get an IF-block 
style concept _in Stata_.*  This is just a consequence of the different data 
models between SAS and Stata.

But, Mata has the ability to select blocks of observations (and variables) in
the Stata dataset and work on the block in isolation in situ.  Mata's views 
and subviews give you the very "block-level if qualifier" that you're seeking.
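
A minimal sketch of how that could look for the first two assignments of the 2003 block. The toy data, the marker variable touse, and the Mata names X and Y are assumptions made up for illustration; only the tax variable names come from the SAS excerpt. Note that Mata's rowmin() and rowmax() skip missing values.

```stata
* Toy data, invented for illustration only
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 80 500 300
2003 -20 80 500 300
2002 100 80 500 300
end

* State the condition once, as a 0/1 marker variable (hypothetical name)
generate byte touse = (FLPDYR == 2003)

* Output variables must exist before a view can be placed on them
generate double _amt5pc = .
generate double c62747  = .

mata:
// Views restricted to observations where touse != 0
st_view(X = ., ., ("c24533", "c24532", "c62700", "c24517"), "touse")
st_view(Y = ., ., ("_amt5pc", "c62747"), "touse")

// _amt5pc = max(0, min(c24533, c24532, c62700, c24517))
Y[., 1] = rowmax((J(rows(X), 1, 0), rowmin(X)))

// c62747 = .05 * _amt5pc
Y[., 2] = 0.05 :* Y[., 1]
end

list
```

Everything between -mata:- and -end- touches only the FLPDYR == 2003 observations, so the condition is written once, in touse, rather than on every line. Assigning to Y[., 1] rather than to Y itself is what keeps Y a view and writes the results back into the dataset.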

Joseph Coveney

* Absent -use if FLPDYR == 2003 using ALLRETURNS-, -generate/replace . . .- 
and -save FLPDYR2003- in forced direct analogy with the SAS idiom.
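
For completeness, a rough sketch of that forced analogy, again for the first two assignments only. The toy data and the tempfiles standing in for ALLRETURNS and FLPDYR2003 are assumptions for illustration.

```stata
* Toy stand-in for ALLRETURNS, invented for illustration only
clear
input FLPDYR c24533 c24532 c62700 c24517
2003 100 80 500 300
2003 -20 80 500 300
2002 100 80 500 300
end
tempfile allreturns
save `allreturns'

* Read only the 2003 records, so no -if- qualifier is needed afterwards
use if FLPDYR == 2003 using `allreturns', clear

* Stata's min() and max() accept more than two arguments
generate double _amt5pc = max(0, min(c24533, c24532, c62700, c24517))
generate double c62747  = .05 * _amt5pc

* Stand-in for -save FLPDYR2003-
tempfile flpdyr2003
save `flpdyr2003'
```

The cost is an extra pass through the disk for each tax year, plus an -append- at the end if the years need to be recombined.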
