st: Converting a SAS datastep to Stata

From	Daniel Feenberg <[email protected]>
To	[email protected]
Subject	st: Converting a SAS datastep to Stata
Date	Mon, 13 Dec 2010 19:51:10 -0500 (EST)

I have written programs to calculate income tax liability in SAS and Fortran. Both those languages allow tax parameters that vary across years and filing status to be held in initialized arrays. For example, in SAS one could declare:


   array exmp(1993:2010) _temporary_;
   retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100
               3200 3300 3400 3500;

and then assigning the correct value of the personal exemption to every individual record is just:


   exemption = exmp(flpdyr);

where flpdyr is a variable in the data with the filing year. I am at a bit of a loss as to how to do this in Stata. I don't like:


   gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450...(for 18 subexpressions in all)

or

   gen     exemption = 2350 if flpdyr==1993
   replace exemption = 2450 if flpdyr==1994
   ...(for 18 lines in all)...

because these require (and execute) so much repetitive code.

Another possibility is to create a dataset of parameters by year and filing status, then sort the tax return data by year and filing status, and finally merge the parameters onto the tax return data. But that requires a sort and a lot of I/O, which could be slow with potentially millions of returns. The additional memory required is probably not a big issue.
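For concreteness, the merge approach might look like the sketch below. It is only an illustration: the three parameter rows are the first three exemption values listed above, the six-observation "returns" dataset is made up, and filing status is left out to keep it short. It assumes Stata 11 or later, where merge m:1 does not need the master data to be sorted beforehand.

```stata
* Hypothetical parameter table: one row per filing year
clear
input flpdyr exemption
1993 2350
1994 2450
1995 2500
end
tempfile params
save `params'

* Hypothetical tax return data (six made-up returns)
clear
set obs 6
gen flpdyr = 1993 + mod(_n, 3)

* Many returns match one parameter row; keep every return
merge m:1 flpdyr using `params', keep(master match) nogenerate
```

Returns with a filing year that is absent from the parameter table would be kept with exemption set to missing.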


I don't actually know Mata, but I think I could define a rowvector:

    exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100,
            3200, 3300, 3400, 3500);

and then loop over all the tax returns executing:

    exemption[i] = exmp[flpdyr[i]-1992];

for each return (where i indexes returns). That seems to mean that every variable is going to have to carry around an [i] subscript, and there will be 1,000 lines of Mata code executed for each return (rather than the preferred 1,000 lines of code for all the returns together). That is much less attractive than leaving the observation number implicit, as the regular Stata language does. Brief study of [M-2] subscripts doesn't suggest any "matrixy" way of coding this.
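One possibly "matrixy" form, offered as a sketch rather than a tested solution for the real tax data: Mata accepts a vector of indices as a subscript, so the lookup can be written once for all returns without an explicit [i]. The example assumes the 16 values listed above start at 1993, that flpdyr has no missing or out-of-range years (either would make the subscript fail), and it uses a made-up five-observation dataset.

```stata
* Hypothetical demo data: five returns with filing years 1993-1995
clear
set obs 5
gen flpdyr = 1993 + mod(_n, 3)
gen double exemption = .

mata:
// Exemption amounts as a column vector; element 1 is taken to be 1993
exmp = (2350 \ 2450 \ 2500 \ 2550 \ 2650 \ 2700 \ 2750 \ 2800 \
        2900 \ 3000 \ 3050 \ 3100 \ 3200 \ 3300 \ 3400 \ 3500)

// Copy the filing year of every observation into a column vector
yr = st_data(., "flpdyr")

// Vector subscript: look up all observations in one statement
st_store(., "exemption", exmp[yr :- 1992])
end
```

Here st_data() copies the variable into Mata; st_view() could be used instead to avoid the copy when there are millions of returns.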

I expect I am missing something obvious; can someone point me in the right direction?


Thanks

Daniel Feenberg
NBER
Cambridge MA
[email protected]



Follow-Ups:
- Re: st: Converting a SAS datastep to Stata
  - From: Austin Nichols <[email protected]>
- st: Re: Converting a SAS datastep to Stata
  - From: "Joseph Coveney" <[email protected]>
- Re: st: Converting a SAS datastep to Stata
  - From: "Michael N. Mitchell" <[email protected]>
- Re: st: Converting a SAS datastep to Stata
  - From: Scott Merryman <[email protected]>
