Re: st: Re: Converting a SAS datastep to Stata


From   Kevin Geraghty <kevin@ecotope.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: Converting a SAS datastep to Stata
Date   Mon, 13 Dec 2010 20:33:40 -0800 (PST)

FYI, I tried this to satisfy my own curiosity, and it works. It is probably the most parsimonious approach.
It assumes that your dataset has a variable "year" taking values from 1993 through 2008, and that the values listed for "exmp" are in ascending year order.

matrix input exmp=(2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)
gen int exemption = exmp[1,year-1992]
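
For anyone who wants to try the lookup without real tax data, here is a minimal self-contained sketch. The toy dataset (one observation per year, 1993 through 2008) is an assumption made up for illustration; only the matrix and the subscript expression come from the two lines above.

```stata
* Toy data, assumed for illustration: one observation per year, 1993-2008
clear
set obs 16
generate int year = 1992 + _n

* Row vector of exemptions, in ascending year order (column 1 is 1993)
matrix input exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)

* The column subscript is evaluated observation by observation
generate int exemption = exmp[1, year - 1992]

list, noobs separator(0)
```

The appeal of this form is that the observation index stays implicit, as in ordinary Stata, and no -sort- or -merge- is involved.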

----- "Joseph Coveney" <jcoveney@bigplanet.com> wrote:

> Daniel Feenberg wrote:
> 
> I have written programs to calculate income tax liability in SAS and
> Fortran. Both those languages allow tax parameters that vary across years
> and filing status to be held in initialized arrays. For example, in SAS one
> could declare:
> 
>     array exmp(1993:2010) _temporary_;
>     retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100
>                 3200 3300 3400 3500;
> 
> and then assigning the correct value of the personal exemption to every
> individual record is just:
> 
>     exemption = exmp(flpdyr);
> 
> where flpdyr is a variable in the data with the filing year. I am at a bit
> of a loss as to how to do this in Stata. I don't like:
> 
>     gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450...(for 18 subexpressions in all)
> 
> or
> 
>     gen     exemption = 2350 if flpdyr==1993
>     replace exemption = 2450 if flpdyr==1994
>     ...(for 18 lines in all)...
> 
> because these require (and execute) so much repetitive code.
> 
> Another possibility is to create a dataset of parameters by year and
> filing status, then sort the tax return data by year and filing status,
> and finally merge the parameters onto the tax return data. But that
> requires a sort and a lot of I/O, which could be slow with potentially
> millions of returns. The additional memory required is probably not a big
> issue.
> 
> I don't actually know Mata, but I think I could define a rowvector:
> 
>      exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050,
>              3100, 3200, 3300, 3400, 3500);
> 
> and then loop over all the tax returns executing:
> 
>      exemption[i] = exmp[flpdyr[i]-1992];
> 
> for each return (where i indexes returns). That seems to mean that every
> variable is going to have to carry around an [i] subscript and there will
> be 1,000 lines of Mata code executed for each return (rather than the
> preferred 1,000 lines of code for all the returns together). That is much
> less attractive than leaving the observation number implicit, as the
> regular Stata language does. Brief study of [M-2] subscripts doesn't
> suggest any "matrixy" way of coding this.
> 
> I expect I am missing something obvious; can someone point me in the right
> direction?
> 
> --------------------------------------------------------------------------------
> 
> The number of years is limited and they're integers, so you could probably
> get away with value labels and a one-shot work-up (see below).  This SAS-ish
> approach might be faster than any -merge- (with its implicit -sort-) when
> you have millions of observations in the tax-record dataset.
> 
> I'd bet that becoming familiar with Mata's -asarray()- (think: Paul Dorfman)
> will be more gratifying in the long run.
> 
> Joseph Coveney
> 
> P.S.  What does SAS do when you have more index values (18 years) than array
> values (16 exemptions)?  Does it pad the last value out to the end of the
> array, or recycle à la R?
> 
> version 11.1
> 
> clear *
> set more off
> set obs 18
> generate int year = 1992 + _n
> 
> *
> * Begin here
> *
> local value_label label define Exemptions
> local year 1993
> foreach exemption in 2350 2450 2500 2550 2650 ///
>     2700 2750 2800 2900 3000 3050 3100 3200 ///
>     3300 3400 3500 3550 3600 {
>     local value_label `value_label' `year' "`exemption'"
>     local ++year
> }
> `value_label'
> label values year Exemptions
> decode year, generate(exemption)
> _strip_labels year
> destring exemption, replace
> list, noobs abbreviate(20) separator(0)
> exit
> 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
