
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Converting a SAS datastep to Stata


From   Scott Merryman <[email protected]>
To   [email protected]
Subject   Re: st: Converting a SAS datastep to Stata
Date   Mon, 13 Dec 2010 20:01:57 -0600

Take a look at -foreach- and -forvalues- (-levelsof- is also very
useful in these types of problems).

One way would be to do something like:

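* Walk the exemption amounts in year order, starting at 1993,
* and fill in the rows whose filing year matches
local yr = 1993
gen exemption = .
qui {
    foreach exemp in 2350 2450 2500 2550 2650 2700 ///
        2750 2800 2900 3000 3050 3100 3200 3300 3400 3500 {
        replace exemption = `exemp' if flpdyr == `yr'
        local yr = `yr' + 1    // advance to the next filing year
    }
}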
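A variant that mirrors the SAS initialized array more closely is to hold the amounts once in a Stata matrix and look each observation up with el(), which returns missing when the index falls outside the matrix. This is a minimal sketch, assuming column 1 of the matrix corresponds to filing year 1993; the toy data and the matrix name exmp are illustrative only:

```stata
* Toy data: a few filing years, including one (2010) not covered by the list
clear
input flpdyr
1993
1996
2008
2010
end

* Illustrative parameter "array": column j holds the exemption for year 1992 + j
matrix exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, ///
    2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)

* One statement for all observations; out-of-range years get missing
generate exemption = el(exmp, 1, flpdyr - 1992)
list
```

As with the SAS array, no per-year code is needed, and years past the end of the list (2009 and 2010 here) come out missing rather than stopping the program.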


Scott

On Mon, Dec 13, 2010 at 6:51 PM, Daniel Feenberg <[email protected]> wrote:
> I have written programs to calculate income tax liability in SAS and Fortran.
> Both those languages allow tax parameters that vary across years and filing
> status to be held in initialized arrays. For example, in SAS one could
> declare:
>
>   array exmp(1993:2010) _temporary_;
>   retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100
>               3200 3300 3400 3500;
>
> and then assigning the correct value of the personal exemption to every
> individual record is just:
>
>   exemption = exmp(flpdyr);
>
> where flpdyr is a variable in the data with the filing year. I am at a bit
> of a loss as to how to do this in Stata. I don't like:
>
>   gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450 ...
> (for 18 subexpressions in all)
>
> or
>
>   gen     exemption = 2350 if flpdyr==1993
>   replace exemption = 2450 if flpdyr==1994
>   ...(for 18 lines in all)...
>
> because these require (and execute) so much repetitive code.
>
> Another possibility is to create a dataset of parameters by year and filing
> status, then sort the tax return data by year and filing status, and finally
> merge the parameters onto the tax return data. But that requires a sort and
> a lot of I/O, which could be slow with potentially millions of returns. The
> additional memory required is probably not a big issue.
>
> I don't actually know Mata, but I think I could define a rowvector:
>
>    exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000,
>            3050, 3100, 3200, 3300, 3400, 3500);
>
> and then loop over all the tax returns executing:
>
>    exemption[i] = exmp[flpdyr[i]-1992];
>
> for each return (where i indexes returns). That seems to mean that every
> variable is going to have to carry around an [i] subscript, and there will be
> 1,000 lines of Mata code executed for each return (rather than the
> preferred 1,000 lines of code for all the returns together). That is much
> less attractive than leaving the observation number implicit, as the regular
> Stata language does. A brief study of [M-2] subscripts doesn't suggest any
> "matrixy" way of coding this.
>
> I expect I am missing something obvious; can someone point me in the right
> direction?
>
> Thanks
>
> Daniel Feenberg
> NBER
> Cambridge MA
> [email protected]
>
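The merge approach described in the question is also short in Stata 11 or later, where -merge m:1- does not require the data to be sorted beforehand. This is a minimal sketch with toy data; the tempfile name params is illustrative, and a filing-status variable would simply be added to the key list alongside flpdyr:

```stata
* Build a small parameter file keyed by filing year (amounts from the question)
clear
set obs 16
generate flpdyr = 1992 + _n
generate exemption = real(word("2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100 3200 3300 3400 3500", _n))
tempfile params
save `params'

* Toy "returns" data: one row per return, in no particular order
clear
input flpdyr
1995
1993
2008
1993
end

* Attach the parameters to every return; unmatched years would stay missing
merge m:1 flpdyr using `params', keep(master match) nogenerate
list
```

Whether this is faster or slower than an in-memory lookup on millions of returns is not something this sketch measures.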

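On the Mata question: a Mata subscript can be a whole vector of indices, so the lookup can be done for all observations at once without an explicit loop over i. This is a minimal sketch, assuming the amounts are stored as a column vector whose first element is 1993; the toy data and the names exmp and yr are illustrative. Unlike el(), a year outside 1993-2008 would stop with a subscript error here:

```stata
* Toy data and an empty target variable
clear
input flpdyr
1993
2000
2005
1993
end
generate double exemption = .

mata:
// illustrative parameter vector: element k is the exemption for year 1992 + k
exmp = (2350\2450\2500\2550\2650\2700\2750\2800\2900\3000\3050\3100\3200\3300\3400\3500)
// column vector holding the filing year of every observation
yr = st_data(., "flpdyr")
// a vector of row subscripts picks one element per observation in a single step
st_store(., "exemption", exmp[yr :- 1992, .])
end

list
```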


© Copyright 1996–2018 StataCorp LLC