

From: Daniel Feenberg <feenberg@nber.org>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: Converting a SAS datastep to Stata
Date: Tue, 14 Dec 2010 07:14:30 -0500 (EST)

On Mon, 13 Dec 2010, Kevin Geraghty wrote:

FYI, I tried this to satisfy my own curiosity; it works. Probably the most parsimonious approach, assuming your dataset has a variable "year" defined, taking values from 1993 through 2008, and the values specified for "exmp" are in the correct ascending year order.

    matrix input exmp=(2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)
    gen int exemption = exmp[1,year-1992]
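The matrix lookup above can be tried on a toy dataset. This is only a minimal sketch: the 16-observation dataset with one record per year from 1993 through 2008 is made up for illustration, and the variable name "year" follows the message above rather than the "flpdyr" name used earlier in the thread.

```stata
* Toy data (an assumption for illustration): one record per year, 1993-2008
clear
set obs 16
generate int year = 1992 + _n

* Exemption amounts in ascending year order, held in a 1 x 16 Stata matrix
matrix input exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500)

* The column subscript is evaluated observation by observation
generate int exemption = exmp[1, year - 1992]

list, noobs separator(0)
```

No sort or merge is involved; the year-to-column offset (1992 here) is the only thing to keep in step with the parameter list.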

BTW, anyone looking for the existing Stata-callable Fortran version should run:

    net from "http://www.nber.org"
    net describe taxsim9

Thanks again, Daniel Feenberg

----- "Joseph Coveney" <jcoveney@bigplanet.com> wrote:

Daniel Feenberg wrote:

I have done programs to calculate income tax liability in SAS and Fortran. Both those languages allow tax parameters that vary across years and filing status to be held in initialized arrays. For example, in SAS one could declare:

    array exmp(1993:2010) _temporary_;
    retain exmp 2350 2450 2500 2550 2650 2700 2750 2800 2900 3000 3050 3100 3200 3300 3400 3500;

and then assigning the correct value of the personal exemption to every individual record is just:

    exemption = exmp(flpdyr);

where flpdyr is a variable in the data with the filing year. I am at a bit of a loss as to how to do this in Stata. I don't like:

    gen exemption = (flpdyr==1993)*2350 + (flpdyr==1994)*2450...(for 18 subexpressions in all)

or

    gen exemption = 2350 if flpdyr==1993
    replace exemption = 2450 if flpdyr==1994
    ...(for 18 lines in all)...

because these require (and execute) so much repetitive code.

Another possibility is to create a dataset of parameters by year and filing status, then sort the tax return data by year and filing status, and finally merge the parameters onto the tax return data. But that requires a sort and a lot of I/O, which could be slow with potentially millions of returns. The additional memory required is probably not a big issue.

I don't actually know Mata, but I think I could define a rowvector:

    exmp = (2350, 2450, 2500, 2550, 2650, 2700, 2750, 2800, 2900, 3000, 3050, 3100, 3200, 3300, 3400, 3500);

and then loop over all the tax returns executing:

    exemption[i] = exmp[flpdyr[i]-1992];

for each return (where i indexes returns). That seems to mean that every variable is going to have to carry around a [i] subscript and there will be 1,000 lines of Mata code executed for each return (rather than the preferred 1,000 lines of code for all the returns together). That is much less attractive than leaving the observation number implicit, as the regular Stata language does.

Brief study of [M-2] subscripts doesn't suggest any "matrixy" way of coding this. I expect I am missing something obvious; can someone point me in the right direction?

--------------------------------------------------------------------------------

The number of years is limited and they're integers, so you could probably get away with value labels and a one-shot work-up (see below). This SAS-ish approach might be faster than any -merge- (with its implicit -sort-) when you have millions of observations in the tax-record dataset.

I'd bet that becoming familiar with Mata's -asarray()- (think: Paul Dorfman) will be more gratifying in the long run.

Joseph Coveney

P.S. What does SAS do when you have more index values (18 years) than array values (16 exemptions)? Does it pad the last value out to the end of the array, or recycle à la R?

    version 11.1

    clear *
    set more off
    set obs 18
    generate int year = 1992 + _n

    *
    * Begin here
    *
    local value_label label define Exemptions
    local year 1993
    foreach exemption in 2350 2450 2500 2550 2650 ///
        2700 2750 2800 2900 3000 3050 3100 3200 ///
        3300 3400 3500 3550 3600 {
        local value_label `value_label' `year' "`exemption'"
        local ++year
    }
    `value_label'
    label values year Exemptions
    decode year, generate(exemption)
    _strip_labels year
    destring exemption, replace

    list, noobs abbreviate(20) separator(0)

    exit

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
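On the concern above that a Mata version would need an [i] subscript on every variable: a possible alternative, offered only as a sketch, is to subscript the parameter vector with the whole column of filing years at once, since Mata accepts a vector of indices as a subscript on a vector. The five-record toy dataset is an assumption for illustration; flpdyr and exmp are the names used in the thread, and the sketch assumes every flpdyr value falls within the years covered by exmp.

```stata
* Toy data (an assumption for illustration): five returns, filing years 1993-1997
clear
set obs 5
generate int flpdyr = 1992 + _n

mata:
// Exemption amounts for 1993-2008 as a column vector
exmp = (2350 \ 2450 \ 2500 \ 2550 \ 2650 \ 2700 \ 2750 \ 2800 \ 2900 \ 3000 \ 3050 \ 3100 \ 3200 \ 3300 \ 3400 \ 3500)

// Read the filing year of every return as one column vector
yr = st_data(., "flpdyr")

// Create the new variable, then fill all observations with one vector subscript
idx = st_addvar("int", "exemption")
st_store(., idx, exmp[yr :- 1992])
end

list, noobs separator(0)
```

The lookup is a single statement for all returns together, so no explicit loop over observations is written.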

**References**: **Re: st: Re: Converting a SAS datastep to Stata**, *From:* Kevin Geraghty <kevin@ecotope.com>
