



Re: st: Use of matrix values in generate statements


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Use of matrix values in generate statements
Date   Sun, 27 Mar 2011 00:15:46 +0000

Kit has given the most important answer: Mata is a much richer
language for handling non-standard problems.

I want to add a footnote. Here is a basic technique for using a lookup
matrix to populate a variable z. I mix algebra with Stata: I and J
stand for the numbers of rows and columns of the matrix, to be replaced
by actual numbers. The idea is that variables x and y tell you which
row and column of the matrix to use.

matrix lookup = ...
gen z = .
forval i = 1/I {
        forval j = 1/J {
             quietly replace z = lookup[`i', `j'] if x == `i' & y == `j'
        }
}

I am just using -forval- to turn Daniel's statements into a double
loop over possibilities.
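
As a concrete sketch of that loop, here is a version with a made-up 2 x 3
matrix and four made-up observations (the values are purely illustrative and
not taken from Daniel's problem). It assumes that rowsof() and colsof() are
used to supply I and J rather than typing the dimensions by hand:

```stata
clear
* toy data: x picks the row, y picks the column (illustrative values only)
input x y
1 1
1 3
2 2
2 3
end

* toy lookup matrix with 2 rows and 3 columns (illustrative values only)
matrix lookup = (10, 20, 30 \ 40, 50, 60)

* take the loop limits from the matrix itself
local I = rowsof(lookup)
local J = colsof(lookup)

gen z = .
forval i = 1/`I' {
        forval j = 1/`J' {
             quietly replace z = lookup[`i', `j'] if x == `i' & y == `j'
        }
}
list
```

Here z ends up as 10, 30, 50 and 60 for the four observations. The loop makes
I x J passes over the data, so it suits small lookup tables best.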

Nick

On Sat, Mar 26, 2011 at 11:55 PM, Christopher Baum <[email protected]> wrote:
> <>
> Dan says
>
> I continue to work on a tax calculator for Stata.
>
> I am at the point of calculating the standard deduction for each taxpayer.
> There are 6 possible filing statuses and 24 years of tax law, so there are
> 144 possible values for the deduction. In SAS, fortran, PL/1, C or any
> other language I know of, the calculation would be some form of:
>
>    stded = stdvalues(year,filestat)
>
> and the processor would index into the 24x6 array of stdvalues to obtain
> the value for each taxpayer. As I understand it, Stata matrices can't be
> used in -generate- statements, though, so I can't do something like:
>
>    matrix input stdvalues (3700 6200...\3800 6350...\...
>    generate stded = stdvalues[year-1992,filestat]
>
> (Here and below, ... is meant to conceal a lot of typing on my part but
> 3700 is the deduction in 1993 for a single taxpayer, 6350 is the deduction
> in 1994 for a joint return, etc). The most straightforward way I can see
> to calculate the deduction in Stata would be:
>
>   generate   stded = 3700 if year == 1993 & filestat == 1
>   replace    stded = 6200 if year == 1993 & filestat == 2
>   ...
>
> and so forth, for 144 lines. I have millions of observations, and will
> make thousands of runs, so I am looking for a more efficient solution. My
> next thought is:
>
>   generate stded = (year==1993&filestat==1)*3700+(year==1993&filestat==2)*6200...
>
> which would be one very long line of code once all 144 terms were written
> out, and still quite a bit of wasted arithmetic.  Still a third
> possibility would be -recode-:
>
>   gen filestatyear = year*10+filestat
>   recode filestatyear (19931 = 3700)(19932 = 6200)...
>
> but looking at the -recode- .ado file suggests that this is not an
> efficiency gain.
>
> I take it I am supposed to -sort- the data by year and filestat, and then
> -merge- onto a file of parameter values by year and filestat:
>
>   sort year filestat
>   merge m:1 year filestat using params
>
> where params is a dataset with the deduction amount for each year and
> filestat. This is a reasonable amount of code (even including the code
> necessary to create params) but it is not space efficient and it strikes
> me as odd that a large dataset needs to be sorted, just to make some
> simple recodes. Is that right? Am I missing something?
>
> I note that the -egen- command -mtr- must address this same question, but
> it is not very fast - about 1,000 observations/minute on our hardware.
>
> Oddly enough, although one cannot index into a Stata matrix, it is
> possible to index into a series observation:
>
>     generate stded = stdvalues[filestatyear-199200]
>
> is very fast, but doesn't address the problem of filling stdvalues in a
> not too hackish manner (especially if there are fewer than 144 taxpayers
> in the dataset).
>
>
>
> The following code will do 1 million table lookups in 8 or 9 seconds on my laptop:
>
> ---------------------------------
> clear all
> // fake data for lookup table
> mata: sdlookup = 100*runiform(24,6) :+ 3200
>
> set obs 10
> input year fs
> 1994 1
> 1998 2
> 1999 1
> 2000 6
> 2000 5
> 2005 3
> 2004 4
> 1996 2
> 2008 5
> 2007 3
> expand 100000
> g byte yrind = year - 1992
> g stded = .
> set rmsg on
> mata
> st_view(yrfs=., ., ("yrind","fs"))
> st_view(stded=., . , "stded")
> for(i=1; i<=rows(stded); i++) {
>        stded[i] = sdlookup[yrfs[i,1], yrfs[i,2]]
> }
> end
> su stded
> ---------------------------------
>
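
For comparison, the -merge- route that Dan describes in the quoted message
might be sketched as follows. This is only a minimal illustration: the four
deduction amounts are the ones quoted above for 1993 and 1994, the three
taxpayer observations are made up, and the tempfile name params is just a
stand-in for his parameter dataset. It assumes Stata 11 or later, where
-merge m:1- does any sorting it needs itself, so no explicit -sort- appears.

```stata
* parameter dataset: one row per (year, filestat) pair
clear
input year filestat stded
1993 1 3700
1993 2 6200
1994 1 3800
1994 2 6350
end
tempfile params
save `params'

* made-up taxpayer data, deliberately not sorted
clear
input year filestat
1994 2
1993 1
1993 2
end

* attach the deduction to each taxpayer; keep every master observation
merge m:1 year filestat using `params', keep(master match) nogenerate
list
```

Parameter rows with no matching taxpayer are dropped by keep(master match),
so this also works when the data contain fewer combinations than the table.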
