Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: RE: RE: inlist() equivalent for mata?


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: RE: RE: inlist() equivalent for mata?
Date   Wed, 19 Mar 2014 17:44:01 +0000

Andrew,

Got it...you are going considerably beyond the capabilities of -inlist()-. (That wasn't clear in your original post.)

For a project of that size a hash will definitely be faster than a linear search, even taking into account the amount of time it takes to set up the hash.  Given that, I'm not sure what is wrong with an associative array, which uses hash tables and probably does a better job of it than any of us could.  You may want to tune your implementation by reviewing the section "Setting the efficiency parameters" in the help file for -asarray-.

Regards.
Joe

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Maurer
Sent: Wednesday, March 19, 2014 1:34 PM
To: [email protected]
Subject: st: RE: RE: inlist() equivalent for mata?

Thanks for the reply, Joe.

Just to clarify, I am trying to apply this to a very large dataset. Ie, B is on the order of 1 billion elements and L on the order of 100,000 elements. "a quick spin through the list using a loop" would equate to 1 billion loops over 100,000 elements (check if b1 is in L, check if b2 is in L,..., check if b1000000000 is in L). If I can find a clever method that only results in 1 billion O(1) checks, I'd be saving a SIGNIFICANT amount of computation time.

I think that the problem reduces to one of indexing. Eg) if A = ( (1, 3, 5, 9)' , (1, 1, 0, 1)' ), I am looking for a way to index A such that A[5] = 0, A[9] = 1, and A[2] doesn't exist. Ie - it's a very simple hash map. I just don't know how to program this in mata in an efficient way.

Andrew Maurer


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Joe Canner
Sent: Wednesday, March 19, 2014 12:01 PM
To: [email protected]
Subject: st: RE: inlist() equivalent for mata?

Andrew,

You may be over-thinking this one, at least for the simple case that corresponds to -inlist()- (find the first argument among the remaining arguments).  Stata has a limit of 256 numeric arguments for -inlist()- and I suspect that a quick spin through the list using a loop will be much more efficient than any clever solutions one might come up with.

All that said, if you want to expand beyond 256 arguments or if you want to incorporate additional functionality (as in your 3rd test example where each element of one vector is compared to each element of another), then perhaps a more sophisticated solution is warranted.

I suspect there is a clever way to do all pairwise comparisons between two vectors, but I would need to dust off my matrix algebra skills to think of what it might be.  Such a method, if it exists, could also be extended to the simple case and slightly improve the brute force method.

Regards,
Joe Canner
Johns Hopkins University School of Medicine


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Maurer
Sent: Wednesday, March 19, 2014 12:25 PM
To: Statalist Statalist
Subject: st: inlist() equivalent for mata?

Hi Statalist,

I am trying to find a mata equivalent of Stata's inlist() function, but have not found anything similar when looking through the directory of built-in mata functions. I have a few roundabout ideas, but they seem very complicated in order to a task that should be quite simple.

Goal: for a given vector, B, with elements b1,b2,...,bN, and list, L, of size T, return a vector of size N, where the ith element = 1 if bi is in L, otherwise 0. In my case T >> N and all elements are integers.

Example, let the list, L = {1, 3, 99}

Idea 1: Does mata have a conception of sparse matrices? The idea would be to create a vector, M, where only row 1, row 3, and row 99 exist. and the value for each element is 1. The inlist(b,L) function would return M[b] if M[b] exists, else return 0.

Idea 2: Create a separate pointer for every element in L, named "p" followed by the value of the element. Each pointer would point to the value 1. In this case, p1, p3, and p99 would be created, where *p1 = *p3 = *p99 = 1. The inlist(b,L) function would return *p`b' if *p`b' exists, else 0 (I'm not sure how to get around the stata local syntax here).

Idea 3: Use a hash table / associative array. It would make sense to create a perfect hash table, where the table is created such that every element of L maps to a unique value, but the only function I could find is hash1(), which can result in collisions. I wrote a working example using asarray()(see code below), but it seems somewhat inefficient.

Thank you,

Andrew Maurer


************ define inlist() *********************
cap mata mata drop inlist()
mata

real colvector inlist(real colvector B, real colvector L)
{

// initialize matrices
real matrix M
real colvector R

// create the map, M
M = asarray_create("real", 1, rows(L))
asarray_notfound(M, 0)

for (i=1; i<=rows(L); i++) {
	asarray(M, L[i], 1)
}

// create vector to be returned
R = J(rows(B),1,.)
for (i=1; i<=rows(B); i++) {
	R[i,1] = asarray(M, B[i])
}

return(R)

}

end
************ end inlist() definition *************


************ test inlist() *****************
mata
mylist = 1 \ 4 \ 7 \ 10
b1 = 3
b2 = 4
b3 = 1 \ 2 \ 3 \ 4

inlist(b1,mylist) // returns 0
inlist(b2,mylist) // returns 1
inlist(b3,mylist) // returns (1,0,0,1)'
end
************ end test inlist() *************


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index