Thank you very much for your feedback!

Natalie

Date: Tue, 24 Jun 2003 10:01:33 -0500
From: jpitblado@stata.com (Jeff Pitblado, Stata Corp.)
Subject: Re: st: multiply two data sets

Natalie Karavarsamis <Natalie.K@cancervic.org.au> asks about multiplying two
datasets, treating them as if they were matrices, where one of the datasets is
too big to convert to a matrix:

> I have two data sets; a data file, call this A, which is 41000 rows x 130
> columns, and another file, call this B, 130 rows by 50 columns.
>
> I want to multiply A and B (C=AxB). It would be ideal to treat A and B as
> matrices and use matrix multiplication but the maximum matrix size is 11000
> x 11000 (we run Stata 7.0 SE). Is there a way around this? If not, are there
> any suggestions of how else to do this? I don't want to cut matrix A (or B)
> into smaller data sets (matrices).

If you have enough memory to hold both A and C, which I estimate to be just
under 60m, I would suggest using -matrix score-.

-matrix score- will generate a new variable from the linear combination of
elements in a row vector and the variables in memory.  See [P] matrix score.

To illustrate, the following do-file generates two datasets, a.dta and b.dta,
according to the sizes Natalie indicates:

***** BEGIN: genab.do
* generate some data

clear
set mem 50m
set obs 41000
forval i = 1/130 {
        di as txt "generating a`i'"
        gen double a`i' = uniform()
}
save a, replace

clear
set obs 130
forval i = 1/50 {
        di as txt "generating b`i'"
        gen double b`i' = uniform()
}
save b, replace

exit
***** END: genab.do

In genc.do, prepare for the product by setting the memory to be large enough.
Then put the data from b.dta into a matrix -b- using -mkmat- (notice the trick
I use to get a list of all the variable names into the -`varlist'- macro).  Use
the data in a.dta and loop over the columns of matrix -b-, generating each new
column of the new dataset/matrix C using -matrix score-.
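
In matrix terms, the loop relies on the fact that column i of C is A times
column i of B, that is, a linear combination of the variables (columns) of A
with coefficients taken from column i of B.  A sketch of that identity follows;
the symbols a_j, b_{ji} and c_i are notation introduced here for illustration
and do not appear in the do-files:

```latex
% A is 41000 x 130 with columns a_1, ..., a_130 (the variables in a.dta)
% B is 130 x 50 with entries b_{ji}; C = AB is 41000 x 50 with columns c_i
\[
  c_i \;=\; A\, b_{\cdot i} \;=\; \sum_{j=1}^{130} b_{ji}\, a_j ,
  \qquad i = 1, \dots, 50 .
\]
```

Each c_i needs only a single 1 x 130 coefficient vector, so A stays in memory
as a dataset and never has to fit within the matrix size limit.
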
Note that when you grab each column of the matrix -b-, turn it into a row
vector and put the variable names from dataset a.dta as its column names.  Then
-matrix score- does all the work of multiplying.

***** BEGIN: genc.do
* take matrix product of datasets a.dta and b.dta

* make the matrix from b.dta (the smaller dataset)
clear
set mem 60m
use b
local 0
syntax [varlist]
mkmat `varlist', matrix(b)

* use -matrix score- to compute the linear combinations of the variables in
* a.dta, where the coefficients are from the columns of b.dta
use a, clear
local 0
syntax [varlist]
local k = colsof(b)
forval i = 1/`k' {
        matrix bi = b[1...,`i']'
        matrix colnames bi = `varlist'
        matrix score double c`i' = bi
        di as txt "generating c`i'"
}
keep c*
save c, replace

exit
***** END: genc.do

I tested the above do-files using Stata/SE 7.0 and Stata/SE 8.0.

--Jeff
jpitblado@stata.com

--------------------------------------------
Natalie Karavarsamis
Statistician
Cancer Epidemiology Centre
The Cancer Council Victoria
100 Drummond Street
Carlton VIC 3053
ph: (03) 9635 5159
fax: (03) 9635 5330
www.cancervic.org.au

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

