Austin Nichols <austinnichols@gmail.com>:

As for the sparse matrix problem in (A), you can generate a new variable with all distinct concatenations of rowvar and colvar, then cycle over the values of that, thereby ignoring the empty cells.

On Tue, May 13, 2008 at 10:18 AM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:

Thank you to all who responded to my request regarding obtaining a matrix of means. Besides the answers posted in this thread, I have received a couple of suggestions privately. To summarize and close the thread, the suggestions can be divided roughly into two groups:

A. Obtaining all possible levels of the by-variables, then cycling through these values and computing the mean for each subgroup. This can be quite slow, especially in the case of "sparse" matrices, where only a few non-empty cells exist (for a 50x50 matrix, -summarize- must be called 2500 times).

B. Using other Stata commands that can produce a matrix of means as a by-product. Unfortunately, none of them is fast enough either. In particular, Joseph Coveney suggested using xi to automatically create all combinations of values and then estimating a univariate regression. Although the code is very short, it is perhaps the slowest approach, and it demands large amounts of memory.
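Approach (A), combined with the concatenation idea for skipping empty cells, can be sketched roughly as follows. This is only an illustration, not code posted in the thread: the variable names rowvar, colvar, y and cell, the matrix name M, and the tiny example dataset are all hypothetical, and the sketch assumes both by-variables hold positive integers below 1000.

```stata
* Sketch only: rowvar, colvar, y, cell and M are hypothetical names.
* Assumes rowvar and colvar hold positive integers below 1000.
clear
input byte rowvar byte colvar float y
1 1 2
1 1 4
1 3 5
2 2 1
2 2 3
end

* One code per observed (rowvar, colvar) pair; empty cells never appear.
generate long cell = 1000 * rowvar + colvar

* Matrix sized to the largest row and column value, filled with missing.
quietly summarize rowvar, meanonly
local R = r(max)
quietly summarize colvar, meanonly
local C = r(max)
matrix M = J(`R', `C', .)

* Loop over the non-empty cells only.
quietly levelsof cell, local(cells)
foreach g of local cells {
    local i = floor(`g' / 1000)
    local j = mod(`g', 1000)
    quietly summarize y if cell == `g', meanonly
    matrix M[`i', `j'] = r(mean)
}
matrix list M
```

Here -summarize- runs once per non-empty cell (three times in this example) rather than once per possible cell, and the empty cells simply stay missing in M. Each call still scans the data through its if-condition, so this reduces the number of calls but does not by itself make each call cheap.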

--------------------------------------------------------------------------------

Sergiy, it'll help us to help you better if you're more specific about the scope of your problem up front; Austin's original reply's -tabmat- seemed ideal to me given what you gave the list to go on; and my suggestion works well for the example that you gave in your post, which I took to be illustrative of scope of the individual summarization that you want to repeat many times and therefore want to avoid -preserve-s, etc.

Austin's point above about concatenating applies to sparse matrix problems in (B), too: see below for timing of a (B)-approach compared to -table , contents(mean )-, which is the benchmark you give in your original post. Note that -anova , noconstant category()- is used in lieu of -xi: regress , noconstant-, because it's more efficient here.

Joseph Coveney

clear *
set matsize 800  // Nothing extraordinary
set memory 10M   // Nothing extraordinary
set obs 250000   // I don't know how many you have--is this in the ballpark?
/* A 50 X 50 matrix */
generate byte a = mod(_n, 50)
sort a
generate byte b = mod(_n, 50)
generate float c = uniform()
/* Make that sparse */
foreach var of varlist a b {
    replace `var' = 0 if !inrange(`var', 20, 30)
}
*
timer clear 1
quietly forvalues i = 1/10 {
    timer on 1
    table a b, contents(mean c)
    timer off 1
}
timer clear 2
quietly forvalues i = 1/10 {
    timer on 2
    generate int ab = 100 *a + b // Concatenation
    anova c ab, noconstant category(ab)
    timer off 2
    drop ab
}
timer list
exit

Results:

. timer list
   1:     24.29 /       10 =       2.4295
   2:      7.62 /       10 =       0.7621

. exit

end of do-file

