Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Mata - generalised cross products of the form X'SX

From	Gordon Hughes <[email protected]>
To	[email protected]
Subject	Re: st: Mata - generalised cross products of the form X'SX
Date	Wed, 09 Feb 2011 10:40:10 +0000

To save space I haven't reflected all of Bill Gould's response to myoriginal posting, but I would like to pick up on the question ofoptimisation with respect to different ways of doing the samething. I understand that hardware, the nature of the problem, etcmatter, but others may be interested in what I have found. Asbackground I am running Stata/MP for 2 cores under Windows XP (32bit) and Macintosh OS-X 10.6.6 (64 bit). The machines arerespectively Intel Quad Core (XP) and Intel 2 Core (Mac) with 4 GB ofRAM in each case. I run into memory limitations under XP and timelimitations on the Mac. My research problem is spatial paneleconometrics involving quite large N and T, but my test runs hit theconstraints at sample sizes much smaller than I need for realwork. For spatial panel data, the calculations are structured as thesum over T of what are, in effect, GLS estimations over panel unitsso nrows = ncols in many cases because of the repeated application ofspatial weights.

I have implemented most of the optimisations by hand includingcalculating X'B'BX as T'*T where T=B*X. Trying to go further:

1. I replaced --T'*T -- by --cross(T,T)--. There is no significantreduction in execution time. For a majority of cases execution takeslonger with cross(), but this is not always the case. My experienceis that the overhead for small problems is greater than the increasein speed for large problems, so that the gain does not justify thepotential penalty.

2. Going further, suppose S is positive definite. Then, I foundthat applying the Cholesky factorisation S=LL' and using the variantX'SX=T'*T where T=L'*X pays off if you have to do the factorisationfor other purposes (e.g. prior inversion) but is any time gains arevery marginal otherwise.

3. Suppose S is square but not guaranteed to be positive definite(unfortunately my case), then the LU factorisation S=PLU means thatX'SX=(X'*PL)*(U*X). My experience in this case is that there is avery definite time penalty from using this formulation rather thanjust using X'*S*X.

Unfortunately without the ability to monitor memory use I don't knowwhether --cross()-- or some of the other formulations aresignificantly more efficient in memory use. For anyone wonderingwhat this is all about, the largest of my test problems involves upto 4,500 observations for 225 panel units and 20 time periods. Myreal data is up to 500 panel units and 50 or more time periods. Thelargest problem takes 4 hours to run on my Mac and the time requiredfor a run goes up exponentially with the number of time periods -going from 8 to 20 increases execution time by more 50 times. Theprogram spends nearly 80% of execution time in one function, so thatthere are huge gains from optimising this function, but I suspectthat I have got to the point where I need to go back to using muchlarger distributed computing power.


Gordon Hughes
[email protected]

Date: Tue, 08 Feb 2011 09:37:14 -0600
From: [email protected] (William Gould, StataCorp LP)
Subject: Re: st: Mata - generalised cross products of the form X'SX

Gordon Hughes <[email protected]> writes,

> B.  Roughly, under what conditions might this sequence of steps
>     reduce execution time and/or memory use on the assumption
>     that the dimension of S is large relative to cols(X)?

I'm sorry, I don't have an answer, except to say that the answer will
depend on whether you are using Stata/MP because MP is parallelizes
matrix multiplication, and thus performs it faster than does
Stata/SE.  In cases were rows>>cols or cols>>rows, multiplication
time approaches theoretical limits, which is to say, execution time
becomes proportional to 1/cores.

> In a related context I have to calculate X'B'BX where B is square
> but not symmetric and found that use of --cross()-- does not save
> much (if any) time, though it is probably more efficient on memory.

Sometimes the Mata optimizer is not as smart as we wish that it were.
To calculate X'B'B'X as quickly as possible, code

                T = B'X
                R = T'T

That will execute more quickly than R = X'B'B'X.  When you code
X'B'B'X, Mata does not notice that X'B'B'x = (B'X)'(B'X).  Calculation
of X'B'B'X results in Mata proforming three transposes and four matrix
multiplications.

....

In addition, use parenthesis where necessary to force the order of the
operations.  Concerning order of operations, Mata does not know what
you know, namely that rows >> cols.  Mata makes all of its
optimization decisions at compile time, and therefore can can choose
inefficient order of operations.  In this case, for memory
effinciency reasons, you do not want to calculate B'B to get to
X'B'B'X, you want to calculate X'B and B'X.  That right there will
make a differece in execution time, even without exploiting X'B
= (B'X)', which can be used additionally to save an entire multiplication.

- -- Bill
[email protected]
*


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Prev by Date: Re: st: Sample size calculation under poisson distribution
Next by Date: Re:Re:st:Stata and survey data
Previous by thread: Re: st: Mata - generalised cross products of the form X'SX
Next by thread: st: egen with error after set obs erases obs
Index(es):
- Date
- Thread