Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Mata - generalised cross products of the form X'SX


From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: Mata - generalised cross products of the form X'SX
Date   Tue, 08 Feb 2011 09:29:54 -0500

At 07:12 AM 2/8/2011, Gordon Hughes wrote:
I am starting a new thread, but this is really a follow-up to my post on performance monitoring. Following Bill Gould's helpful suggestion I have identified that most of the time is consumed in loops in my likelihood evaluation that calculate matrices of the form Z=X'SX where S is symmetric but not a diagonal matrix. Of course, I am doing things outside loops where I can.

This is a standard form in any GLS type calculation. They are not directly amenable to use with the Mata function --cross()-- but they could be got into that form by forming by factorising S=QQ', then forming P=Q'X, and finally using P=cross(P,P).

Two questions follow:

A. Is there any Mata function that does this more directly? Maybe Stata Corp might consider extending --cross()-- to handle such cases. The command --matrix glsaccum-- does this in Stata.

B. Roughly, under what conditions might this sequence of steps reduce execution time and/or memory use on the assumption that the dimension of S is large relative to cols(X)? In a related context I have to calculate X'B'BX where B is square but not symmetric and found that use of --cross()-- does not save much (if any) time, though it is probably more efficient on memory. Hence, the overhead of factorisation may not pay off. On the other hand, the initial factorisation would only need to be done once per function call, whereas the steps involving X have to be performed thousands of times per function call.

Gordon Hughes
[email protected]

I don't know enough to answer your question, but the situation looks similar to something I encountered some time ago, which may be valuable to consider. This is about plain Stata code, not Mata.

This is in my mahascore.ado, as part of the mahapick module on SSC. For each observation, I wanted a value computed by first forming a vector `X', based on values of specified variables in the observation, and then computing...
 `X'' * `invcov' * `X' `X'
The "obvious" thing to do was to do that exact computation in a loop, looping over the observations: (Code samples omit the matter of taking the sqrt of the result. Also, these are excerpts that refer to things defined in sections of omitted code.)

local vref "\`v'[\`refobs']"

quietly gen double `gen' = .
tempname X Y

forvalues obsno = 1/`=_N' {
 matrix `X' = J(`nvars', 1, .)
 matrix rownames `X' = `varlist'

 foreach v of local varlist {
  local rownum = rownumb(matrix(`X'), "`v'")
  matrix `X'[`rownum', 1] = `v'[`obsno'] - `vref'
 }

 matrix `Y' = `X'' * `invcov' * `X'  // `Y' is 1x1
 quietly replace `gen' = (scalar(`Y'[1,1])) in `obsno'
}

----
Later, I learned of a much quicker way to get the same result. This is based on the corresponding calculation found in _Dif_mbase in psmatch2 by Leuven & Sianesi. Instead of looping over the observations, the looping is over the rows and columns of `invcov':

local rows : rownames `invcov'
local cols : colnames `invcov'

local icrows = rowsof(`invcov')
local iccols = colsof(`invcov')

local vjref "\`vj'[\`refobs']"
local vkref "\`vk'[\`refobs']"

quietly gen double `gen' = 0
forvalues j = 1 / `icrows' {
 forvalues k = 1 / `icrows' {  /* or iccols -- the same */
  local vj = word("`rows'", `j')
  local vk = word("`rows'", `k') /* or cols -- should be the same */
  quietly replace `gen' = `gen' + ///
   `invcov'[`j,', `k'] * (`vj' - `vjref') * (`vk' - `vkref')
 }
}

----
The latter is mathematically equivalent to the former, but much faster. It gave results that are almost identical to the former; the differences were minuscule. I can't say which was closer to the true result, but I suspect it was the latter -- a question that is similar to a matter that Bill Gould recently wrote about; he may want to chime in here as well.

I hope this is useful or interesting.
--David

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index