# Re: RE: st: Computation of Correlation Coefficients - COMPLEX

 From wgould@stata.com (William Gould, StataCorp LP) To statalist@hsphsun2.harvard.edu Subject Re: RE: st: Computation of Correlation Coefficients - COMPLEX Date Wed, 07 Nov 2007 13:43:47 -0600

```
John Bunge <jota.be@web.de> writes,

> I think I was misunderstood by Bill and want to make my problem more
> explicit by stylizing the dataset I have:
>
>
>    list
>        [...]
> [...]
> The correlation coefficients (cc's) for the decisions I want to compute are:
>
> between country 1 and 2, between country 1 and 3, ..., between country 1 and
> 200,...  between country 199 and 200, respectively. The total number of cc's
> will be (200*199)/2 = 19,900.
>
> Now note that I need these coefficients for every single year, not over all
> decisions during the whole time period 1980 - 1999. So in the end, I will
> have the coefficients for the country-pair 1-2 (and for all other country
> pairs, too) for 1980, for 1981, ..., and for 1999. That is, in the end I
> will have 19,900*20 = 398,000 coefficients.

Okay.  I omitted from the quote above the data that John supplied, but here
they are:

. list

     +-------------------------+
     | cid   deid   year   dec |
     |-------------------------|
  1. |   1      1   1980    -1 |
  2. |   1      2   1980     0 |
  3. |   1      3   1980     1 |
  4. |   1      4   1980     1 |
  5. |   1   4000   1999    -1 |
     |-------------------------|
  6. |   2      1   1980     0 |
  7. |   2      2   1980    -1 |
  8. |   2      3   1980     0 |
  9. |   2      4   1980     1 |
 10. |   2   4000   1999    -1 |
     |-------------------------|
 11. | 200      1   1980    -1 |
 12. | 200      2   1980     0 |
 13. | 200      3   1980     0 |
 14. | 200      4   1980    -1 |
 15. | 200   4000   1999     1 |
     +-------------------------+

John made clear that he wants correlations of dec between countries, not between
years, so this time I'm going to make the dataset wide across countries:

. reshape wide dec, i(deid year) j(cid)
(note: j = 1 2 200)

Data                               long   ->   wide
---------------------------------------------------------------------
Number of obs.                       15   ->       5
Number of variables                   4   ->       5
j variable (3 values)               cid   ->   (dropped)
xij variables:
                                    dec   ->   dec1 dec2 dec200
---------------------------------------------------------------------

. list

     +------------------------------------+
     | deid   year   dec1   dec2   dec200 |
     |------------------------------------|
  1. |    1   1980     -1      0       -1 |
  2. |    2   1980      0     -1        0 |
  3. |    3   1980      1      0        0 |
  4. |    4   1980      1      1       -1 |
  5. | 4000   1999     -1     -1        1 |
     +------------------------------------+

Now I can obtain the correlations, say for 1980, by typing

. correlate dec1-dec200 if year==1980
(obs=4)

             |     dec1     dec2   dec200
-------------+---------------------------
        dec1 |   1.0000
        dec2 |   0.4264   1.0000
      dec200 |   0.3015  -0.7071   1.0000

Obviously, if I had all the data, I'd have gotten a much larger correlation
matrix.

John's about to calculate a lot of correlations.  As he said, for each year he
will have (200*199)/2 = 19,900 correlations, and for 20 years, he will
have a total of 398,000.

Perhaps seeing them printed is good enough, but I'm guessing John is next
going to ask, "How do I get them in a dataset?"  So let's set about creating a
dataset that looks like

year    dec_i    dec_j    rho
-------------------------------
1980        1        2    .4264
1980        1        3    ...
1980        1        .    .
1980        1        .    .
1980        1      200    .3015
1980        2        3    ...
1980        2        .    .
1980        2        .    .
1980        2      200   -.7071
.        .        .    .
.        .        .    .
-------------------------------

Here's how:

program rhos
    version 10

    * open rhos.dta for posting: one row per year and country pair
    postfile results year dec_i dec_j rho using rhos.dta, replace
    forvalues year = 1980(1)1999 {
        forvalues i=1(1)200 {
            * pair country i only with j > i, so each pair appears once
            local j0 = `i' + 1
            forvalues j=`j0'(1)200 {
                quietly corr dec`i' dec`j' if year==`year'
                post results (`year') (`i') (`j') (r(rho))
            }
        }
    }
    postclose results
    display as txt "done -- data in rhos.dta"
end

I haven't tested this program, but it seems to me it ought to work.

I do not expect the program to be fast -- we are going to run 398,000
separate -correlate- commands -- but it shouldn't take too long.

Run this program on the wide data.  Results will be put in rhos.dta.

I suggest John package the whole thing as a do-file.  I know there will
be mistakes -- mine or John's -- and it will be a lot easier to fix the
do-file than to keep starting over again interactively:

----------------------------------------- doit.do ---
version 10
clear all

use johnsdata, clear
reshape wide dec, i(deid year) j(cid)

program rhos
    version 10

    * open rhos.dta for posting: one row per year and country pair
    postfile results year dec_i dec_j rho using rhos.dta, replace
    forvalues year = 1980(1)1999 {
        forvalues i=1(1)200 {
            * pair country i only with j > i, so each pair appears once
            local j0 = `i' + 1
            forvalues j=`j0'(1)200 {
                quietly corr dec`i' dec`j' if year==`year'
                post results (`year') (`i') (`j') (r(rho))
            }
        }
    }
    postclose results
    display as txt "done -- data in rhos.dta"
end

rhos
use rhos, clear
list in 1/5
----------------------------------------- doit.do ---

-- Bill
wgould@stata.com
```
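The posting loop can be tried on the 15-row listing before it is run on the full data. What follows is only a sketch under stated assumptions: the rows are typed in by hand, the loop runs over a hypothetical local macro `ids` holding the three country codes actually present (1, 2, 200) instead of 1(1)200, only 1980 is processed, and the output file name `rhos_toy.dta` is made up for the illustration. If it behaves as intended, the three posted values should agree with the -correlate- output shown for 1980.

```stata
* Toy check of the postfile loop on the listing from the post.
* Assumptions: ids, rhos_toy.dta, and the tempname handle are illustrative
* names, not part of the original program.
version 10
clear all

input cid deid year dec
  1    1 1980 -1
  1    2 1980  0
  1    3 1980  1
  1    4 1980  1
  1 4000 1999 -1
  2    1 1980  0
  2    2 1980 -1
  2    3 1980  0
  2    4 1980  1
  2 4000 1999 -1
200    1 1980 -1
200    2 1980  0
200    3 1980  0
200    4 1980 -1
200 4000 1999  1
end

reshape wide dec, i(deid year) j(cid)

* country codes present in the toy data (the real data would use 1(1)200)
local ids 1 2 200
local n : word count `ids'

tempname results
postfile `results' year dec_i dec_j rho using rhos_toy.dta, replace
forvalues a = 1(1)`n' {
    local i : word `a' of `ids'
    local b0 = `a' + 1
    * an empty range (b0 > n) simply runs zero times
    forvalues b = `b0'(1)`n' {
        local j : word `b' of `ids'
        quietly corr dec`i' dec`j' if year==1980
        post `results' (1980) (`i') (`j') (r(rho))
    }
}
postclose `results'

use rhos_toy, clear
list
```

Using a tempname for the postfile handle is a design choice rather than a requirement; it avoids a clash if a handle called results is already open from an interrupted earlier run.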