Statalist



Re: RE: st: Computation of Correlation Coefficients - COMPLEX


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: RE: st: Computation of Correlation Coefficients - COMPLEX
Date   Wed, 07 Nov 2007 13:43:47 -0600

John Bunge <jota.be@web.de> writes, 

> I think I was misunderstood by Bill and want to make my problem more
> explicit by stylizing the dataset I have:
>
> 
>    list
>        [...]
> [...]
> The correlation coefficients (cc's) for the decisions I want to compute are:
>
> between country 1 and 2, between country 1 and 3, ..., between country 1 and
> 200,...  between country 199 and 200, respectively. The total number of cc's
> will be (200*199)/2 = 19,900.
>
> Now note that I need these coefficients for every single year, not over all
> decisions during the whole time period 1980 - 1999. So in the end, I will
> have the coefficients for the country-pair 1-2 (and for all other country
> pairs, too) for 1980, for 1981, ..., and for 1999. That is, in the end I
> will have 19,900*20 = 398,000 coefficients.

Okay.  I omitted from the quote above the data that John supplied, but here
they are:

        . list 

             +-------------------------+
             | cid   deid   year   dec |
             |-------------------------|
          1. |   1      1   1980    -1 |
          2. |   1      2   1980     0 |
          3. |   1      3   1980     1 |
          4. |   1      4   1980     1 |
          5. |   1   4000   1999    -1 |
             |-------------------------|
          6. |   2      1   1980     0 |
          7. |   2      2   1980    -1 |
          8. |   2      3   1980     0 |
          9. |   2      4   1980     1 |
         10. |   2   4000   1999    -1 |
             |-------------------------|
         11. | 200      1   1980    -1 |
         12. | 200      2   1980     0 |
         13. | 200      3   1980     0 |
         14. | 200      4   1980    -1 |
         15. | 200   4000   1999     1 |
             +-------------------------+

John made clear that he wants correlations of dec between countries, not
between years, so this time I'm going to make the dataset wide across countries:

        . reshape wide dec, i(deid year) j(cid) 
        (note: j = 1 2 200)

        Data                               long   ->   wide
        ---------------------------------------------------------------------
        Number of obs.                       15   ->       5
        Number of variables                   4   ->       5
        j variable (3 values)               cid   ->   (dropped)
        xij variables:
                                            dec   ->   dec1 dec2 dec200
        ---------------------------------------------------------------------

        . list

             +------------------------------------+
             | deid   year   dec1   dec2   dec200 |
             |------------------------------------|
          1. |    1   1980     -1      0       -1 |
          2. |    2   1980      0     -1        0 |
          3. |    3   1980      1      0        0 |
          4. |    4   1980      1      1       -1 |
          5. | 4000   1999     -1     -1        1 |
             +------------------------------------+
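
-reshape wide- needs i() and j() together to identify each observation exactly once. A minimal sketch that checks this with -isid- on the stylized rows, before and after reshaping (the data are re-entered with -input- only so that the example stands alone):

```stata
clear
input cid deid year dec
  1    1 1980 -1
  1    2 1980  0
  1    3 1980  1
  1    4 1980  1
  1 4000 1999 -1
  2    1 1980  0
  2    2 1980 -1
  2    3 1980  0
  2    4 1980  1
  2 4000 1999 -1
200    1 1980 -1
200    2 1980  0
200    3 1980  0
200    4 1980 -1
200 4000 1999  1
end

* stops with an error unless each (cid, deid, year) combination occurs once
isid cid deid year

reshape wide dec, i(deid year) j(cid)

* after reshaping, each decision-year should be a single row
isid deid year
```

If either -isid- stops with an error on the real data, duplicates need sorting out before the correlations mean anything.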

Now I can obtain the correlations, say for 1980, by typing

        . correlate dec1-dec200 if year==1980
        (obs=4)

                     |     dec1     dec2   dec200
        -------------+---------------------------
                dec1 |   1.0000
                dec2 |   0.4264   1.0000
              dec200 |   0.3015  -0.7071   1.0000

Obviously, if I had all the data, I'd have gotten a much larger correlation 
matrix.

John's about to calculate a lot of correlations.  As he said, for each year he
will have (200*199)/2 = 19,900 correlations, and for 20 years, he will 
have a total of 398,000.
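
As a quick check of that arithmetic, Stata's comb() function gives the number of unordered pairs directly. This is only a sketch to confirm the counts quoted above:

```stata
* number of distinct country pairs among 200 countries
display comb(200, 2)

* pairs per year times the 20 years 1980-1999
display comb(200, 2) * 20
```

The first line should show 19900 and the second 398000.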

Perhaps seeing them printed is good enough, but I'm guessing John is next
going to ask, "How do I get them in a dataset?"  So let's set about creating a
dataset that looks like

            year    dec_i    dec_j    rho
            -------------------------------
            1980        1        2    .4264
            1980        1        3    ...
            1980        1        .    .   
            1980        1        .    .   
            1980        1      200    .3015
            1980        2        3    ...
            1980        2        .    .
            1980        2        .    .
            1980        2      200   -.7071
               .        .        .    .
               .        .        .    .
            -------------------------------

Here's how:

        program rhos 
            version 10

            * open rhos.dta for posting: one observation per year and pair
            postfile results year dec_i dec_j rho using rhos.dta, replace
            forvalues year = 1980(1)1999 {
                forvalues i=1(1)200 { 
                    * pair i only with j > i, so each pair appears once
                    local j0 = `i' + 1
                    forvalues j=`j0'(1)200 { 
                        quietly corr dec`i' dec`j' if year==`year'
                        post results (`year') (`i') (`j') (r(rho))
                    }
                }
            }
            postclose results
            display as txt "done -- data in rhos.dta"
        end

I haven't tested this program, but it seems to me it ought to work.  
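
One way to try the idea before committing to the full run is a toy-scale version on the five stylized observations listed earlier. The sketch below is neither John's data nor the program above: the file name rhos_toy.dta and the macro names ids, a, and b are made up for illustration, and it loops only over the three country ids present (1, 2, 200) for 1980.

```stata
clear
input deid year dec1 dec2 dec200
   1 1980 -1  0 -1
   2 1980  0 -1  0
   3 1980  1  0  0
   4 1980  1  1 -1
4000 1999 -1 -1  1
end

* hypothetical toy output file, so rhos.dta is not touched
postfile results year dec_i dec_j rho using rhos_toy.dta, replace

* the country ids that exist in the toy data
local ids 1 2 200
local n : word count `ids'

forvalues a = 1/`n' {
    local i : word `a' of `ids'
    local b0 = `a' + 1
    * the inner loop runs zero times when b0 exceeds n
    forvalues b = `b0'/`n' {
        local j : word `b' of `ids'
        quietly correlate dec`i' dec`j' if year==1980
        post results (1980) (`i') (`j') (r(rho))
    }
}
postclose results

use rhos_toy, clear
list
```

The listing should have three rows, one per pair, with rho matching the -correlate- output shown earlier (about .4264, .3015, and -.7071).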

I do not expect the program to be fast -- we are going to run 398,000 
separate -correlate- commands -- but it shouldn't take too long.
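
If speed does become a concern, one possible alternative is to run -correlate- once per year on all the dec variables and read the pairs out of the saved matrix. The sketch below shows the idea on the stylized 1980 rows only. It assumes a Stata release in which -correlate- leaves the matrix in r(C), and the matrix name C is arbitrary. Note that -correlate- with more than two variables drops any observation that has a missing value in any of them, so with missing data the results can differ from the pair-at-a-time loop.

```stata
clear
input deid year dec1 dec2 dec200
   1 1980 -1  0 -1
   2 1980  0 -1  0
   3 1980  1  0  0
   4 1980  1  1 -1
4000 1999 -1 -1  1
end

* one -correlate- call for the whole year
quietly correlate dec1 dec2 dec200 if year==1980

* copy the saved correlation matrix (assumed to be in r(C))
matrix C = r(C)
matrix list C

* rows and columns follow the order of the variables given to -correlate-
display C[2,1]
display C[3,2]
```

In the full problem the same element lookup could replace the two inner loops' -correlate- calls, leaving 20 calls instead of 398,000.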

Run this program on the wide data.  Results will be put in rhos.dta.

I suggest John package the whole thing as a do-file.  I know there will
be mistakes -- mine or John's -- and it will be a lot easier to fix the 
do-file than to keep starting over again interactively:

        ----------------------------------------- doit.do ---
        version 10
        clear all

        use johnsdata, clear 
        reshape wide dec, i(deid year) j(cid) 

        program rhos 
            version 10

            postfile results year dec_i dec_j rho using rhos.dta, replace
            forvalues year = 1980(1)1999 {
                forvalues i=1(1)200 { 
                    local j0 = `i' + 1
                    forvalues j=`j0'(1)200 { 
                        quietly corr dec`i' dec`j' if year==`year'
                        post results (`year') (`i') (`j') (r(rho))
                    }
                }
            }
            postclose results
            display as txt "done -- data in rhos.dta"
        end

        rhos
        use rhos, clear 
        list in 1/5
        ----------------------------------------- doit.do ---

-- Bill
wgould@stata.com


