Statalist



Re: st: Computation of Correlation Coefficients - COMPLEX


From   [email protected] (William Gould, StataCorp LP)
To   [email protected]
Subject   Re: st: Computation of Correlation Coefficients - COMPLEX
Date   Wed, 07 Nov 2007 09:32:33 -0600

John Bunge <[email protected]> wrote, 

> I want to compute correlation coefficients within the following setting:
>
> I have the variables CID (No. of country; 1-200), DEID (No. of decision;
> 1-4000), YEAR and DEC (no, abstain, yes; expressed as: -1,0,1).
>
> There were several decisions per year in which more or less all the
> countries took part.
> 
> I want to compute the correlation in these decisions between every country
> pair for all single years.

I think John has a dataset that looks something like this, 

        . list

             +-------------------------+
             | cid   deid   year   dec |
             |-------------------------|
          1. |   1      1   1990     1 |
          2. |   1      2   1990     0 |
          3. |   1      1   1991     0 |
          4. |   1      2   1991    -1 |
          5. |   1      1   1992     0 |
             |-------------------------|
          6. |   1      2   1992     1 |
          7. |   2      1   1990     0 |
          8. |   2      2   1990    -1 |
          9. |   2      1   1991     1 |
         10. |   2      2   1991     1 |
             |-------------------------|
         11. |   2      1   1992    -1 |
         12. |   2      2   1992     0 |
             +-------------------------+
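
The example listing above can be recreated by typing the values in with
-input-.  This is a minimal sketch of the made-up example only, not John's
actual data:

        clear
        * cid = country, deid = decision, dec = -1 (no), 0 (abstain), 1 (yes)
        input cid deid year dec
        1 1 1990  1
        1 2 1990  0
        1 1 1991  0
        1 2 1991 -1
        1 1 1992  0
        1 2 1992  1
        2 1 1990  0
        2 2 1990 -1
        2 1 1991  1
        2 2 1991  1
        2 1 1992 -1
        2 2 1992  0
        end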

As I understand it, John wants to correlate dec in 1990 with 1991, 1990 with
1992, etc., matching decisions on (cid, deid).
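
John's wording ("between every country pair for all single years") could
also be read the other way around: correlate country 1 with country 2,
matching decisions on deid, separately for each year.  If that were the
intent, a sketch under that assumption (it is not the reading pursued in the
rest of this reply) would put one dec variable per country in each
observation:

        * keep the long data intact for the steps that follow
        preserve
        * one observation per (year, deid); one variable per country: dec1, dec2, ...
        reshape wide dec, i(year deid) j(cid)
        * correlations between countries, computed separately within each year
        bysort year: correlate dec*
        restore

With only two decisions per year in the example, each such correlation would
rest on two observations, so this shows the shape of the commands rather than
a meaningful result.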

The answer is, of course, -correlate-, but -correlate- correlates variables 
in the same observation.  So I need a dataset with dec in 1990, 1991, and 1992
in the same observation.  The first step is to convert the data to the wide
form:


        . reshape wide dec, i(cid deid) j(year)
        (note: j = 1990 1991 1992)

        Data                               long   ->   wide
        ---------------------------------------------------------------------
        Number of obs.                       12   ->       4
        Number of variables                   4   ->       5
        j variable (3 values)              year   ->   (dropped)
        xij variables:
                                            dec   ->   dec1990 dec1991 dec1992
        ---------------------------------------------------------------------
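
-reshape wide- requires that the i() variables together with the j() variable
identify each observation uniquely.  A cautious sketch, assuming the variable
names of the example, checks that first and could be run in place of the bare
-reshape- above:

        * exits with an error if any (cid, deid, year) combination is repeated
        isid cid deid year
        reshape wide dec, i(cid deid) j(year)

Note that the example assumes decision numbers restart each year, so that
deid 1 in 1990 can be matched with deid 1 in 1991.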

Now the data look like this,

        . list
        
             +------------------------------------------+
             | cid   deid   dec1990   dec1991   dec1992 |
             |------------------------------------------|
          1. |   1      1         1         0         0 |
          2. |   1      2         0        -1         1 |
          3. |   2      1         0         1        -1 |
          4. |   2      2        -1         1         0 |
             +------------------------------------------+

and I can obtain the correlations by typing 

        . corr dec*
        (obs=4)

                     |  dec1990  dec1991  dec1992
        -------------+---------------------------
             dec1990 |   1.0000
             dec1991 |  -0.4264   1.0000
             dec1992 |   0.0000  -0.8528   1.0000
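
John mentions that "more or less" all the countries took part, so some dec
values are presumably missing in the real data.  -correlate- drops any
observation with a missing value in any of the listed variables; -pwcorr-
instead uses, for each pair of years, all the observations available for that
pair.  A sketch of the difference, assuming the wide data above:

        * casewise deletion: only observations complete on every dec* variable
        correlate dec*
        * pairwise deletion; obs reports how many observations each coefficient uses
        pwcorr dec*, obs

In the four-observation example nothing is missing, so both commands give the
same coefficients.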


-- Bill
[email protected]

P.S.  John also wrote, 

      > two days ago I posted a query, unfortunately there came no reply
      > on it.

      and added, "If the problem is not expressed clearly, please give me
      advice".

      I would like to do just that, not only for John, but for others who ask
      questions that do not receive an answer.

      In this case, John asked the question too concisely.  John had an 
      excellent summary of his problem, but didn't go the extra step of 
      including an example to make it easy for me to answer his question.
      Instead, I HAD TO CONCOCT THE EXAMPLE and I spent more time doing that
      than actually answering the question.

      Questioners, understand:  For those of us answering questions, 
      the satisfaction is in the answering.  We are loath to work on 
      the asking part.

      John had a dynamite opening.  I love it when questions are concise, 
      because then I can quickly decide whether I have anything to 
      contribute.  To make it more likely that he would receive an answer,
      however, John then needed to continue setting the problem up for me.
      Give me a small example.  Make everything explicit so that all 
      I have to do is say, "type this".

      After John's concise intro, he could have added, 

          For instance, here's a small dataset with 2 countries, 3 years, 
          and 2 decisions:

          <insert listing here>

          What I want is the correlation of decisions in 1990 and 1991, 1990
          and 1992, and 1991 and 1992, calculating the correlation across
          countries and decisions.  For instance, the correlation between
          1990 and 1991 would be computed from these pairs:

                    dec in 1990    dec in 1991
                    --------------------------
                          1            0     <- from obs 1 & 3; cid=1, deid=1
                          0           -1     <- from obs 2 & 4; cid=1, deid=2
                          0            1     <- etc...
                         -1            1     
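
      A listing like the one called for above takes one command to produce.
      A minimal sketch, assuming the long-form variable names used in this
      reply and that the first dozen observations make a representative
      example:

          * show the key variables for the first 12 observations,
          * with a separator line between countries
          list cid deid year dec in 1/12, sepby(cid)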

      Remember, when asking a question, you are playing on our sympathies 
      and our desire to show off.  Those who answer cannot help but be more 
      sympathetic when it appears you have invested time in formulating 
      the question.

      I hope this is helpful.

<end>


