
From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: Computation of Correlation Coefficients - COMPLEX
Date: Wed, 07 Nov 2007 13:43:47 -0600

John Bunge <jota.be@web.de> writes,

> I think I was misunderstood by Bill and want to make my problem more
> explicit by stylizing the dataset I have:
>
> > list
> [...]
> [...]
> The correlation coefficients (cc's) for the decisions I want to compute are:
>
> between country 1 and 2, between country 1 and 3, ..., between country 1 and
> 200,... between country 199 and 200, respectively. The total number of cc's
> will be (200*199)/2 = 19,900.
>
> Now note that I need these coefficients for every single year, not over all
> decisions during the whole time period 1980 - 1999. So in the end, I will
> have the coefficients for the country-pair 1-2 (and for all other country
> pairs, too) for 1980, for 1981, ..., and for 1999. That is, in the end I
> will have 19,900*20 = 398,000 coefficients.

Okay.  I omitted from the quote above the data that John supplied, but here
they are:

        . list

             +-------------------------+
             | cid   deid   year   dec |
             |-------------------------|
          1. |   1      1   1980    -1 |
          2. |   1      2   1980     0 |
          3. |   1      3   1980     1 |
          4. |   1      4   1980     1 |
          5. |   1   4000   1999    -1 |
             |-------------------------|
          6. |   2      1   1980     0 |
          7. |   2      2   1980    -1 |
          8. |   2      3   1980     0 |
          9. |   2      4   1980     1 |
         10. |   2   4000   1999    -1 |
             |-------------------------|
         11. | 200      1   1980    -1 |
         12. | 200      2   1980     0 |
         13. | 200      3   1980     0 |
         14. | 200      4   1980    -1 |
         15. | 200   4000   1999     1 |
             +-------------------------+

John made clear that he wants correlations of dec between countries, not
between years, so this time I'm going to make the dataset wide across
countries:

        . reshape wide dec, i(deid year) j(cid)
        (note: j = 1 2 200)

        Data                               long   ->   wide
        ---------------------------------------------------------------------
        Number of obs.                       15   ->       5
        Number of variables                   4   ->       5
        j variable (3 values)               cid   ->   (dropped)
        xij variables:
                                            dec   ->   dec1 dec2 dec200
        ---------------------------------------------------------------------

        . list

             +------------------------------------+
             | deid   year   dec1   dec2   dec200 |
             |------------------------------------|
          1. |    1   1980     -1      0       -1 |
          2. |    2   1980      0     -1        0 |
          3. |    3   1980      1      0        0 |
          4. |    4   1980      1      1       -1 |
          5. | 4000   1999     -1     -1        1 |
             +------------------------------------+

Now I can obtain the correlations, say for 1980, by typing

        . correlate dec1-dec200 if year==1980
        (obs=4)

                     |     dec1     dec2   dec200
        -------------+---------------------------
                dec1 |   1.0000
                dec2 |   0.4264   1.0000
              dec200 |   0.3015  -0.7071   1.0000

Obviously, if I had all the data, I'd have gotten a much larger correlation
matrix.

John's about to calculate a lot of correlations.  As he said, for each year
he will have (200*199)/2 = 19,900 correlations, and for 20 years, he will
have a total of 398,000.  Perhaps seeing them printed is good enough, but
I'm guessing John is next going to ask, "How do I get them in a dataset?"

So let's set about creating a dataset that looks like

        year   dec_i   dec_j      rho
        -------------------------------
        1980       1       2    .4264
        1980       1       3      ...
        1980       1       .        .
        1980       1       .        .
        1980       1     200    .3015
        1980       2       3      ...
        1980       2       .        .
        1980       2       .        .
        1980       2     200   -.7071
           .       .       .        .
           .       .       .        .
        -------------------------------

Here's how:

        program rhos
                version 10

                * open rhos.dta for posting, one row per (year, i, j)
                postfile results year dec_i dec_j rho using rhos.dta, replace

                forvalues year = 1980(1)1999 {
                        forvalues i=1(1)200 {
                                * start j at i+1 so each pair is visited once
                                local j0 = `i' + 1
                                forvalues j=`j0'(1)200 {
                                        quietly corr dec`i' dec`j' if year==`year'
                                        * r(rho) is the correlation just computed
                                        post results (`year') (`i') (`j') (r(rho))
                                }
                        }
                }
                postclose results
                display as txt "done -- data in rhos.dta"
        end

I haven't tested this program, but it seems to me it ought to work.  I do
not expect the program to be fast -- we are going to run 398,000 separate
-correlate- commands -- but it shouldn't take too long.

Run this program on the wide data.  Results will be put in rhos.dta.

I suggest John package the whole thing as a do-file.
I know there will be mistakes -- mine or John's -- and it will be a lot
easier to fix the do-file than to keep starting over again interactively:

        ----------------------------------------- doit.do ---
        version 10
        clear all

        use johnsdata, clear
        reshape wide dec, i(deid year) j(cid)

        program rhos
                version 10

                postfile results year dec_i dec_j rho using rhos.dta, replace

                forvalues year = 1980(1)1999 {
                        forvalues i=1(1)200 {
                                local j0 = `i' + 1
                                forvalues j=`j0'(1)200 {
                                        quietly corr dec`i' dec`j' if year==`year'
                                        post results (`year') (`i') (`j') (r(rho))
                                }
                        }
                }
                postclose results
                display as txt "done -- data in rhos.dta"
        end

        rhos

        use rhos, clear
        list in 1/5
        ----------------------------------------- doit.do ---

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
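
A note on speed, offered only as a sketch: the rhos program above runs one -correlate- per country pair per year. If the Stata release in use leaves the full matrix in r(C) after -correlate- (an assumption here; the post targets version 10 and does not say), a single call per year could supply every pair at once. The file name rhos_fast.dta and the local macro ids are made up for illustration, and the data are the 1980 rows of the toy listing. One caveat matters: a single -correlate- over many variables drops any observation with a missing value in any of them, so the results agree with the pair-by-pair loop only when no decisions are missing.

```stata
* Toy data: the 1980 rows from the listing in the post
clear
input cid deid year dec
1   1 1980 -1
1   2 1980  0
1   3 1980  1
1   4 1980  1
2   1 1980  0
2   2 1980 -1
2   3 1980  0
2   4 1980  1
200 1 1980 -1
200 2 1980  0
200 3 1980  0
200 4 1980 -1
end
reshape wide dec, i(deid year) j(cid)

* ids is a hypothetical helper: country ids in the order of dec1 dec2 dec200
local ids 1 2 200
local n : word count `ids'

tempname R
postfile results year dec_i dec_j rho using rhos_fast.dta, replace
forvalues year = 1980(1)1980 {
        * one call per year; assumes -correlate- stores the matrix in r(C)
        quietly correlate dec* if year==`year'
        matrix `R' = r(C)
        forvalues a = 1(1)`n' {
                local a1 = `a' + 1
                forvalues b = `a1'(1)`n' {
                        local i : word `a' of `ids'
                        local j : word `b' of `ids'
                        * element [b,a] is the correlation of variables a and b
                        post results (`year') (`i') (`j') (`R'[`b',`a'])
                }
        }
}
postclose results

use rhos_fast, clear
list
```

On the full data the year range would be 1980(1)1999 and ids would list all 200 country ids. Whether this is worth doing depends on how much missing data there is, since the pairwise loop keeps every observation available to each pair.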

