The Stata listserver

st: RE: Changing -correlate- results


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Changing -correlate- results
Date   Mon, 21 Mar 2005 12:13:51 -0000

My guess is that the underlying problem 
is what is done with tied data. 

Suppose you -sort- according to some -x-. 
If the 12th, 13th, and 14th values of -x- are 
42, 42, 42, you are assigning them ranks of 12, 13, 14. 
This will happen after every -sort-, but 
_which_ observation is ranked 12th is not 
guaranteed to be the same: unless you specify 
the -stable- option, -sort- puts tied 
observations in random order. From the point 
of view of -sort-ing -x- alone, this is 
immaterial, but it does make a difference
when -y- is also considered. Only if 
the observations tie on both -x- and -y- 
will it be irrelevant. 
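
A minimal sketch of that point, using a made-up 
toy dataset (the variables x and y and their values 
are purely illustrative, not the poster's data). 
Three observations tie on x, so which of them gets 
rank 2, 3, or 4 from _n is left to -sort-. Whether 
rank_a and rank_b actually differ in any one run 
is a matter of chance. 

```stata
* Toy data: three observations tie on x = 42 (hypothetical values)
clear
input x y
42 10
42 20
42 30
 7 40
99 50
end

* First attempt at ranking by position after sorting
sort x
generate rank_a = _n

* Re-sort on something else so the next -sort x- is a genuine re-sort
sort y

* Second attempt: the tied observations need not land in the same order
sort x
generate rank_b = _n

* Compare the two sets of "ranks" for the x = 42 observations
list x y rank_a rank_b, clean
```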

Otherwise put, 

* The correct procedure given the desire
for a rank is to use -egen, rank()- 
and so guarantee that tied values get 
identical ranks. 

* Your "data" are a bit different every 
time, so it is not surprising that the 
correlations are a little different. 
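
A sketch of how the first point might look in 
the do-file, assuming the variables f1 and f2 from 
the quoted excerpt and the default command delimiter 
(the original uses semicolons). By default, 
-egen, rank()- gives tied values their average rank, 
so the result does not depend on how -sort- happens 
to order the ties. 

```stata
* Sketch only: file and variable names are taken from the quoted do-file
use isocomb_indices, clear

* Tied values of f1 (or f2) receive the same average rank,
* so these variables are identical on every run
egen rankf1 = rank(f1)
egen rankf2 = rank(f2)
correlate rankf1 rankf2

* The Pearson correlation of average ranks is the Spearman
* rank correlation, which -spearman- computes directly
spearman f1 f2
```

If the aim is only the rank correlation, -spearman- 
alone avoids creating the rank variables at all. 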

Nick 
[email protected] 

louis boakye-yiadom

> A -correlate- command in my do file gives slightly different 
> results each time the file is executed. I'm baffled by this, 
> and I can't find any explanation in the -help- for -correlate-. 
> I've provided below three such different results, as well as 
> excerpts of the do file. Any help will be appreciated. 
> Thank you.
> 
> 
> . correlate rankf1 rankf2;
> (obs=97)
> 
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7524   1.0000
> 
> 
> . correlate rankf1 rankf2;
> (obs=97)
> 
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7489   1.0000
> 
> 
> . correlate rankf1 rankf2;
> (obs=97)
> 
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7443   1.0000
> 
> 
> use isocomb_indices, clear;
> sort f1;
> list region district dmrd dmkt dpt djss dchp f1, clean header(50);
> gen rankf1=_n;
> sort f2;
> list region district pwrd pwmkt1 pwmkt2 pwjss pwchp f2, clean header(50);
> gen rankf2=_n;
> bysort region: list region district nc f1 rankf1 f2 rankf2, 
>     clean mean(f1 rankf1 f2 rankf2);
> correlate rankf1 rankf2;
> drop rankf1 rankf2;
> sort region district;
> save isocomb_indices, replace;
> clear;
> log close;
> exit;
