
# st: RE: Changing -correlate- results

From: "Nick Cox"
Subject: st: RE: Changing -correlate- results
Date: Mon, 21 Mar 2005 12:13:51 -0000

```
My guess is that the underlying problem
is what is done with tied data.

Suppose you -sort- according to some -x-.
If the 12th, 13th, and 14th values of -x- are
all 42, you are assigning them ranks of 12, 13, and 14.
This will happen after every -sort-, but
_which_ observation is ranked 12th is not
guaranteed to be the same: -sort- shakes up
the data first, so the order of observations
within a tie is arbitrary. From the point
of view of -sort-ing -x- alone, this is
immaterial, but it does make a difference
when -y- is also considered. Only if
the observations tie on both -x- and -y-
will it be irrelevant.

Otherwise put,

* The correct procedure given the desire
for a rank is to use -egen, rank()-
and so guarantee that tied values get
identical ranks.

* Your "data" are a bit different every
time, so it is not surprising that the
correlations are a little different.

Nick
n.j.cox@durham.ac.uk

louis boakye-yiadom

> A -correlate- command in my do file gives slightly different results
> each time the file is executed. I'm baffled by this, and I don't find
> any explanation in the -help- for -correlate-. I've provided below
> three such different results, as well as excerpts of the do file.
> Any help will be appreciated. Thank you.
>
>
> . correlate rankf1 rankf2;
> (obs=97)
>
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7524   1.0000
>
>
> . correlate rankf1 rankf2;
> (obs=97)
>
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7489   1.0000
>
>
> . correlate rankf1 rankf2;
> (obs=97)
>
>              |   rankf1   rankf2
> -------------+------------------
>       rankf1 |   1.0000
>       rankf2 |   0.7443   1.0000
>
>
> use isocomb_indices, clear;
> sort f1;
> list region district dmrd dmkt dpt djss dchp f1, clean header(50);
> gen rankf1=_n;
> sort f2;
> list region district pwrd pwmkt1 pwmkt2 pwjss pwchp f2, clean
> header(50);
> gen rankf2=_n;
> bysort region: list region district nc f1 rankf1 f2 rankf2,
> clean mean(f1
> rankf1 f2 rankf2);
> correlate rankf1 rankf2;
> drop rankf1 rankf2;
> sort region district;
> save isocomb_indices, replace;
> clear;
> log close;
> exit;

```
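
The fix suggested in the reply can be sketched as follows. This is a minimal sketch, not the original poster's code: it assumes the dataset `isocomb_indices` and the variables `f1` and `f2` from the quoted do-file, and it uses Stata's default line delimiter rather than the `;` delimiter of that file. By default, -egen, rank()- gives tied values the average of the ranks they span, so the ranks no longer depend on how -sort- happens to order ties. The -spearman- call is an added cross-check that was not part of the original do-file.

```stata
* Sketch only: file and variable names are taken from the quoted do-file.
use isocomb_indices, clear

* rank() assigns tied values the same (average) rank, so the result
* does not depend on the arbitrary order -sort- gives to ties.
egen rankf1 = rank(f1)
egen rankf2 = rank(f2)
correlate rankf1 rankf2

* Cross-check (an addition, not in the original): Spearman's rank
* correlation should agree with the line above when neither f1 nor f2
* has missing values.
spearman f1 f2
```

With ranks built this way, repeated runs of the do-file should report the same correlation, since nothing in the calculation depends on the sort order of tied observations.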
