# st: Rankit, pearson and polychoric correlations [was: Ordinal to interval assuming normality]

 From Michael Ingre <[email protected]> To <[email protected]> Subject st: Rankit, pearson and polychoric correlations [was: Ordinal to interval assuming normality] Date Mon, 20 Oct 2003 17:47:17 +0200

```
Thank you, Nick Cox. Your neat suggestion -almost- (I think) did the trick.

> . sysuse auto
> . ssc inst egenmore
> . egen ridit = ridit(rep78)
> . gen pseudogauss = invnorm(ridit)
> . tabdisp rep78, c(ridit pseudogauss)

But wouldn't estimation using ridit scores (approximately mean rank /
sample size) bias z slightly towards zero when you have many observations
per category? That is, within a category on the left-hand side of the
distribution, z-scores grow in magnitude faster to the left of the
category's mean rank than they shrink to the right of it, so the z of the
mean rank is closer to zero than the mean of the individual z-scores.

> Is this not (related to) the
> rankit transformation of Fisher and Yates?

Yes, I think you are right. I tried to find out more about it; however,
this procedure is not mentioned in any textbook that I could find. Searching
the net gave me a few hits. It is close to your suggestion:

rankit = invnorm( (mean rank - 0.375) / (sample size + 0.25) )

Polychoric correlations (PC) are related to this issue as well, as Bill
Magee pointed out (off-list). Learning about PC was what led me to this
question in the first place.

PCs can be estimated with LISREL (PRELIS) and SAS. In PC, the calculation
is based on thresholds between categories, assuming a -bivariate- normal
distribution. If I understand PC correctly, however, the thresholds are
recalculated for every bivariate PC. They are likely to vary slightly
between PCs, which would be problematic in multivariate procedures (even in
LISREL) and when comparing estimates between multiple bivariate analyses.

The calculation of thresholds should, however, yield unbiased z-scores:

threshold = invnorm( (max rank in category [or cumulative freq]) / sample size )

The above formula correctly reproduces the -univariate- thresholds
calculated by PRELIS. However, it is not usable as is, because there is no
valid threshold for the highest category: invnorm(1) is not finite.

My questions
--------------
I'm seeking an alternative to PCs that would enable me to use standard
statistical procedures (pearson, anova, regress etc.) and get correct
estimates of the latent variable assuming a normal distribution.

Nick pointed out that the assumption of normality is often violated. This is
true in my case also. However, in a large population sample (like all twins
in Sweden) with low drop-out, normality would seem to me an appropriate
approximation, although one would expect a slightly higher drop-out in some
categories (for example, poor subjective health).

1) The invnorm(ridit) and the similar rankit procedure seem to be a
possible solution. But -rankit- is not very well known or used. Why? Am I
missing something important?

2) Just to confirm, am I right in assuming a slight bias in z in the rankit
procedure?

3) The slight bias in z could be acceptable, but is there a way of
correcting for it?

4) What do the constants do in the rankit procedure?

5) PCs can be used in LISREL; however, the manual states that this violates
the assumptions behind ML and that estimation has to be done with WLS. (I
think this might be due to the inconsistencies in the covariance matrix.)
But what about using rankit (or similar) transformations, treating them as
continuous, and using ML?

Thank you all for your patience. Feel free to comment on or answer one or
more of the questions.

Michael Ingre

-----------------
PhD-student
Institution for Psychology
Stockholm University
National Institute for
Psychosocial Medicine

```
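
The three transformations discussed in the message (invnorm of the ridit, the rankit formula, and the cumulative-frequency thresholds) can be compared side by side. What follows is a minimal sketch in Python rather than Stata, and it is not the egenmore or PRELIS implementation. It assumes the usual definition of a ridit (proportion of observations below the category plus half the proportion in it); the function name `ordinal_scores` and the category counts are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def ordinal_scores(counts):
    """Per-category normal scores for an ordinal variable.

    counts: frequencies of the ordered categories, lowest first
            (hypothetical input; every count must be positive).
    Returns one dict per category with:
      ridit       (freq below + half of own freq) / n
      pseudogauss inverse normal of the ridit
      rankit      inverse normal of (mean rank - 0.375) / (n + 0.25)
      threshold   inverse normal of cumulative freq / n,
                  or None for the highest category (it would be infinite)
    """
    if any(f <= 0 for f in counts):
        raise ValueError("every category needs at least one observation")

    n = sum(counts)
    inv = NormalDist().inv_cdf  # standard normal quantile function
    rows = []
    below = 0  # number of observations in lower categories
    for f in counts:
        ridit = (below + f / 2) / n
        mean_rank = below + (f + 1) / 2  # ties share the average rank
        cum = below + f
        rows.append({
            "ridit": ridit,
            "pseudogauss": inv(ridit),
            "rankit": inv((mean_rank - 0.375) / (n + 0.25)),
            "threshold": inv(cum / n) if cum < n else None,
        })
        below = cum
    return rows


if __name__ == "__main__":
    # Illustrative counts for five ordered categories (an assumption).
    for category, row in enumerate(ordinal_scores([2, 8, 30, 18, 11]), start=1):
        print(category, row)
```

The sketch returns `None` for the top category's threshold instead of evaluating the inverse normal at 1, which is the problem the message raises about the threshold formula.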
