Date   Fri, 4 Sep 2009 12:20:27 +0100

I presume that Martin is referring to the rank biserial correlation coefficient of Cureton (1956). This has an alternative name, namely Somers' D of the ordinal variable with respect to the dichotomous variable, or D(Y|X), where Y is the ordinal variable and X is the dichotomous variable. The identity of the two parameters (and of their corresponding sample statistics) is proved rigorously in Newson (2008).
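
For reference, here is a sketch of the definition behind this statement, written in LaTeX notation. The symbols are assumptions introduced for illustration: \tau_a denotes Kendall's tau-a, and Y_1 and Y_0 are values of Y sampled independently from the subpopulations with X = 1 and X = 0 respectively.

```latex
D(Y \mid X) = \frac{\tau_a(X, Y)}{\tau_a(X, X)}
            = \Pr(Y_1 > Y_0) - \Pr(Y_1 < Y_0)
```

The second equality applies when X is dichotomous and coded 0/1, so the parameter is a difference between two probabilities and lies between -1 and 1.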

Confidence intervals for Somers' D in all its forms can be computed in Stata using the -somersd- package, which you can download from SSC. In Stata, type

ssc desc somersd

to describe it, and

ssc inst somersd, replace

to install it. Note that you need to have Stata Version 10 or above to use the latest version. Earlier versions are downloadable from my website by typing, in Stata,

net from

and selecting the version for your Stata.

Once you have installed -somersd-, it may be a good idea to exit Stata and start it again, because some versions of Stata have a problem with newly installed packages that contain Mata libraries (as -somersd- does). Then, to use -somersd-, load your data into memory and type

somersd x y, transf(z) tdist

where x is the dichotomous variable and y is the ordinal variable. You should then get an asymmetric confidence interval for Somers' D, aka the rank biserial correlation coefficient. The -somersd- package comes with extensive on-line help, and also a set of .pdf manuals with methods, formulas and examples.
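
As a rough sketch of how this extends to many pairs of variables, a pair of nested loops can run the same command for every combination. The variable names below (method1-method3 for the dichotomous variables, sat1-sat4 for the ordinal variables) are hypothetical placeholders, so substitute your own:

```stata
* Hypothetical names: method1-method3 are dichotomous (yes/no) variables,
* sat1-sat4 are ordinal (five-point Likert) variables.
foreach x of varlist method1-method3 {
    foreach y of varlist sat1-sat4 {
        * Label the output so that each pair can be identified
        display _newline "D(`y'|`x'):"
        * Same command as above, applied to the current pair
        somersd `x' `y', transf(z) tdist
    }
}
```

Each pass through the inner loop should print the estimate and confidence interval for one pair, using exactly the options shown above.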

I hope this helps.

Best wishes



Cureton EE. Rank-biserial correlation. Psychometrika 1956; 21: 287-290.

Newson R. Identity of Somers' D and the rank biserial correlation coefficient. 21 February, 2008. Unrefereed document downloadable from
as of today.

dear all,

I would like to calculate a rank biserial correlation coefficient between dichotomous variables (e.g. application of a specific method in the analysis; yes/no) and ordinal variables (satisfaction with results of the analysis; five-point Likert scale).

to my knowledge, rank biserial should do the job and can be fairly easily calculated in Excel.

however, since I'd like to analyse the correlation between many, many different variables I would prefer an 'automatic' solution in Stata rather than doing it manually in Excel.

any idea?

thanks - any help is highly appreciated!



