st: New version of -somersd- on SSC

From   Roger Newson
Subject   st: New version of -somersd- on SSC
Date   Sun, 07 Aug 2005 17:58:38 +0100

Hello All

Thanks to Kit Baum, a new version of the -somersd- package is now available for download from SSC. In Stata, use the -ssc- command to install it (for example, -ssc install somersd, replace-).

-somersd- is described below (as on my website) and discussed in Newson (2002). It calculates confidence intervals for rank statistics, allowing left-censored and/or right-censored data. The new version adds the following improvements:

1. -somersd- can now be prefixed by -bootstrap:-, -by:-, -jackknife:-, -statsby:-, and -svy jackknife:-.

2. There is now a -funtype()- (functional type) option, allowing the user to specify within-cluster, between-cluster, and overall (or Von Mises) versions of Somers' D and Kendall's tau-a. Previously, only between-cluster versions were available. The parameters behind the Wilcoxon ranksum and signrank tests are between-cluster versions of Somers' D, the parameter behind the sign test is a within-cluster Somers' D, and the Gini coefficient is a Von Mises Somers' D.

3. Cluster frequency weights can be specified using a -cfweight(expression)- option. These cluster frequency weights must be the same for all observations in a cluster, and imply that each cluster represents a number of duplicate clusters equal to its frequency weight. (The manual -somersd.pdf- has an example which demonstrates the usefulness of cluster frequency weights when estimating confidence intervals for Gini coefficients from a dataset with one observation per income group.)

4. There are now options -wstrata(varlist)- and -bstrata(varlist)-, allowing the user to specify stratified versions of Somers' D and Kendall's tau-a. These stratified versions measure correlation within strata specified by the -wstrata()- variables and/or between strata specified by the -bstrata()- variables. For instance, we can measure correlation between an exposure variable and an outcome variable within strata defined by a categorical confounder variable by typing

somersd outcome exposure, wstrata(confoundergroup)

If there are multiple confounders, then they can be used to define a quantitative propensity score for the exposure based on the confounders, and the propensity score can be grouped. If the propensity score is named -propscore-, then we might type

xtile propgroup=propscore, nquant(8)
somersd outcome exposure propscore, wstrata(propgroup)
lincom (exposure-propscore)/2

and show not only that the exposure predicts the outcome within propensity groups, but also that, within those groups, the exposure predicts the outcome better than the propensity score does.

5. The online help, and the .pdf manual -somersd.pdf-, have been updated to describe and demonstrate these new features.
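
To illustrate what the within-strata version in item 4 measures, one reading is that only pairs of observations in the same stratum are compared. The Python sketch below follows that reading. It is not the -somersd- implementation, it ignores weights, clusters and confidence intervals, the function name is hypothetical, and the data are made up for the example.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def within_strata_somers_d(y, x, stratum):
    """Somers' D of y with respect to x, using only same-stratum pairs.

    Assumption: "within strata" means that a pair (i, j) is compared
    only if both observations belong to the same stratum.
    """
    concordance = 0  # sum of sign(x_i - x_j) * sign(y_i - y_j)
    x_untied = 0     # number of ordered pairs with x_i != x_j
    n = len(x)
    for i in range(n):
        for j in range(n):
            if i == j or stratum[i] != stratum[j]:
                continue
            sx = sign(x[i] - x[j])
            concordance += sx * sign(y[i] - y[j])
            x_untied += sx * sx
    if x_untied == 0:
        raise ValueError("no same-stratum pairs with different x values")
    return concordance / x_untied


# Made-up data: the confounder group shifts the outcome upwards.
confoundergroup = [0, 0, 0, 1, 1, 1]
exposure = [0, 0, 1, 0, 1, 1]
outcome = [1, 2, 1, 5, 4, 5]

# Putting everybody in one stratum gives the unstratified Somers' D.
unstratified = within_strata_somers_d(outcome, exposure, [0] * 6)
stratified = within_strata_somers_d(outcome, exposure, confoundergroup)
print(unstratified, stratified)
```

In these made-up data the unstratified value is positive and the within-strata value is negative, which is the kind of confounding that the -wstrata()- option is meant to remove.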

In the present version of -somersd-, the Mata code is less efficient than it probably could be, given the power of Mata. I plan to improve this in the next version.

Best wishes



Newson R. 2002. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal 2(1): 45-64. A pre-publication draft is downloadable from my website at

package somersd from

somersd: Kendall's tau-a, Somers' D and median differences

The somersd package contains the programs somersd and cendif, which
calculate confidence intervals for the parameters behind rank or
"nonparametric" statistics. The program somersd calculates confidence
intervals for Kendall's tau-a or Somers' D, and stores the estimates and
their covariance matrix as estimation results. The program cendif calculates
confidence intervals for Hodges-Lehmann median differences (or other
percentile differences) between two groups. Kendall's tau-a is a difference
between probabilities of concordance and discordance, and measures rank
order correlation. Somers' D is a parameter equal to zero under the null
hypothesis tested by the Wilcoxon or Mann-Whitney ranksum test, and can be
used to calculate confidence intervals for Harrell's c index, for areas under
receiver operating characteristic (ROC) curves, and for differences between
Harrell's c indices or ROC areas. The Hodges-Lehmann median difference can be
defined in terms of Somers' D, and is also zero under the null hypothesis
tested by the ranksum test. Full documentation of the two programs (including
methods and formulas) can be found in the ancillary files somersd.pdf and
cendif.pdf, which can be viewed using the Adobe Acrobat Reader.
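
As an illustration of the definitions in this description, the Python sketch below counts concordant and discordant pairs directly. It is a brute-force reading of the definitions in Newson (2002), not the algorithm used by somersd or cendif, and it ignores weights, clusters, censoring and confidence intervals. The function names are hypothetical and the data are made up for the example.

```python
from itertools import product
from statistics import median


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Proportion of concordant pairs minus proportion of discordant pairs.

    Computed as the mean of sign(x_i - x_j) * sign(y_i - y_j) over all
    ordered pairs with i != j; tied pairs contribute zero.
    """
    n = len(x)
    total = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def somers_d(y, x):
    """Somers' D of y with respect to x: tau_a(x, y) / tau_a(x, x)."""
    return kendall_tau_a(x, y) / kendall_tau_a(x, x)


def harrell_c(y, x):
    """Harrell's c index (ROC area when x is binary): (D + 1) / 2."""
    return (somers_d(y, x) + 1) / 2


def hodges_lehmann_median_difference(group_a, group_b):
    """Median of all pairwise differences a - b between two groups."""
    return median(a - b for a, b in product(group_a, group_b))


# Made-up data: x is a two-group indicator, y is the outcome.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 2, 4, 5]

print(kendall_tau_a(x, y))
print(somers_d(y, x))
print(harrell_c(y, x))
print(hodges_lehmann_median_difference([2, 4, 5], [1, 2, 3]))
```

With a binary x, Somers' D here is the probability that a randomly chosen member of the second group has the larger y minus the probability that it has the smaller y, which is why it is zero under the null hypothesis of the ranksum test.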

Author: Roger Newson
Distribution-date: 05 August 2005
Stata-version: 9

Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
Division of Asthma, Allergy and Lung Biology
King's College London

5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605

Opinions expressed are those of the author, not the institution.
