st: New version of -somersd- on SSC

From   "Roger B. Newson" <>
To   "" <>
Subject   st: New version of -somersd- on SSC
Date   Wed, 29 Feb 2012 12:02:01 +0000

Thanks once again to Kit Baum, a new version of the -somersd- package is now available for download from SSC. In Stata, use the -ssc- command to do this, or -adoupdate- if you already have an old version of -somersd-.
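As a minimal sketch of the two routes just mentioned (fresh install versus update), assuming an internet connection and a writable ado-path, the commands would look something like this:

```stata
* Fresh install of the package from SSC
ssc install somersd

* Or, if an older version is already installed, update it in place
adoupdate somersd, update

* Show which copy of -somersd- is now found on the ado-path
which somersd
```

The -sccendif- module discussed below belongs to the separate -scsomersd- package, which would need to be installed from SSC in the same way.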

The -somersd- package is described as below on my website. The new version has an improved version of the -cendif- module, which estimates median and other percentile differences between two groups of observations. -cendif- now estimates the 0th and 100th percentile differences, as well as the percentile differences in between. I have also added code to make -cendif- fail if any of the individual pairwise differences are missing, as happens in the case of value overflow. And I have tidied up the code to make -cendif- more efficient. In particular, if the user uses the module -sccendif- of the -scsomersd- package (a front end for -cendif-) to estimate percentiles (as distinct from percentile differences), then the computational time will now be linear in the number of observations, not quadratic in the number of observations as before.

So, if the user types

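* Load the example dataset shipped with Stata
sysuse auto, clear
* Use the first word of -make- as the firm (cluster) identifier
generate firm = word(make, 1)
* Percentiles of price from the 0th to the 100th in steps of 5, clustered by firm
sccendif price 0, ce(0(5)100) tdist cluster(firm)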

then -sccendif- will use -cendif- to produce a list of percentiles of -price- from the 0th by 5 percent to the 100th, with confidence limits estimated assuming that we are sampling car firms from a population of car firms, instead of sampling car models from a population of car models. And the time taken to produce these confidence intervals will be linear in the number of observations, not quadratic in the number of observations as before, which can be important if the user wants to calculate confidence intervals for percentiles (weighted and/or clustered) in large datasets. This should make -sccendif- a lot faster at calculating confidence intervals for percentiles than -sccenslope-, even in large datasets.
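One rough way to see how the run time scales is to time the same command on an enlarged copy of the data. The following is only a sketch: it assumes that -somersd- and -scsomersd- are both installed, and the expansion factor of 100 is an arbitrary illustrative choice, not something taken from the package documentation.

```stata
* Assumes -somersd- and -scsomersd- have been installed from SSC
sysuse auto, clear
generate firm = word(make, 1)

* Replicate each observation to get a larger dataset (factor is hypothetical)
expand 100

* Time the percentile calculation using Stata's built-in timers
timer clear 1
timer on 1
sccendif price 0, ce(0(5)100) tdist cluster(firm)
timer off 1
timer list 1
```

Repeating this with different expansion factors and comparing the reported times should give an informal check of whether the time grows roughly in proportion to the number of observations.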

Ideally, I would have re-written -cendif- mostly in Mata, which would have made it even more efficient. However, I do not at present have the time, so I hope to revisit this issue when I do.

I would like to thank Bill Gould for the very helpful clarification that he gave yesterday about Stata missing values, the Stata real line, and the behavior of Stata when computations produce an overflowing value, in response to my query about these issues arising from work on -cendif-.
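The overflow issue can be explored directly at the command line. This is a minimal sketch and is not taken from the -cendif- code; it assumes, in line with the description above, that a difference too large to be stored as a double is returned as a missing value.

```stata
* Largest value that can be stored as a double
display c(maxdouble)

* A pairwise difference between two extreme values, which exceeds that range
display c(maxdouble) - (-c(maxdouble))

* Test whether the result is treated as missing (1 = missing, 0 = not missing)
display missing(c(maxdouble) - (-c(maxdouble)))
```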

Best wishes


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Web page:
Departmental Web page:

Opinions expressed are those of the author, not of the institution.

package somersd from

      somersd: Kendall's tau-a, Somers' D and percentile slopes

      The somersd package contains the programs somersd, censlope and cendif,
      which calculate confidence intervals for a range of parameters behind
      rank or "nonparametric" statistics. somersd calculates confidence
      intervals for generalized Kendall's tau-a or Somers' D parameters,
      and stores the estimates and their covariance matrix as estimation
      results. It can be used on left-censored, right-censored, clustered
      and/or stratified data. censlope is an extended version of somersd,
      which also calculates confidence limits for the generalized Theil-Sen
      median slopes (or other percentile slopes) corresponding to the
      version of Somers' D or Kendall's tau-a estimated. cendif is an
      easy-to-use program to calculate confidence intervals for
      Hodges-Lehmann median differences (or other percentile differences)
      between two groups. The somersd package can be used to calculate
      confidence intervals for a wide range of rank-based parameters, which
      are special cases of Kendall's tau-a, Somers' D or percentile slopes.
      These parameters include differences between proportions, Harrell's c
      index, areas under receiver operating characteristic (ROC) curves,
      differences between Harrell's c indices or ROC areas, Gini
      coefficients, population attributable risks, median differences,
      ratios, slopes and per-unit ratios, and the parameters behind the
      sign test and the Wilcoxon-Mann-Whitney or Breslow-Gehan ranksum
      tests. Full documentation of the programs (including methods and
      formulas) can be found in the manual files somersd.pdf, censlope.pdf
      and cendif.pdf, which can be viewed using the Adobe Acrobat Reader.

      Author: Roger Newson
      Distribution-date: 28february2012
      Stata-version: 10
