


st: New versions of -qqvalue- and -smileplot- on SSC

From   "Roger B. Newson" <>
To   "" <>,
Subject   st: New versions of -qqvalue- and -smileplot- on SSC
Date   Thu, 11 Oct 2012 15:16:54 +0100

Thanks as always to Kit Baum, new versions of the packages -qqvalue- and -smileplot- are available for download from SSC. In Stata, use the -ssc- command to do this, or -adoupdate- if you already have old versions of -qqvalue- and -smileplot-.

The -qqvalue- and -smileplot- packages are described below, as on my website. They implement selections of frequentist multiple-test procedures, taking as input a variable containing P-values and outputting q-values and discovery sets, respectively. Most statisticians nowadays would argue that q-values are more informative than discovery sets. However, discovery sets have been implemented in -smileplot- for a few rarely-used multiple-test procedures, for which q-values are not available in -qqvalue-.

The new version of -qqvalue- fixes a problem with the Sidak and Holland-Copenhaver procedures, which output zero q-values for P-values so small that subtracting them from 1 in double precision gave a result of exactly 1. The fix uses the Bonferroni procedure as a substitute for the Sidak procedure, and the Holm procedure as a substitute for the Holland-Copenhaver procedure, to compute q-values for such tiny P-values. This substitution works because, in the limit as the input P-value tends to zero, the output Sidak q-value converges in ratio to the output Bonferroni q-value, and the output Holland-Copenhaver q-value converges in ratio to the output Holm q-value. I would like to thank Tiago Pereira for drawing our attention to this issue of tiny P-values on Statalist. See

for more about this correspondence.
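
The precision problem and the substitution described above can be illustrated with a minimal sketch in Python. This is not the Stata code of -qqvalue-, and the function names are hypothetical. It assumes the one-step Sidak q-value is 1-(1-p)^m and the one-step Bonferroni q-value is m*p, each truncated at 1, for m comparisons.

```python
import math

def sidak_q_naive(p, m):
    """Sidak q-value computed directly as 1 - (1 - p)**m, truncated at 1."""
    return min(1.0, 1.0 - (1.0 - p) ** m)

def bonferroni_q(p, m):
    """Bonferroni q-value m * p, truncated at 1."""
    return min(1.0, m * p)

def sidak_q_guarded(p, m):
    """Use Bonferroni as a substitute when 1 - p rounds to exactly 1."""
    if 1.0 - p == 1.0:
        return bonferroni_q(p, m)
    return sidak_q_naive(p, m)

p, m = 1e-20, 1000
print(sidak_q_naive(p, m))    # 0.0, because 1 - 1e-20 is exactly 1 in double precision
print(sidak_q_guarded(p, m))  # a small positive value, close to m * p

# Convergence in ratio: for a small (but representable) p the two q-values nearly agree.
print(sidak_q_naive(1e-10, m) / bonferroni_q(1e-10, m))
```

For P-values that are not tiny, the guarded function returns the ordinary Sidak q-value unchanged.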

The new version of -smileplot- "fixes" a similar problem with the Sidak and Holland-Copenhaver procedures when calculating critical P-values for generating discovery sets. In this case, the problem is that, for an input P-value p and a number m of multiple comparisons, the quantity

      (1-p)^(1/m)

can sometimes be computed in double precision to give a result of 1, either because p is tiny or because m is very large. I have again substituted the Bonferroni and Holm formulas for the Sidak and Holland-Copenhaver formulas for these cases. If the problem is a tiny input P-value, then this should be a satisfactory solution, because the convergence in ratio still applies. However, if the problem is a huge m without a tiny p, then this solution will produce a conservative corrected critical P-value, as the convergence in ratio does not apply as m tends to infinity, in the way that it does as p tends to zero. On the other hand, the corrected critical P-value will be less conservative than the value of zero that -smileplot- previously produced in this case. This issue seems to me to be one more reason for preferring q-values to discovery sets.
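
The huge-m case can be illustrated with a minimal Python sketch. This is not the Stata code of -smileplot-, and the function names are hypothetical. It assumes the quantity at issue is (1-p)^(1/m), so that the Sidak corrected critical P-value is 1-(1-p)^(1/m) and the Bonferroni one is p/m. The log1p/expm1 reference is included only for comparison and is not a method described in this post.

```python
import math

def sidak_crit_naive(p, m):
    """Sidak critical P-value computed directly as 1 - (1 - p)**(1/m)."""
    return 1.0 - (1.0 - p) ** (1.0 / m)

def bonferroni_crit(p, m):
    """Bonferroni critical P-value p / m."""
    return p / m

def sidak_crit_guarded(p, m):
    """Use Bonferroni as a substitute when (1 - p)**(1/m) rounds to exactly 1."""
    if (1.0 - p) ** (1.0 / m) == 1.0:
        return bonferroni_crit(p, m)
    return sidak_crit_naive(p, m)

def sidak_crit_reference(p, m):
    """Numerically careful reference value, for comparison only."""
    return -math.expm1(math.log1p(-p) / m)

p, m = 0.05, 10**18           # p is not tiny, but m is huge
naive = sidak_crit_naive(p, m)
guarded = sidak_crit_guarded(p, m)
reference = sidak_crit_reference(p, m)

print(naive)                  # zero: nothing could ever be in the discovery set
print(guarded)                # the Bonferroni substitute, positive but conservative
print(reference / guarded)    # ratio stays above 1 however large m becomes
print(-math.log1p(-p) / p)    # limiting ratio -ln(1-p)/p as m tends to infinity
```

With p = 0.05 the limiting ratio is about 1.026, so in this sketch the Bonferroni substitute is conservative by a few percent, whereas the ratio tends to 1 as p tends to zero.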

I am considering submitting a brief Stata Journal article on this precision issue with the Sidak and Holland-Copenhaver procedures, which is potentially a trap for unsuspecting genome scanners.

Best wishes


package qqvalue from

qqvalue: Generate frequentist q-values by inverting multiple-test procedures

      qqvalue is similar to the R package p.adjust.  It inputs a single
      variable, assumed to contain P-values calculated for multiple
      comparisons, in a dataset with 1 observation per comparison.  It
      outputs a new variable, containing the q-values corresponding to
      these P-values, calculated by inverting a multiple-test procedure
      specified by the user.  These q-values represent, for each
      corresponding P-value, the minimum uncorrected P-value threshold
      for which that P-value would be in the discovery set, assuming that
      the specified multiple-test procedure was used on the same set of
      input P-values to generate a corrected P-value threshold.  These
      minimum uncorrected P-value thresholds may represent familywise
      error rates or false discovery rates, depending on the procedure
      used.  Optionally, qqvalue may output other variables, containing
      the various intermediate results used in calculating the
      q-values.  The multiple-test procedures available for
      qqvalue are a subset of those available using the multproc module
      of the smileplot package, which can be downloaded from SSC.

      Author: Roger Newson
      Distribution-Date: 08October2012
      Stata-Version: 10
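
As an illustration of what inverting a multiple-test procedure means, here is a minimal Python sketch of q-values for two procedures: Holm (step-down, familywise error rate) and Simes (step-up, false discovery rate, computed here as in the "BH" method of R's p.adjust). This is not the Stata code of qqvalue, and the function names are hypothetical.

```python
import math

def holm_q(pvalues):
    """Holm q-values: running maximum of (m - rank) * p over ascending P-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):      # rank 0 is the smallest P-value
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        q[i] = running_max
    return q

def simes_q(pvalues):
    """Simes q-values: running minimum of m * p / rank over descending P-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):         # k 0 is the largest P-value
        rank = m - k                      # ascending rank, from m down to 1
        running_min = min(running_min, m * pvalues[i] / rank)
        q[i] = running_min
    return q

pvals = [0.01, 0.04, 0.03, 0.005]
print(holm_q(pvals))
print(simes_q(pvals))

# A P-value is in the discovery set at uncorrected threshold alpha if its q-value <= alpha.
alpha = 0.05
print([i for i, q in enumerate(holm_q(pvals)) if q <= alpha])
```

In this example the Holm q-values are approximately 0.03, 0.06, 0.06 and 0.02, so at an uncorrected threshold of 0.05 only the first and last P-values are in the Holm discovery set.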


package smileplot from

      smileplot: Multiple test procedures and smile plots

      This package contains the programs multproc, smileplot and
      smileplot7.  multproc inputs a data set with 1 observation for each
      of a set of multiple significance tests and data on the P-values,
      and carries out a multiple test procedure chosen by the user to
      define a corrected overall critical P-value for accepting or
      rejecting the null hypotheses tested.  These procedures may be
      one-step, step-up or step-down, and may control the familywise
      error rate (e.g. the Bonferroni, Sidak, Holm, Holland-Copenhaver,
      Hochberg and Rom procedures) or the false discovery rate (e.g. the
      Simes, Benjamini-Liu, Benjamini-Yekutieli and
      Benjamini-Krieger-Yekutieli procedures).  smileplot, and its Stata 7
      version smileplot7, work by calling multproc and then creating a
      smile plot, with data points corresponding to multiple estimated
      parameters, the P-values (on a reverse log scale) on the Y-axis,
      and the corresponding parameter estimates (or another variable) on
      the X-axis.  There are Y-axis reference lines at the uncorrected and
      corrected overall critical P-values.  The reference line at the
      corrected critical P-value, known as the parapet line, is
      interpreted informally as a boundary between data mining and data
      dredging.  multproc, smileplot and smileplot7 are used on data sets
      with one observation per estimated parameter and data on estimates
      and their P-values, which may be created using parmby, parmest,
      statsby or postfile.

      Author: Roger Newson
      Distribution-Date: 09october2012
      Stata-Version: 10


Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Web page:
Departmental Web page:

Opinions expressed are those of the author, not of the institution.
