st: New versions of -qqvalue- and -smileplot- on SSC

From	"Roger B. Newson" <[email protected]>
To	"[email protected]" <[email protected]>, [email protected]
Subject	st: New versions of -qqvalue- and -smileplot- on SSC
Date	Thu, 11 Oct 2012 15:16:54 +0100

Thanks as always to Kit Baum, new versions of the packages -qqvalue- and -smileplot- are available for download from SSC. In Stata, use the -ssc- command to do this, or -adoupdate- if you already have old versions of -qqvalue- and -smileplot-.

The -qqvalue- and -smileplot- packages are described as below on my website, and implement selections of frequentist multiple-test procedures, inputting a variable containing P-values and outputting q-values and discovery sets, respectively. Most statisticians nowadays would argue that q-values are more informative than discovery sets. However, discovery sets have been implemented in -smileplot- for a few rarely-used multiple-test procedures, for which q-values are not available in -qqvalue-.

The new version of -qqvalue- fixes a problem with the Sidak and Holland-Copenhaver procedures, which caused them to output zero q-values for P-values that were so small that, when subtracted from 1 in double precision, they gave a result of 1. This has been fixed by using the Bonferroni procedure as a substitute for the Sidak procedure, and the Holm procedure as a substitute for the Holland-Copenhaver procedure, to compute q-values for such tiny P-values. This procedure works because in the limit, as the input P-value tends to zero, the output Sidak q-value converges in ratio to the output Bonferroni q-value, and the output Holland-Copenhaver q-value converges in ratio to the output Holm q-value. I would like to thank Tiago Pereira for drawing our attention to this issue of tiny P-values on Statalist. See


http://www.stata.com/statalist/archive/2012-03/msg00726.html

for more about this correspondence.
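
The tiny-P-value problem can be reproduced in a few lines. The following is a minimal Python sketch, not the -qqvalue- code itself, and it assumes the one-step Sidak q-value has the form 1-(1-p)^m for m comparisons. All function names are hypothetical. The last function shows one alternative way of avoiding the cancellation (via log1p and expm1), included only to check the convergence in ratio.

```python
import math

def sidak_naive(p, m):
    # One-step Sidak q-value computed directly; returns 0 as soon as
    # 1 - p rounds to exactly 1 in double precision.
    return 1.0 - (1.0 - p) ** m

def bonferroni(p, m):
    # One-step Bonferroni q-value, capped at 1.
    return min(1.0, m * p)

def sidak_with_fallback(p, m):
    # Assumed form of the substitution described in the post:
    # use Bonferroni when 1 - p is indistinguishable from 1.
    if 1.0 - p == 1.0:
        return bonferroni(p, m)
    return sidak_naive(p, m)

def sidak_stable(p, m):
    # Alternative (not from the post): 1 - (1 - p)^m evaluated as
    # -expm1(m * log1p(-p)), which keeps precision for tiny p.
    return -math.expm1(m * math.log1p(-p))

p, m = 1e-20, 1000
print(sidak_naive(p, m))          # collapses to zero
print(sidak_with_fallback(p, m))  # Bonferroni substitute, positive
print(sidak_stable(p, m) / bonferroni(p, m))  # ratio close to 1
```

For P-values that are not tiny, the fallback is never triggered and the Sidak value stays slightly below the Bonferroni value, as expected.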

The new version of -smileplot- "fixes" a similar problem with the Sidak and Holland-Copenhaver procedures when calculating critical P-values for generating discovery sets. In this case, the problem is that, for an input P-value p and a number m of multiple comparisons, the quantity


(1-p)^(1/m)

can sometimes be computed in double precision to give a result of 1, either because p is tiny or because m is very large. I have again substituted the Bonferroni and Holm formulas for the Sidak and Holland-Copenhaver formulas for these cases. If the problem is a tiny input P-value, then this should be a satisfactory solution, because the convergence in ratio still applies. However, if the problem is a huge m without a tiny p, then this solution will produce a conservative corrected critical P-value, as the convergence in ratio does not apply as m tends to infinity, in the way that it does as p tends to zero. On the other hand, the corrected critical P-value will be less conservative than the value of zero that -smileplot- previously produced in this case. This issue seems to me to be one more reason for preferring q-values to discovery sets.
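
The two cases (tiny p versus huge m) can be illustrated with a minimal Python sketch. It assumes the Sidak corrected critical P-value is 1-(1-p)^(1/m) and the Bonferroni one is p/m. The function names are hypothetical, and this is not the -smileplot- code. The log1p/expm1 version is again only an independent reference, not something described in the post.

```python
import math

def sidak_critical_naive(p, m):
    # Direct evaluation; returns 0 when (1 - p)^(1/m) rounds to 1.
    return 1.0 - (1.0 - p) ** (1.0 / m)

def bonferroni_critical(p, m):
    return p / m

def sidak_critical_with_fallback(p, m):
    # Assumed form of the substitution described in the post.
    root = (1.0 - p) ** (1.0 / m)
    if root == 1.0:
        return bonferroni_critical(p, m)
    return 1.0 - root

def sidak_critical_stable(p, m):
    # Reference value: 1 - (1 - p)^(1/m) as -expm1(log1p(-p) / m).
    return -math.expm1(math.log1p(-p) / m)

# Case 1: tiny p, modest m. Bonferroni and Sidak agree in ratio.
print(bonferroni_critical(1e-20, 10) / sidak_critical_stable(1e-20, 10))

# Case 2: ordinary p, huge m. The ratio stays below 1, so the
# Bonferroni substitute is conservative, but it is still above zero.
print(bonferroni_critical(0.05, 1e17) / sidak_critical_stable(0.05, 1e17))
print(sidak_critical_naive(0.05, 1e17))
```

In the second case the ratio is about 0.975 for p = 0.05, which is p/(-ln(1-p)), the limit of the ratio as m tends to infinity with p held fixed.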

I am considering submitting a brief Stata Journal article on this precision issue with the Sidak and Holland-Copenhaver procedures, which is potentially a trap for unsuspecting genome scanners.


Best wishes

Roger


-----------------------------------------------------------------------------------
package qqvalue from http://www.imperial.ac.uk/nhli/r.newson/stata10
-----------------------------------------------------------------------------------

TITLE

      qqvalue: Generate frequentist q-values by inverting multiple-test procedures


DESCRIPTION/AUTHOR(S)
      qqvalue is similar to the R package p.adjust.  It inputs a single
      variable, assumed to contain P-values calculated for multiple
      comparisons, in a dataset with 1 observation per comparison.  It
      outputs a new variable, containing the q-values corresponding to
      these P-values, calculated by inverting a multiple-test procedure
      specified by the user.  These q-values represent, for each
      corresponding P-value, the minimum uncorrected P-value threshold
      for which that P-value would be in the discovery set, assuming that
      the specified multiple-test procedure was used on the same set of
      input P-values to generate a corrected P-value threshold.  These
      minimum uncorrected P-value thresholds may represent familywise
      error rates or false discovery rates, depending on the procedure
      used.  Optionally, qqvalue may output other variables, containing
      the various intermediate results used in calculating the
      q-values.  The multiple-test procedures available for
      qqvalue are a subset of those available using the multproc module
      of the smileplot package, which can be downloaded from SSC.

      Author: Roger Newson
      Distribution-Date: 08October2012
      Stata-Version: 10

INSTALLATION FILES                                  (click here to install)
      qqvalue.ado
      qqvalue.sthlp
-----------------------------------------------------------------------------------
(click here to return to the previous screen)
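
The relation between q-values and discovery sets in the description above can be sketched as follows, using the Holm step-down procedure as an example. This is an illustrative Python sketch under the usual textbook definition of Holm-adjusted P-values. It is not the -qqvalue- implementation, and the function names are hypothetical.

```python
def holm_qvalues(pvalues):
    """Holm step-down q-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest P-value
        candidate = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, candidate)  # keep q-values monotone
        q[i] = running_max
    return q

def holm_discovery_set(pvalues, alpha):
    """Indices rejected by the Holm procedure at uncorrected level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = set()
    for rank, i in enumerate(order):
        if pvalues[i] > alpha / (m - rank):
            break  # step-down: stop at the first non-rejection
        rejected.add(i)
    return rejected

pvals = [0.001, 0.04, 0.03, 0.2]
qvals = holm_qvalues(pvals)
for alpha in (0.05, 0.1):
    from_q = {i for i, q in enumerate(qvals) if q <= alpha}
    print(alpha, sorted(from_q), sorted(holm_discovery_set(pvals, alpha)))
```

For each level alpha, the set of comparisons with q-value at most alpha matches the discovery set from running the procedure directly, which is the sense in which a q-value is the minimum uncorrected threshold at which its P-value enters the discovery set.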

-----------------------------------------------------------------------------------
package smileplot from http://www.imperial.ac.uk/nhli/r.newson/stata10
-----------------------------------------------------------------------------------

TITLE
      smileplot: Multiple test procedures and smile plots

DESCRIPTION/AUTHOR(S)

      This package contains the programs multproc, smileplot and
      smileplot7.  multproc inputs a data set with 1 observation for each
      of a set of multiple significance tests and data on the P-values,
      and carries out a multiple test procedure chosen by the user to
      define a corrected overall critical P-value for accepting or
      rejecting the null hypotheses tested.  These procedures may be
      one-step, step-up or step-down, and may control the familywise error
      rate (eg the Bonferroni, Sidak, Holm, Holland-Copenhaver, Hochberg
      and Rom procedures) or the false discovery rate (eg the Simes,
      Benjamini-Liu, Benjamini-Yekutieli and Benjamini-Krieger-Yekutieli
      procedures).  smileplot, and its Stata 7 version smileplot7, work by
      calling multproc and then creating a smile plot, with data points
      corresponding to multiple estimated parameters, the P-values (on a
      reverse log scale) on the Y-axis, and the corresponding parameter
      estimates (or another variable) on the X-axis.  There are Y-axis
      reference lines at the uncorrected and corrected overall critical
      P-values.  The reference line at the corrected critical P-value,
      known as the parapet line, is interpreted informally as a boundary
      between data mining and data dredging.  multproc, smileplot and
      smileplot7 are used on data sets with one observation per estimated
      parameter and data on estimates and their P-values, which may be
      created using parmby, parmest, statsby or postfile.


      Author: Roger Newson
      Distribution-Date: 09october2012
      Stata-Version: 10

INSTALLATION FILES                                  (click here to install)
      multproc.ado
      multproc.sthlp
      smileplot.ado
      smileplot.sthlp
      smileplot7.ado
      smileplot7.sthlp
-----------------------------------------------------------------------------------
(click here to return to the previous screen)
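
As an illustration of a step-up procedure defining a corrected critical P-value and a discovery set, here is a minimal Python sketch of the Simes procedure in its usual textbook form. It is not the multproc code. The function names, and the convention of returning 0 when nothing is rejected, are assumptions made for this example.

```python
def simes_critical_pvalue(pvalues, alpha):
    """Corrected critical P-value for the Simes step-up procedure."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * alpha / m:
            k = i  # step-up: keep the largest rank that passes
    # Assumed convention: a threshold of 0 means an empty discovery set.
    return k * alpha / m

def discovery_set(pvalues, critical):
    """Indices of P-values at or below the corrected critical P-value."""
    return [i for i, p in enumerate(pvalues) if p <= critical]

pvals = [0.001, 0.03, 0.035, 0.2]
critical = simes_critical_pvalue(pvals, alpha=0.05)
print(critical, discovery_set(pvals, critical))
```

In this example the second-smallest P-value (0.03) fails its own threshold of 0.025, but it is still in the discovery set because the third-smallest (0.035) passes its threshold of 0.0375. A step-down procedure would have stopped at the first failure.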

--
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.