Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: AW: AW: st: Missing confidence intervals for median after using -bootstrap- or -bpmedian-
From: "Roger B. Newson" <[email protected]>
To: [email protected]
Subject: Re: AW: AW: st: Missing confidence intervals for median after using -bootstrap- or -bpmedian-
Date: Fri, 16 Nov 2012 12:05:12 +0000
One immediate correction. The -censlope- and -cendif- modules do not
estimate differences between medians; they estimate median pairwise
differences. These are not always the same thing, because a median is
not like a mean. (A lot of people are surprised to find that out. Newson
(2008) gives a counterexample from comparing 2 exponential
distributions, and Newson (2009a) discusses the asymptotic distribution
theory.)
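A small numerical illustration of this distinction may help. It is only a sketch with made-up numbers, not the exponential counterexample of Newson (2008), and it is written in Python rather than Stata purely to keep it self-contained. The function names are hypothetical.

```python
from itertools import product
from statistics import median


def difference_between_medians(x, y):
    """Median of y minus median of x."""
    return median(y) - median(x)


def median_pairwise_difference(x, y):
    """Median of all differences y_j - x_i over every (x_i, y_j) pair."""
    return median(yj - xi for xi, yj in product(x, y))


# Made-up samples, chosen so that the two quantities disagree.
x = [0, 1, 2]
y = [1, 1, 10]

print(difference_between_medians(x, y))   # 0
print(median_pairwise_difference(x, y))   # 1
```

Both samples have median 1, so the difference between medians is 0, yet the nine pairwise differences (-1, -1, 0, 0, 1, 1, 8, 9, 10) have median 1.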
In reply to your query about controlling for confounders, grouping by 
propensity score is a way to do this. An example of this method is 
presented in Newson (2006). The model used to generate the propensity 
score may either be a simple logistic model (with clustered Huber 
variances) or a random-effects logistic model. The random-effects 
logistic model might be marginally better, if its underlying assumptions 
are true. However, I would argue that a propensity score from an 
imperfect model may still be a good propensity score, in the sense of 
doing the job of modelling out the confounders. For some thoughts about 
choosing the right numbers of groups, see Newson (2009b).
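The idea of grouping by propensity score can be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the -censlope- implementation: the propensity scores are assumed to have been predicted already by some logistic model, the groups are simple rank-based groups of near-equal size, and the function names are hypothetical. Only pairs of observations in the same propensity group are compared.

```python
from statistics import median


def quantile_groups(scores, n_groups):
    """Assign each score to one of n_groups rank-based groups of near-equal size.

    Tied scores are split arbitrarily by sort order (an assumption of this sketch).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups


def within_group_median_difference(outcome, exposed, groups):
    """Median of outcome differences (exposed minus unexposed),
    using only pairs that share a propensity group."""
    n = len(outcome)
    diffs = [
        outcome[i] - outcome[j]
        for i in range(n)
        for j in range(n)
        if exposed[i] and not exposed[j] and groups[i] == groups[j]
    ]
    return median(diffs)


# Made-up data: propensity scores, exposure indicator, outcome.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
exposed = [0, 1, 0, 1, 0, 1, 0, 1]
outcome = [1, 2, 3, 5, 10, 12, 11, 14]

groups = quantile_groups(scores, n_groups=2)
print(groups)
print(within_group_median_difference(outcome, exposed, groups))
```

The sketch gives a point estimate only. Confidence intervals, and clustering by subject, would still have to come from the estimation command itself.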
I hope this helps. Let me know if you have any further queries.
Best wishes
Roger
References
Newson R. 2006. On the central role of Somers' D. Presented at the 12th 
UK Stata User Meeting, 11-12 September, 2006. Download from
http://ideas.repec.org/s/boc/usug06.html
Newson RB. 2008. Hodges-Lehmann median differences between exponential 
subpopulations. Download from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm#miscellaneous_documents
Newson RB. 2009a. Asymptotic distributions of two-sample rank statistics 
for continuous outcomes. Download from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm#miscellaneous_documents
Newson RB. 2009b. Homoskedastic adjustment inflation factors in model 
selection. Presented at the 15th UK Stata User Meeting, 10-11 September, 
2009.  Download from
http://ideas.repec.org/s/boc/usug09.html
Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected]
Web page: http://www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/
Opinions expressed are those of the author, not of the institution.
On 15/11/2012 21:41, Vasyl Druchkiv wrote:
Hello Roger,
thank you for your advice! I have applied -censlope- with the -cluster- option to estimate differences between medians. It works great! As to your question about the fraction of values equal to the median: let me return to the example in which I try to find the relationship between ocular side and the difference between the central and thinnest points of the cornea. The dependent variable here is CCT-TPCT. First I would like to describe the distribution of CCT-TPCT conditional on ocular side. In the right eye the median is 8, and 800 of 8436 eyes (9.5%) have the value 8. In the left eye the median is 7, and 756 of 8436 eyes (9%) have the value 7. For both the right and the left eyes, -bpmedian- reports missing standard errors and CIs. However, I don't really know how high the fraction of values equal to the median must be to cause zero standard errors. By contrast, -parmest- "correctly" displays a standard error of 0 and 95% confidence limits of 8. The same holds for the left eye, but with the value 7. The difference is trivial. However, it is necessary to report the statistical significance.
Since eyes are not statistically independent (as you correctly pointed out), I used -censlope- with the -cluster- option to estimate the differences between the eyes. Somers' D is negative, as expected (with negative confidence limits), showing that the propensity to have a lower difference is higher for the left eye than for the right eye (p<0.001). The difference between right and left eyes is -1 with CI (-1; 0) for the 50th percentile; -4 (-5; -4) for the 25th percentile; and 3 (3; 3) for the 75th percentile. Can it be said that the eyes differ in median? Or is it better to say that the propensity is significant? I ask because the CI for the 50th percentile has an upper limit of 0.
At this point I want to thank you for your advice and for your Stata packages. They are really very helpful!
However, I have another question. Is it possible to control for a confounder in -censlope-? I consulted your paper* in which you reported differences adjusted for a confounder. However, I am not sure how I should proceed if I have clustered data. As I understand it, I have to estimate a propensity score, defined as the predicted odds from a logistic regression in which eye is the dependent variable. However, I don't know whether this model should be a simple logistic model or rather a logistic model with ID as a random effect.
Sorry for long explanations and for your time!
Regards,
Vasyl
*Newson, R. (2006): Confidence intervals for rank statistics: Percentile slopes, differences, and ratios. Stata Journal 6(4): 497-520. Available online at http://www.stata-journal.com/article.html?article=snp15_7.
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roger B. Newson
Sent: Wednesday, November 14, 2012 1:11 PM
To: [email protected]
Subject: Re: AW: st: Missing confidence intervals for median after using -bootstrap- or -bpmedian-
Another point has occurred to me. You seem to be comparing left and right eyes in the same subjects (correct me if I'm wrong). So, if I am right, then your methods should be clustered by subject, because left and right eyes in the same subjects are not statistically independent.
And -qreg- doesn't seem to have options for clustering yet.
If you want to estimate a median difference between left and right eyes in the same group of subjects, then it might be a good idea to use the -censlope- module of the -somersd- package, which you can also download from SSC, with a -cluster()- option.
I hope this helps. Let me know if you have any queries, especially about -censlope-.
Best wishes
Roger
On 13/11/2012 22:55, Vasyl Druchkiv wrote:
Hello Nick and Roger,
thank you for your quick reply! Sorry that I did not provide background on the data. The variable in the example is the astigmatism of the eyes, which describes corneal steepness. This variable is not symmetric; in fact it is extremely skewed to the left. To give an idea of the data, here are some descriptive statistics:
                         astigmatism
-------------------------------------------------------------
      Percentiles      Smallest
 1%         -4.5             -7
 5%           -3           -6.5
10%         -2.3           -6.5       Obs               16872
25%         -1.3           -6.5       Sum of Wgt.       16872
50%          -.8                      Mean          -1.005293
                        Largest       Std. Dev.      .9449402
75%          -.3              0
90%            0              0       Variance       .8929119
95%            0              0       Skewness      -1.812464
99%            0              0       Kurtosis       7.150496
So, you can see that the variable is not constant: there is variation, although 54% of the eyes had an astigmatism of -.8. I have already applied -parmest- as suggested by Roger (I downloaded both -bpmedian- and -parmest- from SSC) and indeed got confidence limits that are equal to the median.
However, it is not only the confidence intervals that concern me. In another case I try to run a quantile regression with bootstrap standard errors, with the difference between the thinnest and central points of the cornea as the dependent variable. The dependent variable is also not symmetric and has positive skewness:
                          cct-tpct
-------------------------------------------------------------
      Percentiles      Smallest
 1%            3              0
 5%            4              0
10%            4              1       Obs               16872
25%            5              1       Sum of Wgt.       16872
50%            8                      Mean           9.485479
                        Largest       Std. Dev.      8.423524
75%           11            122
90%           16            124       Variance       70.95575
95%           20            380       Skewness       14.34001
99%           33            380       Kurtosis       487.1662
When I use for instance ocular side (right/left) as a dummy independent variable I get:
Median regression, bootstrap(20) SEs                 Number of obs =     16872
  Raw sum of deviations    68401 (about 8)
  Min sum of deviations    68061                     Pseudo R2     =    0.0050
------------------------------------------------------------------------------
     ccttpct |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         eye |         -1          .        .       .            .           .
       _cons |          8          .        .       .            .           .
------------------------------------------------------------------------------
So, there is a difference between eyes. However there are no statistics to report. Of course I could use for example Wilcoxon signed-rank test to check the differences (and would probably find insignificant results). But my idea is to fit a multivariate model with more independent variables.
If you could help me further it would be great.
Thank you in advance and sorry, if I was unclear about some points.
Best regards,
Vasyl
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roger B. Newson
Sent: Tuesday, November 13, 2012 12:50 PM
To: [email protected]
Subject: Re: st: Missing confidence intervals for median after using -bootstrap- or -bpmedian-
The problem here seems to me to be a zero standard error for the median, caused by a zero variance for the median, caused by a constant variable.
For some reason, Stata is displaying the confidence interval as if the standard error was missing. This may possibly have something to do with version control (-bpmedian- is a Stata Version 10 command).
For what it's worth, the -parmest- package (also downloadable from
SSC) displays the confidence intervals for a Bonett-Price median of a
constant variable "correctly", with a zero standard error and upper
and lower confidence limits equal to the median. After -bpmedian-, the
user may type
parmest, list(,)
and display the "correct" confidence interval. You might also like to try using the -sccendif- module of the -scsomersd- package, which can also be downloaded from SSC, and which also calculates confidence intervals for medians, allowing the possibility of clustering and/or sampling-probability weights.
I hope this helps.
Best wishes
Roger
On 13/11/2012 00:49, Nick Cox wrote:
I am not a statistician; in fact many, perhaps most, people on this
list wouldn't call themselves statisticians.
You are asked to make clear where user-written programs you refer to
come from. -bpmedian- is from SSC or Roger Newson's website.
You don't tell us anything much about your data, either what it is
(the name "var" is not revealing) or any descriptive statistics. But
I see you have a large sample size. It seems likely therefore that
the confidence interval for anything will be narrow at worst.
However, it seems likely also from your results that you have lots of
ties. If so, the unusual result of a confidence interval of length 0
is likely to be an artefact of coarseness in data recording.  If so,
then reporting a confidence interval isn't really possible, as it
should be more like
.8 +/- smidgen where smidgen is less than the resolution of
measurement. By resolution, I mean the minimum difference between
reported measurements. If possible data are values like .7, .8, .9
the resolution is 0.1.
Conversely, if I were reviewing or examining this research, I would
want a report on the fraction of values that were recorded as .8. In
fact I would want a graph of the data. Of course, you may intend to
do all that.
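The two quantities mentioned here, the fraction of values tied at the median and the resolution of measurement, are easy to compute. The following is a minimal Python sketch with made-up values, not Stata code. The function names are hypothetical, and the rounding step is an assumption added only to suppress floating-point noise.

```python
from statistics import median


def fraction_at_median(values):
    """Share of observations recorded as exactly the sample median."""
    m = median(values)
    return sum(1 for v in values if v == m) / len(values)


def resolution(values, ndigits=6):
    """Smallest gap between distinct recorded values.

    Needs at least two distinct values; otherwise min() raises ValueError.
    """
    distinct = sorted(set(values))
    gaps = [round(b - a, ndigits) for a, b in zip(distinct, distinct[1:])]
    return min(gaps)


# Made-up measurements recorded to one decimal place.
values = [0.7, 0.8, 0.8, 0.8, 0.9, 1.1]

print(fraction_at_median(values))   # 0.5
print(resolution(values))           # 0.1
```

With half of the values tied at the median and a resolution of 0.1, an interval of length 0 around .8 says more about the coarseness of recording than about the precision of the estimate.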
Nick
On Mon, Nov 12, 2012 at 9:32 PM, Vasyl Druchkiv <[email protected]> wrote:
Dear statisticians,
I try to estimate CI's for the median with -bpmedian- or with
-bootstrap- using
*--------------------- begin example ------------------
centile var
bootstrap median=r(p50): sum var, detail
*--------------------- end example --------------------
The problem is that I get empty cells for the standard error and
confidence intervals whether I use -bpmedian- or -bootstrap-.
*--------------------- begin example ------------------
Bonett-Price confidence interval for median of: var
Number of observations: 16872
------------------------------------------------------------------------------
         var |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |        -.8          .        .       .            .           .
*--------------------- end example ------------------
I looked for the calculation method used in -bpmedian-. This method
is described in:
       Bonett, D. G. and Price, R. M. 2002. Statistical inference for a
       linear function of medians: Confidence intervals, hypothesis
       testing, and sample size requirements. Psychological Methods
       7(3): 370-383.
Furthermore, I tried to estimate CIs with SPSS using the bootstrap and
got (-0.8; -0.8) for the 95% CI. This suggests that the problem occurs
when both limits coincide with the median. However, the method described
in Bonett-Price uses the formula (p. 372):
$\sum_j c_j \hat\eta_j \pm z_{\alpha/2} \left( \sum_j c_j^2 \, \widehat{\mathrm{var}}(\hat\eta_j) \right)^{1/2}$
So, even if the last term is equal to 0 due to the pointy distribution
(var ηj = 0), the lower and upper limits must be displayed in the Stata
output and be equal to -0.8 in my example. Can I just assume that the
confidence limits are equal to the median?
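The arithmetic of that interval can be checked with a short sketch. It is written in Python only as an illustration. It assumes the medians and their variance estimates are supplied by the user, so it does not implement the Bonett-Price variance estimator itself, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def linear_median_ci(coefs, medians, variances, level=0.95):
    """Interval sum(c_j * eta_j) +/- z * sqrt(sum(c_j^2 * var_j)).

    The variance estimates are taken as given (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    estimate = sum(c * m for c, m in zip(coefs, medians))
    half_width = z * sqrt(sum(c * c * v for c, v in zip(coefs, variances)))
    return estimate - half_width, estimate + half_width


# A single median of -0.8 with an estimated variance of zero:
print(linear_median_ci([1], [-0.8], [0.0]))   # (-0.8, -0.8)
```

With a zero variance estimate the half-width is exactly zero, so both limits equal the point estimate. This is the arithmetic behind the expectation that the limits should display as -0.8 rather than as missing.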
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/