Stata The Stata listserver

Re: st: Confidence interval of difference between two proportions and -csi-

From   Garry Anderson <[email protected]>
To   [email protected]
Subject   Re: st: Confidence interval of difference between two proportions and -csi-
Date   Fri, 19 Mar 2004 19:11:05 +1100


Thank you, Roger, for your suggestion of using -exactcci-.
However, it calculates neither a risk difference nor its 95% CI.

I suppose I am suggesting to the folks at Stata that a revised formula be used in -csi- to calculate the 95% CI of the risk difference when the number of observations is small and one or both groups have a 100% risk. Currently it is possible for the upper bound to go beyond the theoretical maximum of a 100% difference.
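
To make the problem concrete, here is a minimal sketch in Stata that computes the simple normal-approximation (Wald) limits by hand, which I assume are close to what -csi- reports, next to the hybrid score limits described in the Newcombe (1998) paper cited in this message. The counts are an assumption taken from the -exactcci 5 1 0 4- call quoted further down (5 of 5 exposed and 1 of 5 unexposed subjects are cases), and all scalar names are made up for illustration.

```stata
* Counts assumed from the -exactcci 5 1 0 4- call quoted further down:
* 5 of 5 exposed and 1 of 5 unexposed subjects are cases.
scalar n1 = 5
scalar n0 = 5
scalar p1 = 5/n1
scalar p0 = 1/n0
scalar rd = p1 - p0
scalar z  = invnormal(0.975)

* Simple normal-approximation (Wald) limits for the risk difference
scalar se      = sqrt(p1*(1 - p1)/n1 + p0*(1 - p0)/n0)
scalar wald_lo = rd - z*se
scalar wald_hi = rd + z*se
display "Wald limits:     " wald_lo "  " wald_hi

* Wilson score limits for each proportion separately
scalar h1 = z*sqrt(p1*(1 - p1)/n1 + z^2/(4*n1^2))
scalar h0 = z*sqrt(p0*(1 - p0)/n0 + z^2/(4*n0^2))
scalar l1 = (p1 + z^2/(2*n1) - h1)/(1 + z^2/n1)
scalar u1 = (p1 + z^2/(2*n1) + h1)/(1 + z^2/n1)
scalar l0 = (p0 + z^2/(2*n0) - h0)/(1 + z^2/n0)
scalar u0 = (p0 + z^2/(2*n0) + h0)/(1 + z^2/n0)

* Hybrid score limits for the difference (no continuity correction)
scalar nc_lo = rd - sqrt((p1 - l1)^2 + (u0 - p0)^2)
scalar nc_hi = rd + sqrt((u1 - p1)^2 + (p0 - l0)^2)
display "Newcombe limits: " nc_lo "  " nc_hi
```

With these counts the Wald upper limit comes out at about 1.15, which is impossible for a difference of proportions. The hybrid score limits are built from Wilson limits that always lie in [0,1], so they cannot fall outside [-1,1].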

I would appreciate opinions on this by others on the list.

Newcombe RG (1998) Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 17: 873-890

Kind regards, Garry

At 05:17 PM 18/03/2004 +0000, you wrote:

At 16:37 18/03/04 +1100, Garry Anderson wrote:

I am enquiring whether a more appropriate method could please be used to calculate the 95% CI of the difference between two proportions in the -csi- command.

At the moment it is possible for the upper bound of the confidence interval of the difference between two proportions to be greater than 1.0. I realize that the approximation that is used is not appropriate for small sample sizes, however I think that reporting of results that are impossible should be avoided.

One possibility is to use the -somersd- package, downloadable (complete with a .pdf manual) from SSC, using its -transform()- option. The difference between 2 proportions is a special case of Somers' D, and the -somersd- package offers a choice of transformations appropriate for Somers' D, notably the hyperbolic arctangent (or z) transformation or the arcsine transformation. If -diseased- and -exposed- are 2 binary (0,1) variables indicating disease and exposure, respectively, then Garry might type

somersd exposed diseased, tr(z)

or, alternatively,

somersd exposed diseased, tr(asin)

and get a confidence interval for the difference between the proportion of exposed individuals with the disease and the proportion of unexposed individuals with the disease, using a normalizing and variance-stabilizing transformation.
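
Since -somersd- works on variables in memory rather than on cell counts, the 2x2 table first has to be turned into one row per individual. A minimal sketch of one way to do that follows. The counts are assumed from the -exactcci 5 1 0 4- call in this message, and -freq- is a hypothetical helper variable that is not part of either package.

```stata
* One-off install from SSC, if needed:  ssc install somersd
clear
input exposed diseased freq
1 1 5
1 0 0
0 1 1
0 0 4
end

* The empty cell contributes no observations
drop if freq == 0

* Expand to one row per individual (10 rows in total)
expand freq

* Somers' D of -diseased- with respect to -exposed-, z-transformed
somersd exposed diseased, tr(z)
```

Replacing -tr(z)- with -tr(asin)- gives the arcsine version of the same interval.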

However, it should be stressed that Garry's example has a zero cell (for exposed noncases), so one of the proportions is either zero or one. With a sample size this low, a normalizing or variance-stabilizing transformation might be inappropriate. In such circumstances, it might be better to use the -exactcci- package to define a conservative confidence interval for the odds ratio, which may have an infinite upper limit or a zero lower limit. If Garry uses -findit- to find and install the -exactcci- package and types

exactcci 5 1 0 4, exact

then the so-called "exact" confidence interval is generated. (Note, however, that this confidence interval is conservative, not exact. It is called "exact" because it uses the exact hypergeometric distribution to calculate conservative confidence limits.)

I hope this helps.


Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: [email protected]

Opinions expressed are those of the author, not the institution.
