Note: This FAQ is for users of Stata 6, an older version of Stata.
It is not relevant for more recent versions.
Stata 6: Why do Stata’s cc and cci commands report different confidence
intervals than Epi Info?
Title:  Stata 6: Continuity adjustments
Author: William Gould, StataCorp
Date:   January 1999
The short answer is that Stata does not apply the continuity correction, but
Epi Info does. The rest of this FAQ explains why we believe Stata's interval
is slightly preferable, except when N is very small, in which case neither
result should be trusted.
A particular problem
A user sent the following 2x2 table to us:
         |   Exposed   Unexposed
---------+------------------------
   Cases |        11           3
Controls |       106         223
The user reported that Stata and Epi Info differed in their reported 95%
confidence intervals even though both packages claimed to be using the
Cornfield approximation. The reported confidence intervals are
Epi Info [1.94, 35.63]
Stata [2.26, 26.20]
The Stata result can be obtained by typing cci 11 3 106 223.
The reason for the discrepancy
We have independently verified that the Stata results and the Epi Info
results are each what their authors intended; see the Appendix below.
The difference in reported results is therefore not due to programming errors.
Rather, the difference hinges on whether one makes a continuity correction
to the Cornfield iterative formula.
The Cornfield formula presented in Schlesselman (1982, 177) includes the
continuity correction. Our two justifications for not including the
continuity correction are
- The continuity correction is only justified statistically when you have an
  exact formula (exact at finite N) for the variance.
  In this case we only have asymptotic formulas for the variance.
- For skewed distributions such as this, the continuity correction often does
  more harm than good when N is above a small number.
If you really care about the confidence interval when dealing with small N,
you should be using exact methods such as those available in the
StatXact software package.
Comparison with logistic regression
Logistic regression provides another way one can obtain estimates of the
odds ratio and the standard error. The estimated odds ratio will be the
same as reported by Stata’s cci command (and by Epi Info). The
standard error and derived confidence interval will be different from those
reported by cci because different formulas are used.
In any case, we obtained the following results:
Epi Info [1.94, 35.63]
Stata [2.26, 26.20]
logistic regression [2.11, 28.23]
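Below we obtain the logistic regression results. In the listing, the margins
are transposed relative to the table above (117 observations with dead==1 and
14 with expos==1); the odds ratio and its confidence interval are unaffected
by the transposition: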
. list

          dead      expos        pop
  1.         1          1         11
  2.         1          0        106
  3.         0          1          3
  4.         0          0        223

. logistic dead exp [fw=pop]

Logit Estimates                                   Number of obs =        343
                                                  LR chi2(1)    =      12.15
                                                  Prob > chi2   =     0.0005
Log Likelihood = -214.05327                       Pseudo R2     =     0.0276

------------------------------------------------------------------------------
    dead | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   expos |   7.713836   5.105902      3.087   0.002       2.107888    28.22885
------------------------------------------------------------------------------
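For a single binary covariate, the logistic-regression interval can also be checked by hand: the estimated odds ratio is ad/bc, and the usual asymptotic standard error of its logarithm is sqrt(1/a + 1/b + 1/c + 1/d). The following is a minimal sketch in Python rather than Stata, offered only as an independent check; the function name wald_odds_ratio_ci and its arguments are illustrative and not part of any package.

```python
import math
from statistics import NormalDist


def wald_odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio for a 2x2 table with a Wald interval on the log scale.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    Returns (odds ratio, delta-method SE of the odds ratio, lower, upper).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)        # about 1.96 for 95%
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, odds_ratio * se_log, lower, upper


or_hat, se, lower, upper = wald_odds_ratio_ci(11, 3, 106, 223)
print(f"OR = {or_hat:.4f}, SE = {se:.3f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The values should agree with the logistic output above to about three significant digits; small differences in later digits are to be expected from iterative estimation.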
Simulation results
As a quick way of determining the reliability of the Cornfield approximation
without the continuity correction, we ran a simulation, under the null
hypothesis (odds ratio==1), for a table with the same marginals as in the
example above. In 1,000 replications, the results were
. summarize

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  accept |    1000         .96   .1960572          0          1
That is, the confidence interval that Stata calculated without the continuity
correction led to nonrejection of the null hypothesis in 960 of the 1,000
replications, close to the nominal 95%. Thus widening the confidence
interval, as the continuity correction would, does not seem called for.
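To judge how far 960 out of 1,000 is from the nominal 95%, it helps to attach a Monte Carlo standard error to the acceptance rate. The short Python sketch below does the arithmetic using a normal approximation to the binomial, which is an assumption made here for illustration; the variable names are ours.

```python
import math

replications = 1000
accepted = 960

p_hat = accepted / replications
# Sample standard deviation of the 0/1 accept variable, as summarize reports it
sample_sd = math.sqrt(p_hat * (1 - p_hat) * replications / (replications - 1))
# Monte Carlo standard error of the acceptance rate
mc_se = math.sqrt(p_hat * (1 - p_hat) / replications)
lower = p_hat - 1.96 * mc_se
upper = p_hat + 1.96 * mc_se

print(f"sd = {sample_sd:.7f}, se = {mc_se:.4f}, "
      f"approx. 95% range = [{lower:.3f}, {upper:.3f}]")
```

Under this approximation the range of plausible acceptance rates includes 0.95, so the simulation gives no indication that the uncorrected interval is too narrow.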
The following Stata do-file will reproduce the simulation results reported
above and allow you to run your own:
------------------------------------------ BEGIN --- mysim.do --- CUT HERE ---
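version 6.0
program drop _all

/* mkdta: 343 observations, the first 14 of them exposed */
program define mkdta
        set obs 343
        gen exposed = _n<=14
end

/* asim: one replication under the null hypothesis */
program define asim
        gen u = uniform()
        sort u
        /* after the random sort, 117 cases are assigned independently of exposure */
        gen case = _n<=117
        cc case exposed
        /* accept = 1 if the interval saved by cc contains 1 */
        post mm ($S_10<=1 & $S_11>=1)
        drop u case
end

/* sim: run asim the number of times given as the first argument */
program define sim
        drop _all
        mkdta
        postfile mm accept using myres, replace
        local i 1
        qui while `i' <= `1' {
                asim
                local i = `i' + 1
        }
        postclose mm
        use myres, clear
end

set seed 39483
sim 1000
sum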
-------------------------------------------- END --- mysim.do --- CUT HERE ---
Appendix: Independent reproduction of reported results
The purpose of this appendix is to establish that Stata is using the
Cornfield approximation without the continuity correction and that Epi Info
is using the same formula with the continuity correction.
Let us use the following notation:
         |   Exposed   Unexposed |
---------+-----------------------+----
   Cases |      a          b     | M1
Controls |      c          d     | M0
---------+-----------------------+----
         |      N1         N0    | T
The Cornfield confidence interval is
ol = al(M0 - N1 + al)/((N1-al)(M1-al))
ou = au(M0 - N1 + au)/((N1-au)(M1-au))
where al and au are obtained from
a[i+1] = a +/-
z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )
At least, that is the formula Stata uses. Epi Info uses
a[i+1] = a +/- .5 +/-
z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )
That is, Epi Info includes the continuity correction whereas Stata does not.
The following program will reproduce the Stata results:
program define upper  /* a0 */
        /* cell counts, labeled as in the notation table above; the */
        /* formula is symmetric in M1 and N1                        */
        local a = 11
        local b = 3
        local c = 106
        local d = 223
        local M1 = `a' + `b'
        local M0 = `c' + `d'
        local N1 = `a' + `c'
        local N0 = `b' + `d'
        local T = `M1' + `M0'
        local z = 1.96
        /* starting value a[0] is the first argument */
        local ai = `1'
        /* iterate until interrupted with Break */
        while (1) {
                di `ai' "  " `ou'
                local ai = `a' + `z'*1/sqrt( /*
                        */ 1/`ai' +  /*
                        */ 1/(`N1'-`ai') + /*
                        */ 1/(`M1'-`ai') + /*
                        */ 1/(`M0'-`N1'+`ai') /*
                        */ )
                local ou = `ai'*(`M0'-`N1'+`ai') / /*
                        */ ((`N1'-`ai')*(`M1'-`ai'))
        }
end
The result of running this program is
. upper 3
3
13.962681 820.50662
11.37803 9.1775262
13.819558 167.61792
11.826157 11.577601
13.62256 78.771227
12.184741 14.356851
13.436792 51.933344
12.435577 17.061584
13.288298 40.55852
[output omitted]
12.932766 26.192115
12.932766 26.192115
12.932766 26.192115
--Break--
r(1);
The slight difference from the result reported by Stata is due to our use of
the (imprecise) 1.96.
We then modified the program to add ½ to `ai'. This resulted in
nonconvergence. However, when we first iterated the uncorrected formula to
convergence and then switched to the continuity-corrected formula, the
iteration converged to 35.635, the upper limit reported by Epi Info.
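One way to sidestep the convergence problem is to solve the same equations by bisection instead of fixed-point iteration. The following Python sketch does this for both variants of the formula. It assumes that each limit is the only root of its equation within the admissible range of the exposed-case cell and that the table has no zero cells; the name cornfield_ci and its arguments are illustrative and are not part of Stata or Epi Info.

```python
import math


def cornfield_ci(a, b, c, d, z=1.96, continuity=False):
    """Cornfield limits for the odds ratio of a 2x2 table, found by bisection.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    m1, m0, n1 = a + b, c + d, a + c
    k = 0.5 if continuity else 0.0

    def sd(x):
        # Asymptotic standard deviation of the exposed-case cell at value x
        return 1.0 / math.sqrt(1 / x + 1 / (n1 - x) + 1 / (m1 - x)
                               + 1 / (m0 - n1 + x))

    def odds(x):
        # Odds ratio implied by exposed-case cell x with the margins held fixed
        return x * (m0 - n1 + x) / ((n1 - x) * (m1 - x))

    def bisect(f, lo, hi):
        # Requires f(lo) < 0 < f(hi); halves the bracket until it is negligible
        for _ in range(200):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    eps = 1e-9                          # keep x strictly inside its range
    x_min = max(0.0, n1 - m0)
    x_max = min(n1, m1)
    al = bisect(lambda x: x - (a - k - z * sd(x)), x_min + eps, a - k)
    au = bisect(lambda x: x - (a + k + z * sd(x)), a + k, x_max - eps)
    return odds(al), odds(au)


print(cornfield_ci(11, 3, 106, 223))                    # without correction
print(cornfield_ci(11, 3, 106, 223, continuity=True))   # with correction
```

With z = 1.96, the first call should give limits close to those reported by Stata and the second limits close to those reported by Epi Info.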