Note: This FAQ is for users of Stata 6. It is not relevant for more recent versions.

Stata 6: Why do Stata’s cc and cci commands report different confidence intervals than Epi Info?

Title   Stata 6: Continuity adjustments
Author William Gould, StataCorp

The short answer is that Stata does not apply the continuity adjustment, but Epi Info does. The rest of this FAQ explains why we believe our answer is slightly preferable except when N is very small, in which case neither result should be trusted.

A particular problem

A user sent the following 2x2 table to us:

             |   Exposed   Unexposed  
    ---------+------------------------
       Cases |        11           3  
    Controls |       106         223  

The user reported that Stata and Epi Info differed in their reported 95% confidence intervals even though both packages claimed to be using the Cornfield approximation. The reported confidence intervals are

    Epi Info              [1.94, 35.63]
    Stata                 [2.26, 26.20]

The Stata result can be obtained by typing cci 11 3 106 223.

The reason for the discrepancy

We have independently verified that the Stata results and the Epi Info results are each what the respective package intended; see the Appendix below.

The difference in reported results is not due to programming errors. Rather, the difference hinges on whether one makes a continuity correction to the Cornfield iterative formula.

The Cornfield formula presented in Schlesselman (1982, 177) includes the continuity correction. Our two justifications for not including the continuity correction are

  1. The continuity correction is only justified statistically when you have an exact formula (exact at finite N) for the variance. In this case we only have asymptotic formulas for the variance.
  2. For skewed distributions such as this, the continuity correction often does more harm than good when N is above a small number.

If you really care about the confidence interval when dealing with small N, you should be using exact methods such as those available in the StatXact software package.

Comparison with logistic regression

Logistic regression provides another way one can obtain estimates of the odds ratio and the standard error. The estimated odds ratio will be the same as reported by Stata’s cci command (and by Epi Info). The standard error and derived confidence interval will be different from those reported by cci because different formulas are used.

In any case, we obtained the following results:

    Epi Info              [1.94, 35.63]
    Stata                 [2.26, 26.20]
    logistic regression   [2.11, 28.23]

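Below we obtain the logistic regression results. Note that the listing transposes the table above (dead plays the role of Exposed, and expos the role of Cases); the odds ratio and its standard error are unchanged when a 2x2 table is transposed, so the interval is unaffected.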

. list
 
          dead      expos        pop  
  1.         1          1         11  
  2.         1          0        106  
  3.         0          1          3  
  4.         0          0        223

.  logistic dead exp [fw=pop]
 
 Logit Estimates                                         Number of obs =    343
                                                         LR chi2(1)    =  12.15
                                                         Prob > chi2   = 0.0005
 Log Likelihood = -214.05327                             Pseudo R2     = 0.0276
 
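 ------------------------------------------------------------------------------
     dead | Odds Ratio   Std. Err.       z     P>|z|       [95% Conf. Interval]
 ---------+--------------------------------------------------------------------
    expos |   7.713836   5.105902      3.087   0.002       2.107888    28.22885
 ------------------------------------------------------------------------------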
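
As a rough cross-check outside Stata, the interval in this output can be approximated from the four cell counts alone. The sketch below is written in Python rather than Stata. It assumes that, with a single binary covariate, the standard error of the log odds ratio from logistic regression equals the square root of the sum of the reciprocal cell counts; the function name woolf_interval is ours and is not part of any package.

```python
import math

def woolf_interval(a, b, c, d, z=1.959964):
    """Odds ratio and a Wald interval computed on the log scale.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    Assumes all four cells are positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Assumed standard error of log(odds ratio): sqrt of summed reciprocals.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

odds_ratio, lower, upper = woolf_interval(11, 3, 106, 223)
print(round(odds_ratio, 4), round(lower, 2), round(upper, 2))
```

The limits should land close to the logistic regression interval [2.11, 28.23], though not necessarily digit for digit, since Stata obtains its standard error by maximum likelihood.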

Simulation results

As a quick way of determining the reliability of the Cornfield approximation without the continuity correction, we ran a simulation, under the null hypothesis (odds ratio==1), for a table with the same marginals as in the example above. In 1,000 replications, the results were

. summarize

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
  accept |    1000         .96   .1960572          0          1

That is, the confidence interval reported by Stata, calculated without the continuity correction, led to nonrejection of the null hypothesis in 960 of 1,000 replications: an acceptance rate of 96% against a nominal 95%. The uncorrected interval is already slightly conservative here, so widening it, as the continuity correction would, does not seem called for.

The following Stata do-file will reproduce the simulation results reported above and allow you to run your own:

  ------------------------------------------ BEGIN --- mysim.do --- CUT HERE ---
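  version 6.0
  program drop _all
      
  /* mkdta: 343 observations, the first 14 of which are flagged exposed */
  program define mkdta 
      set obs 343
      gen exposed = _n<=14
  end
      
  /* asim: one replication. Randomly assign 117 cases, run cc, and post */
  /* whether the interval reported by cc contains 1                     */
  program define asim
      gen u = uniform() 
      sort u
      gen case = _n<=117
      cc case exposed 
      post mm ($S_10<=1 & $S_11>=1)
      drop u case
  end
      
  /* sim: run asim the requested number of times, then load the results */
  program define sim
      drop _all
      mkdta
      postfile mm accept using myres, replace
      local i 1
      qui while `i' <= `1' {
          asim
          local i = `i' + 1
      }
      postclose mm
      use myres, clear
  end
      
  set seed 39483
  sim 1000
  sum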
  -------------------------------------------- END --- mysim.do --- CUT HERE ---
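
Because both margins are held fixed in the simulation, each replication is determined by a single cell, which follows a hypergeometric distribution under the null hypothesis. The acceptance rate can therefore also be computed exactly rather than simulated. The following is a minimal sketch in Python, not Stata. It assumes the uncorrected Cornfield formula given in the Appendix, solves it by bisection instead of Stata's own iteration, and uses z = 1.96; the function name and its arguments are ours.

```python
import math

def exact_accept_probability(n_exposed=14, n_cases=117, total=343, z=1.96):
    """Probability that the uncorrected Cornfield interval contains 1
    when cases are assigned at random with both margins fixed."""
    n_controls = total - n_cases
    x_min = max(0, n_exposed - n_controls)
    x_max = min(n_exposed, n_cases)
    # Value of the exposed-case cell at which the odds ratio equals 1.
    expected = n_exposed * n_cases / total
    eps = 1e-9

    def sd(x):
        # 1/sqrt(sum of reciprocal cells); taken as 0 on the boundary.
        cs = (x, n_exposed - x, n_cases - x, n_controls - n_exposed + x)
        if min(cs) <= 0:
            return 0.0
        return 1.0 / math.sqrt(sum(1.0 / v for v in cs))

    def bisect(f, lo, hi):
        # f(lo) and f(hi) are assumed to have opposite signs.
        f_lo = f(lo)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) * f_lo > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    prob = 0.0
    for a in range(x_min, x_max + 1):
        # Limits on the cell scale; a boundary cell gives a one-sided interval.
        al = x_min if a == x_min else bisect(
            lambda x: a - x - z * sd(x), x_min, a - eps)
        au = x_max if a == x_max else bisect(
            lambda x: x - a - z * sd(x), a + eps, x_max)
        # The odds ratio increases with the cell value, so the interval
        # contains 1 exactly when al <= expected <= au.
        if al <= expected <= au:
            prob += (math.comb(n_cases, a)
                     * math.comb(n_controls, n_exposed - a)
                     / math.comb(total, n_exposed))
    return prob

print(round(exact_accept_probability(), 3))
```

With the defaults, which mirror the margins used in the do-file, the result should be close to the simulated acceptance rate of .96.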

Appendix: Independent reproduction of reported results

The purpose of this appendix is to establish that Stata is using the Cornfield approximation without the continuity correction and that Epi Info is using the same formula with the continuity correction.

Let us use the following notation:

             |   Exposed   Unexposed |
    ---------+-----------------------+---
       Cases |        a            b | M1
    Controls |        c            d | M0
    ---------+-----------------------+---
             |       N1           N2 |  T

The Cornfield confidence interval is

    ol = al(M0 - N1 + al)/((N1-al)(M1-al))
    ou = au(M0 - N1 + au)/((N1-au)(M1-au))

where al and au are obtained from

    a[i+1] = a +/-
           z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )

At least, that is the formula Stata uses. Epi Info uses

    a[i+1] = a +/- .5 +/-
           z*1/sqrt( 1/a[i] + 1/(N1-a[i]) + 1/(M1-a[i]) + 1/(M0-N1+a[i]) )

That is, Epi Info includes the continuity correction whereas Stata does not.
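
To see how much the ±.5 term moves the limits, both formulas can be solved directly. The following is a minimal sketch in Python, not the code used by either package: it finds the fixed points by bisection rather than by the iteration above, uses z = 1.96, and assumes a table with no cell at its boundary. The function name cornfield_limits and its correction argument are ours.

```python
import math

def cornfield_limits(a, b, c, d, z=1.96, correction=0.0):
    """Cornfield limits for the odds ratio, found by bisection.

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    correction=0.0 is the formula without the continuity correction,
    correction=0.5 the formula with it. Assumes an interior table:
    a - correction > max(0, N1 - M0) and a + correction < min(N1, M1).
    """
    m1, m0, n1 = a + b, c + d, a + c

    def cells(x):
        # Cells implied by the fixed margins when the exposed-case cell is x.
        return x, n1 - x, m1 - x, m0 - n1 + x

    def sd(x):
        # 1/sqrt(sum of reciprocal cells); taken as 0 on the boundary.
        cs = cells(x)
        if min(cs) <= 0:
            return 0.0
        return 1.0 / math.sqrt(sum(1.0 / v for v in cs))

    def odds_ratio(x):
        x11, x21, x12, x22 = cells(x)
        return x11 * x22 / (x21 * x12)

    def bisect(f, lo, hi):
        # f(lo) and f(hi) are assumed to have opposite signs.
        f_lo = f(lo)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) * f_lo > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    x_min, x_max = max(0.0, n1 - m0), min(n1, m1)
    # Lower limit solves x = a - correction - z*sd(x).
    al = bisect(lambda x: a - correction - x - z * sd(x), x_min, a - correction)
    # Upper limit solves x = a + correction + z*sd(x).
    au = bisect(lambda x: x - a - correction - z * sd(x), a + correction, x_max)
    return odds_ratio(al), odds_ratio(au)

print(cornfield_limits(11, 3, 106, 223))                  # no correction
print(cornfield_limits(11, 3, 106, 223, correction=0.5))  # with correction
```

For the table in this FAQ, the first call should give limits near the Stata interval [2.26, 26.20] and the second near the Epi Info interval [1.94, 35.63], up to the imprecision of 1.96.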

The following program will reproduce the Stata results:

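 /* upper: iterate the Cornfield formula (no continuity correction) for the   */
 /* upper limit, starting from the value given as the first argument.         */
 /* The loop never exits on its own; press Break once the output has settled. */
 /* Note: b and c below are swapped relative to the notation table, so M1 and */
 /* N1 trade places; the formula is symmetric in M1 and N1, so the results    */
 /* are unchanged.                                                            */
 program define upper /* a0 */
     local a = 11
     local b = 106
     local c = 3 
     local d = 223

     local M1 = `a' + `b'
     local M0 = `c' + `d'

     local N1 = `a' + `c'
     local N0 = `b' + `d'
     
     local T = `M1' + `M0'
     
     local z = 1.96
     
     local ai = `1'
     while (1) { 
             di `ai' " " `ou'
             local ai = `a' + `z'*1/sqrt( /*
                     */ 1/`ai' + /*
                     */ 1/(`N1'-`ai') + /*
                     */ 1/(`M1'-`ai') + /*
                     */ 1/(`M0'-`N1'+`ai') /*
             */ )
             local ou = `ai'*(`M0'-`N1'+`ai') / /* 
                     */ ((`N1'-`ai')*(`M1'-`ai'))
     }
 end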

The result of running this program is

. upper 3
3 
13.962681 820.50662
11.37803 9.1775262
13.819558 167.61792
11.826157 11.577601
13.62256 78.771227
12.184741 14.356851
13.436792 51.933344
12.435577 17.061584
13.288298 40.55852
[output omitted]
12.932766 26.192115
12.932766 26.192115
12.932766 26.192115
--Break--
r(1);

The slight difference from the result reported by Stata is due to our use of the (imprecise) 1.96.

We then modified the program to add ½ to `ai'. This resulted in nonconvergence. However, when we first converged the formula without the continuity correction and then switched to the continuity-corrected formula, the iteration converged to 35.635, which matches the upper limit of 35.63 reported by Epi Info.
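
The uncorrected iteration shown in the upper 3 output can also be retraced outside Stata. The following is a minimal sketch in Python that mirrors the locals of the Stata program above, including its cell labels and z = 1.96; the function name and the fixed step count, which replaces the manual Break, are ours.

```python
import math

def cornfield_upper_iterates(start, a=11, b=106, c=3, d=223, z=1.96, steps=500):
    """Fixed-point iteration for the upper limit, no continuity correction.

    Returns a list of (ai, ou) pairs, one per step. Assumes every iterate
    stays strictly between 0 and min(N1, M1), as it does for start=3.
    """
    m1, m0, n1 = a + b, c + d, a + c
    ai = float(start)
    history = []
    for _ in range(steps):
        ai = a + z / math.sqrt(
            1 / ai + 1 / (n1 - ai) + 1 / (m1 - ai) + 1 / (m0 - n1 + ai)
        )
        ou = ai * (m0 - n1 + ai) / ((n1 - ai) * (m1 - ai))
        history.append((ai, ou))
    return history

history = cornfield_upper_iterates(3)
# Show the first three steps and the last one.
for ai, ou in history[:3] + history[-1:]:
    print(f"{ai:.6f} {ou:.6f}")
```

The iterates oscillate around the fixed point and shrink toward it slowly, which is why the Stata log above needs many lines before the values stop changing.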