
# Re: st: why don't confidence intervals from -proportion- use the same formula as -ci-?

From: Austin Nichols
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: why don't confidence intervals from -proportion- use the same formula as -ci-?
Date: Fri, 11 Jan 2013 12:38:31 -0500

```Ronan Conroy <rconroy@rcsi.ie> :
This has been discussed many times over the years on Statalist, with
the usual advice being: don't do that.  If you want CIs on
proportions, or to test differences in proportions, you probably want
to use -svy:tab- (and if you don't have a survey start with -svyset,
http://www.stata.com/statalist/archive/2010-05/msg00569.html
for the case when even -svy- commands fail to appropriately constrain
proportions to [0,1].

On Fri, Jan 11, 2013 at 6:44 AM, Ronan Conroy <rconroy@rcsi.ie> wrote:
> I have a real problem with the confidence intervals produced by the -proportion- command.
>
> . input outcome freq
>
>        outcome       freq
>   1. 0 21
>   2. 1 2
>   3. end
>
>
> Here is the confidence interval which is most probably closest to the nominal coverage according to
> - Brown L, Cai T, DasGupta A. Interval estimation for a binomial proportion. Statistical Science. 2001;16(2):101–17.
>
> . ci outcome [fw=freq], bin wil
>
>                                                          ------ Wilson ------
>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
> -------------+---------------------------------------------------------------
>      outcome |         23    .0869565    .0587534          .02418    .2679598
>
>
>
> Now here is what -proportion- does.
>
>
> . proportion outcome [fw=freq]
>
> Proportion estimation               Number of obs    =      23
>
> --------------------------------------------------------------
>              | Proportion   Std. Err.     [95% Conf. Interval]
> -------------+------------------------------------------------
> outcome      |
>            0 |   .9130435   .0600739      .7884579    1.037629
>            1 |   .0869565   .0600739      -.037629    .2115421
> --------------------------------------------------------------
>
> .
> end of do-file
>
> According to the manual:
>
>
> "Methods and formulas
> proportion is implemented as an ado-file.
> Proportions are means of indicator variables; see [R] mean."
>
> Is anyone prepared to defend this approach as the only formula implemented by -proportion-? Or indeed to tell me that they have managed to publish a paper that included confidence intervals such as the one above?
>
>
> I myself find this bizarre. Consider the example above. The confidence interval includes a value that is impossible - zero. With two observed successes, the success rate cannot be zero. And it includes probabilities that have no definition: negative probabilities. While I am prepared to accept that physicists have now produced temperatures that are lower than absolute zero, I cannot bring myself to persuade anyone that a confidence interval for a probability can extend beyond the interval 0-1.
>
>
> I believe it would be good if Stata's -proportion- command allowed the choice of some more believable methods.

```
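
To make the arithmetic behind the two intervals in the quoted post explicit, here is a minimal sketch in Python (chosen only to spell out the formulas; it is not Stata code and not taken from the thread). It assumes, from matching the printed numbers rather than from Stata's source, that -proportion- used a standard error with an n-1 divisor and a t critical value on n-1 degrees of freedom, and that `ci ..., bin wil` used the usual Wilson score formula. The constant `T_975_DF22` is a table value entered by hand, and all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Data from the quoted post: 2 successes out of 23 observations.
successes, n = 2, 23
p = successes / n

# Wald-type interval, as -proportion- appears to have computed it here.
# Assumption: SE uses an n-1 divisor and the critical value is t with
# n-1 = 22 degrees of freedom (hand-entered table value below).
T_975_DF22 = 2.073873
se_wald = sqrt(p * (1 - p) / (n - 1))
wald = (p - T_975_DF22 * se_wald, p + T_975_DF22 * se_wald)

# Wilson score interval with the standard normal 97.5% quantile.
z = NormalDist().inv_cdf(0.975)
denom = 1 + z * z / n
center = (p + z * z / (2 * n)) / denom
half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
wilson = (center - half, center + half)

print("Wald-type: [%.6f, %.6f]" % wald)
print("Wilson:    [%.6f, %.6f]" % wilson)
```

Under these assumptions the Wald-type lower limit is negative, as in the -proportion- output above, because the interval is symmetric around the estimate. The Wilson limits stay inside [0,1] because the interval is centered on a value pulled toward 1/2.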