# RE: st: Detection of disease

From: "Seed, Paul" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Detection of disease
Date: Fri, 15 Aug 2008 14:57:55 +0100

```Carlo Georges poses an interesting problem.
To deal with a few epidemiological issues first:
"...to be 95% certain that the population is free from disease,"
he must assume
- that the disease level is either 20% [null hypothesis, H0] or 0% [alternative hypothesis, Ha] (a non-zero rate below 20% might well be missed);
- that the sample is representative of the population (a local outbreak outside the sampling area would certainly be missed);
- that the test used is 100% sensitive.

A more realistic goal might be "...to be 95% certain that the test-positive rate is less than 20% in the population represented by the sample."

He is interested in a one-sided test at the 5% significance level (95% confidence), as probabilities < 0 have no meaning; so the standard Stata command is
sampsi .2 0 , onesample onesided

This gives n=11 (not 16; Carlo's command ran the default two-sided test), which is still different from the n=14 from the freeware package "Winepiscope" that Carlo uses.  The reason is that -sampsi- uses Normal approximations for proportions, which tend to give smaller sample sizes than exact tests.  To replicate Carlo's result, another approach is needed.  This is made much easier by the fact that the disease level is 0% under Ha, so no events are expected.
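
* A rough check of both figures with -display- (a sketch, not part of the
* original calculation; it assumes the usual one-sample formulas).
* Normal approximation: with 0% under Ha the variance term under Ha is zero,
* so n = (z*sqrt(p0*(1-p0))/p0)^2, with z = invnormal(0.95) for a one-sided 5% test.
display ceil((invnormal(0.95)*sqrt(0.2*0.8)/0.2)^2)
* Exact: the smallest n for which 0.8^n <= 0.05.
display ceil(ln(0.05)/ln(0.8))
* These should come out as 11 and 14 respectively.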

We can perform both tests in Stata, using -bitesti- for the exact test and -prtesti- for the Normal approximation (or Chi-sq test).

* the last argument of each command is the proportion under H0 (20%)
foreach n of numlist 10/15 {
    bitesti `n' 0 .2
    prtesti `n' 0 .2
}
Concentrating on the one-sided p-values (Ha: p < 0.2), it is clear that 14 subjects is the smallest number to give a significant result by the exact test, and 11 by the Normal approximation.  The first figure confirms the Winepiscope result.
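
* A sketch of the same one-sided p-values computed directly (not part of the
* original post; it assumes the -binomial()- and -normal()- functions of
* recent Stata versions).
* Exact: P(0 positives out of n | p = 0.2) = 0.8^n
* Normal approximation: z = (0 - 0.2)/sqrt(0.2*0.8/n)
foreach n of numlist 10/15 {
    display `n' "  exact p = " %6.4f binomial(`n', 0, 0.2) "  Normal p = " %6.4f normal(-0.2/sqrt(0.2*0.8/`n'))
}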

An added level of sophistication is to look at the confidence intervals.  Stata offers several:
Wald (a version of the Normal approximation), "exact" (Clopper-Pearson), Wilson, Agresti-Coull and Jeffreys.  A two-sided 90% CI is needed to give a one-sided 95% bound.  Both the Wald and Jeffreys intervals perform poorly in this case, but Wilson, "exact" and Agresti-Coull are worth considering.  In particular, the Wilson interval seems to fit with the results of -prtesti-, which may be of interest, as there are arguments that the "exact" test is in fact over-conservative (hence the quotation marks).  (I could dig out the references if anyone's interested.)

cii 14 0 , exact level(90)

-- Binomial Exact --
Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
|         14           0           0               0    .1926362*

(*) one-sided, 95% confidence interval

cii 14 0 , wald level(90)

-- Binomial Wald ---
Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
|         14           0           0               0           0

cii 14 0 , wilson level(90)

------ Wilson ------
Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
|         14           0           0               0    .1619548

cii 14 0 , agresti level(90)

-- Agresti-Coull ---
Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
|         14           0           0               0    .1907622

The Agresti-Coull interval was clipped at the lower endpoint.

cii 14 0 , jeffreys level(90)

----- Jeffreys -----
Variable |        Obs        Mean    Std. Err.       [90% Conf. Interval]
-------------+---------------------------------------------------------------
|         14           0           0               0    .1260576
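
* A sketch checking two of the upper limits above by hand (not part of the
* original post; the closed forms assume 0 positives out of n = 14).
* "Exact" (Clopper-Pearson): solve (1-p)^14 = 0.05 for p.
display 1 - 0.05^(1/14)
* Wilson: with 0 positives the upper limit reduces to z^2/(n + z^2).
display invnormal(0.95)^2/(14 + invnormal(0.95)^2)
* These should land close to the .1926362 and .1619548 reported by -cii-.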

Date: Thu, 14 Aug 2008 11:53:33 +0200
From: "Carlo Georges" <[email protected]>
Subject: st: Detection of disease

I tried to reproduce in Stata the calculation needed for the following case:

I need to determine the sample size required to detect the presence of
disease in a population.

The formula is rather complex, so it is difficult to paste in here.

For example, I need to detect with 95% confidence the absence of disease in
a population where the presumed prevalence would be 20%. How large a sample
size do I need to be 95% certain that the population is free from disease?

I used a freeware program, "Winepiscope", that calculated a sample size of 14.

In Stata I tried: sampsi 0.2 0, power(0.9) onesample

and I get a result of: 16

Can Stata handle this type of calculation?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```