Statalist



st: bug in -rndbinx-?


From   Jeph Herrin <junk@spandrel.net>
To   statalist@hsphsun2.harvard.edu
Subject   st: bug in -rndbinx-?
Date   Tue, 22 Apr 2008 11:11:05 -0400

All,

I'm using -rndbinx- to generate synthetic datasets. However,
there seems to be a discontinuity at certain denominator values.

I have localized the problem to the example below; I hope that someone
here can either spot my mistake or confirm that there is a problem
with -rndbinx-:


==================================================================
. u temp, clear

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         out |     14990    .0365198    .0644039          0          1

. centile out, c(20)

                                                    -- Binom. Interp. --
    Variable |     Obs  Percentile      Centile    [95% Conf. Interval]
-------------+-------------------------------------------------------------
         out |   14990         20         .0105      .01       .0112

. local cut20=round(r(c_1),0.0001)

. bsample 10000

. gen denom=190

. rndbinx out denom
( Generating ................... )
Variable bnlx created.

. sum bnlx

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        bnlx |     10000       7.044    12.81637          0        190

. gen rate1=bnlx/denom

. drop bnlx

. replace denom=191
(10000 real changes made)

. rndbinx out denom
( Generating ................. )
Variable bnlx created.

. sum bnlx

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        bnlx |     10000      7.0998    12.91846          0        191

. gen rate2=bnlx/denom

. drop bnlx

. sum

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         out |     10000    .0371277     .066508          0          1
       denom |     10000         191           0        191        191
       rate1 |     10000    .0370737    .0674546          0          1
       rate2 |     10000    .0371717    .0676359          0          1

. count if rate1<`cut20'
 2191

. count if rate2<`cut20'
 2945

========================================================================

The problem is apparent in the last two commands: the tail of the
distribution balloons when the denominator is increased from 190 to 191.

There is nothing special about 190->191; the actual discontinuity seems
to vary with the underlying probability -out-. However, when I loop the
above over values of the denominator from, say, 1 to 500, the tail drops
off smoothly until the denominator hits a certain value (here, 191),
where it jumps up, then drops off smoothly again for another 100 or so
values before jumping up again.
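For concreteness, the loop I describe can be sketched roughly as follows (a minimal sketch using -postfile-; the file and variable names are illustrative, and `cut20' is the cutoff saved earlier):

```stata
* Sketch of the loop over denominator values (names illustrative):
tempname memhold
tempfile results
postfile `memhold' denom tailcount using `results'
forvalues d = 1/500 {
    preserve
    gen denom = `d'
    rndbinx out denom          // draws bnlx ~ Binomial(denom, out)
    gen rate = bnlx/denom
    count if rate < `cut20'    // size of the lower tail
    post `memhold' (`d') (r(N))
    restore
}
postclose `memhold'
use `results', clear
line tailcount denom           // tail count should shrink smoothly in denom
```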

I'm trying to simulate the sensitivity and specificity of a
classification scheme as a function of the denominator, but the result
is a very unnatural-looking relationship.
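One quick cross-check, assuming your Stata has the official rbinomial(n,p) random-number function (added in an update to Stata 10), is to repeat the 191 draw with the built-in generator and compare the tail counts:

```stata
* Cross-check with Stata's built-in binomial generator, if available:
set seed 12345
gen double bnlx2 = rbinomial(denom, out)
gen rate2b = bnlx2/denom
count if rate2b < `cut20'
```

If the built-in draw shows no jump at denom=191, that would point at -rndbinx- rather than the setup.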

Any thoughts?

thanks in advance,
Jeph


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


