# st: RE: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?

 From "Daniel Waxman" To Subject st: RE: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist? Date Fri, 28 Sep 2007 15:06:36 -0400

```
Nick, thank you.

Dan Waxman

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, September 28, 2007 12:17 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Does my statistic for "net proportion of subjects with

I wouldn't worry about reinventing the wheel. The wheel
was a very good idea. My own discipline, whatever it is,
is plagued with papers that propose square or other
non-circular wheels and then bask in the supposed
originality or creativity of the proposals.

I can't comment on whatever is your field. But
any measure that is a difference of probabilities
(proportions), as yours is, has clear anchor points
at +1, 0 and -1 and is attractive on that and other
grounds.

Other examples include some flavours of
rank correlations and related measures (including
the always popular Somers' d) and, on a more pedestrian
level, those discussed at

http://www.stata.com/support/faqs/data/measures.html

On a different level, it is neither necessary nor
efficient to create a new variable just to hold
a total, as in

. egen number_up_outcome=total(p_new>p_old & outcome)
. egen number_down_outcome=total(p_new<p_old & outcome)

. egen number_up_no_outcome=total(p_new>p_old & !outcome)
. egen number_down_no_outcome=total(p_new<p_old & !outcome)

Using -count- consistently will save you from a bundle
of such variables, all holding constants. For bullet-proof
code, you would need to trap missings (which count
as arbitrarily large) and ensure that the two estimation
samples here were identical. Note that missing values
of -outcome- would count as true.

logistic outcome zlog zero
gen byte used = e(sample)
predict p_old

logistic outcome zlog zero new_marker
// script will fail if two samples differ
assert used == e(sample)
predict p_new

count if used
local N = r(N)

count if p_new > p_old & used & outcome == 1
local a = r(N)
count if p_new < p_old & used & outcome == 1
local b = r(N)
...

Nick
n.j.cox@durham.ac.uk

Daniel Waxman

> I am studying the effect of adding a biomarker to an existing model and
> want to describe the effect of that model vis-à-vis the number of
> subjects with improved predictions in the "new model" vs. the "old
> model". While there is an extensive literature on this topic, most of it
> divides the outcome into risk categories (i.e. predicted risk of 0-5%,
> 5-10%, etc.), something that I am not so interested in doing.
>
> An intuitive way to look at this would be to look at the net number of
> subjects who are assigned a higher predicted probability with the new
> model among those with the outcome in question, plus the net number
> assigned a lower probability among those who did not have the outcome.
> The ratio of this number to the total # of subjects would then be the
> proportion of patients with improved predictions (and would range from
> zero to 1).  See example below.
>
> My question:  Did I just reinvent the wheel?  (e.g. is this equivalent
> to some existing statistic?)  Does anybody see any logical problem with
> looking at this as one measure of the effect of adding a predictor to an
> existing model?
>
> Thanks,
> Daniel Waxman
>
> **** example: (where zlog is continuous, zero is dichotomous, new_marker
> is the dichotomous new marker, and there is no missing data) ***
>
>
> . logistic outcome zlog zero
> . predict p_old
>
> . logistic outcome zlog zero new_marker
> . predict p_new
>
> . count if e(sample)
> . gen N=r(N)
>
> . egen number_up_outcome=total(p_new>p_old & outcome)
> . egen number_down_outcome=total(p_new<p_old & outcome)
>
> . egen number_up_no_outcome=total(p_new>p_old & !outcome)
> . egen number_down_no_outcome=total(p_new<p_old & !outcome)
>
> . gen net_proportion_improved=((number_up_outcome-number_down_outcome)+(number_down_no_outcome-number_up_no_outcome))/N

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
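Nick's -count- based outline in the thread above stops after the first two counts. As a rough sketch only, the remaining steps might look like the following. The simulated data, the seed, the coefficients, and the local macro names (`up_out`, `down_out`, `up_no`, `down_no`, `net`) are all assumptions made so that the example runs on its own; none of them come from the original posts.

```stata
* Hypothetical simulated data, used only so the sketch is self-contained
clear
set seed 2007
set obs 500
gen zlog = rnormal()
gen byte zero = runiform() < 0.3
gen byte new_marker = runiform() < 0.4
gen byte outcome = runiform() < invlogit(-1 + 0.5*zlog + 0.4*zero + 0.8*new_marker)

* Old model: record the estimation sample and predict only within it
logistic outcome zlog zero
gen byte used = e(sample)
predict p_old if used

* New model: stop with an error if the estimation sample differs
logistic outcome zlog zero new_marker
assert used == e(sample)
predict p_new if used

count if used
local N = r(N)

* Testing outcome == 1 and outcome == 0 explicitly keeps any
* missing values of outcome out of every count
count if p_new > p_old & used & outcome == 1
local up_out = r(N)
count if p_new < p_old & used & outcome == 1
local down_out = r(N)
count if p_new > p_old & used & outcome == 0
local up_no = r(N)
count if p_new < p_old & used & outcome == 0
local down_no = r(N)

* Net proportion of subjects with improved prediction
local net = ((`up_out' - `down_out') + (`down_no' - `up_no')) / `N'
display "net proportion improved = " %6.4f `net'
```

Holding the counts in local macros rather than in -egen- variables follows Nick's point that no new variable is needed just to hold a constant. Because the result is a difference of proportions, it can fall anywhere between -1 and +1.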