
From: "Daniel Waxman" <dan@amplecat.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?
Date: Fri, 28 Sep 2007 15:06:36 -0400

Nick, thank you. Your input is very helpful.

Dan Waxman

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, September 28, 2007 12:17 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?

I wouldn't worry about reinventing the wheel. The wheel was a very good idea. My own discipline, whatever it is, is plagued with papers that propose square or other non-circular wheels and then bask in the supposed originality or creativity of the proposals.

I can't comment on whatever is your field. But any measure that is a difference of probabilities (proportions), as yours is, has clear anchor points at +1, 0 and -1 and is attractive on that and other grounds. Other examples include some flavours of rank correlations and related measures (including the always popular Somers' d) and, on a more pedestrian level, those discussed at

http://www.stata.com/support/faqs/data/measures.html

On a different level, it is neither necessary nor efficient to create a new variable just to hold a total, as in

    . egen number_up_outcome=total(p_new>p_old & outcome)
    . egen number_down_outcome=total(p_new<p_old & outcome)
    . egen number_up_no_outcome=total(p_new>p_old & !outcome)
    . egen number_down_no_outcome=total(p_new<p_old & !outcome)

Using -count- consistently will save you from a bundle of such variables, all holding constants.

For bullet-proof code, you would need to trap missings (which count as arbitrarily large) and ensure that the two estimation samples here were identical. Note that missing values of -outcome- would count as true.
    logistic outcome zlog zero
    gen byte used = e(sample)
    predict p_old
    logistic outcome zlog zero new_marker
    // script will fail if two samples differ
    assert used == e(sample)
    predict p_new
    count if used
    local N = r(N)
    count if p_new > p_old & used & outcome == 1
    local a = r(N)
    count if p_new < p_old & used & outcome == 1
    local b = r(N)
    ...

Nick
n.j.cox@durham.ac.uk

Daniel Waxman

> I am studying the effect of adding a biomarker to an existing model
> and want to describe the effect of that model vis-à-vis the number of
> subjects with improved predictions in the "new model" vs. the "old
> model". While there is an extensive literature on this topic, most of
> it divides the outcome into risk categories (i.e. predicted risk of
> 0-5%, 5-10%, etc.), something that I am not so interested in doing.
>
> An intuitive way to look at this would be to look at the net number of
> subjects who are assigned a higher predicted probability with the new
> model among those with the outcome in question, plus the net number
> assigned a lower probability among those who did not have the outcome.
> The ratio of this number to the total # of subjects would then be the
> proportion of patients with improved predictions (and would range from
> zero to 1). See example below.
>
> My question: Did I just reinvent the wheel? (e.g. is this equivalent
> to some existing statistic?) Does anybody see any logical problem with
> looking at this as one measure of the effect of adding a predictor to
> an existing model?
>
> Thanks,
> Daniel Waxman
>
> **** example: (where zlog is continuous, zero is dichotomous,
> new_marker is the dichotomous new marker, and there is no missing
> data) ***
>
> . logistic outcome zlog zero
> . predict p_old
>
> . logistic outcome zlog zero new_marker
> . predict p_new
>
> . count if e(sample)
> . gen N=r(N)
>
> . egen number_up_outcome=total(p_new>p_old & outcome)
> . egen number_down_outcome=total(p_new<p_old & outcome)
>
> . egen number_up_no_outcome=total(p_new>p_old & !outcome)
> . egen number_down_no_outcome=total(p_new<p_old & !outcome)
>
> . gen net_proportion_improved= ((number_up_outcome-number_down_outcome)+(number_down_no_outcome-number_up_no_outcome))/N

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
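For readers who want to try the -count- approach end to end, here is a minimal sketch of one way the calculation might be carried through to the final proportion. It is not code from either poster: the simulated dataset, the seed, the coefficients, and the local macro names c, d and net are all assumptions made for illustration. Only the variable names outcome, zlog, zero, new_marker, used, p_old and p_new and the locals N, a and b come from the thread.

```stata
* Simulated data (an assumption; the thread does not supply a dataset)
clear
set seed 12345
set obs 500
gen zlog = rnormal()
gen byte zero = runiform() < 0.3
gen byte new_marker = runiform() < 0.4
gen byte outcome = runiform() < invlogit(-1 + 0.8*zlog + 0.5*zero + new_marker)

* Old model: remember its estimation sample and predicted probabilities
quietly logistic outcome zlog zero
gen byte used = e(sample)
predict p_old if used

* New model: stop if it was fitted to a different sample
quietly logistic outcome zlog zero new_marker
assert used == e(sample)
predict p_new if used

* Counts go into local macros, not into variables holding constants
quietly count if used
local N = r(N)
quietly count if p_new > p_old & used & outcome == 1
local a = r(N)
quietly count if p_new < p_old & used & outcome == 1
local b = r(N)
* c and d are hypothetical names for the two no-outcome counts
quietly count if p_new > p_old & used & outcome == 0
local c = r(N)
quietly count if p_new < p_old & used & outcome == 0
local d = r(N)

* Net ups among those with the outcome plus net downs among those without
local net = ((`a' - `b') + (`d' - `c')) / `N'
display "net proportion improved = " %6.4f `net'
```

Testing outcome == 1 and outcome == 0 explicitly, together with the used flag, keeps missing values out of every count. Because the result is built from differences of counts divided by N, it lies between -1 and +1, in line with the anchor points mentioned in the reply.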

**References**:

- **st: Does my statistic for "net proportion of subjects with improved prediction" already exist?** *From:* "Daniel Waxman" <dan@amplecat.com>
- **st: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
