
st: RE: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?

From	"Daniel Waxman" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?
Date	Fri, 28 Sep 2007 15:06:36 -0400

Nick, thank you.  
Your input is very helpful.

Dan Waxman


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Friday, September 28, 2007 12:17 PM
To: [email protected]
Subject: st: RE: Does my statistic for "net proportion of subjects with improved prediction" already exist?

I wouldn't worry about reinventing the wheel. The wheel
was a very good idea. My own discipline, whatever it is, 
is plagued with papers that propose square or other 
non-circular wheels and then bask in the supposed 
originality or creativity of the proposals. 

I can't comment on your field, whatever it is. But 
any measure that is a difference of probabilities
(proportions), as yours is, has clear anchor points 
at +1, 0 and -1 and is attractive on that and other
grounds. 
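Here is the statistic spelled out, in notation introduced only for illustration (it is not from the thread). Let n be the number of subjects in the common estimation sample. Let u_1 and d_1 be the numbers of subjects with the outcome whose predicted probability rises or falls under the new model (number_up_outcome and number_down_outcome in the code quoted below). Let u_0 and d_0 be the same counts among subjects without the outcome. The statistic is then one proportion minus another:

```latex
S \;=\; \frac{(u_1 - d_1) + (d_0 - u_0)}{n}
  \;=\; \frac{u_1 + d_0}{n} \;-\; \frac{d_1 + u_0}{n},
\qquad
\frac{u_1 + d_0}{n} + \frac{d_1 + u_0}{n} \;\le\; 1
\;\Longrightarrow\; -1 \;\le\; S \;\le\; 1 .
```

The first fraction is the share of subjects whose prediction moved in the right direction, and the second is the share whose prediction moved the wrong way. These groups are disjoint, and subjects with p_new = p_old belong to neither. So S = 1 when every prediction improves, S = -1 when every one deteriorates, and S = 0 when improvements and deteriorations balance.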

Other examples include some flavours of 
rank correlations and related measures (including 
the always popular Somers' d) and, on a more pedestrian 
level, those discussed at 

http://www.stata.com/support/faqs/data/measures.html

On a different level, it is neither necessary nor 
efficient to create a new variable just to hold 
a total, as in 

. egen number_up_outcome=total(p_new>p_old & outcome)
. egen number_down_outcome=total(p_new<p_old & outcome)
 
. egen number_up_no_outcome=total(p_new>p_old & !outcome)
. egen number_down_no_outcome=total(p_new<p_old & !outcome)

Using -count- consistently will save you from a bundle 
of such variables, all holding constants. For bullet-proof
code, you would need to trap missings (which count 
as arbitrarily large) and ensure that the two estimation 
samples here were identical. Note that missing values
of -outcome- would count as true. 
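The point about missings can be checked directly. This is a small sketch, assuming standard Stata behaviour where the system missing value . sorts above every nonmissing number and, used as a condition, is nonzero and therefore true. The values 0.7 and 0.3 are arbitrary:

```stata
display (. > 0.7)     // 1: a missing p_new would count as "higher"
display (0.3 < .)     // 1: a missing p_old makes p_new < p_old false, not missing
display (. != 0)      // 1: so -if outcome- is true when outcome is missing
display (. == 1)      // 0: -if outcome == 1- excludes missing outcome
```

Writing the condition as outcome == 1 or outcome == 0, rather than outcome or !outcome, avoids the trap.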

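// old model: flag its estimation sample, then predict Pr(outcome)
logistic outcome zlog zero 
gen byte used = e(sample) 
predict p_old

// new model with the marker added
logistic outcome zlog zero new_marker 
// script will fail if two samples differ 
assert used == e(sample) 
predict p_new

// number of subjects in the common estimation sample
count if used 
local N = r(N) 

// subjects with the outcome whose prediction rose (a) or fell (b)
count if p_new > p_old & used & outcome == 1 
local a = r(N) 
count if p_new < p_old & used & outcome == 1 
local b = r(N) 
... 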

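A self-contained version of the same -count- approach, run end to end on Stata's shipped auto data purely for illustration. Here foreign plays the outcome, mpg the old model, and weight a hypothetical new marker. The local names up1, down1, up0 and down0 are illustrative, not from the thread. The final line divides by N, as in the original definition:

```stata
* Illustration only: auto data stand in for the real study
sysuse auto, clear

logistic foreign mpg
gen byte used = e(sample)
predict p_old                         // predicted Pr(foreign), old model

logistic foreign mpg weight
assert used == e(sample)              // stop if the estimation samples differ
predict p_new                         // predicted Pr(foreign), new model

count if used
local N = r(N)

* outcome present: a rise in predicted probability is an improvement
count if used & foreign == 1 & p_new > p_old
local up1 = r(N)
count if used & foreign == 1 & p_new < p_old
local down1 = r(N)

* outcome absent: a fall in predicted probability is an improvement
count if used & foreign == 0 & p_new > p_old
local up0 = r(N)
count if used & foreign == 0 & p_new < p_old
local down0 = r(N)

display "net proportion improved = " ((`up1' - `down1') + (`down0' - `up0')) / `N'
```

Only constants are stored, in locals, so no extra variables holding totals are created.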

Nick 
[email protected] 

Daniel Waxman
 
> I am studying the effect of adding a biomarker to an existing model and
> want to describe the effect of that model vis-à-vis the number of
> subjects with improved predictions in the "new model" vs. the "old
> model". While there is an extensive literature on this topic, most of
> it divides the outcome into risk categories (e.g. predicted risk of
> 0-5%, 5-10%, etc.), something that I am not so interested in doing.
> 
> An intuitive approach would be to look at the net number of subjects
> who are assigned a higher predicted probability by the new model among
> those with the outcome in question, plus the net number assigned a
> lower probability among those who did not have the outcome. The ratio
> of this number to the total number of subjects would then be the net
> proportion of patients with improved predictions (and would range from
> -1 to 1). See example below.
> 
> My question: Did I just reinvent the wheel? (i.e., is this equivalent
> to some existing statistic?) Does anybody see any logical problem with
> looking at this as one measure of the effect of adding a predictor to
> an existing model?
> 
> Thanks,
> Daniel Waxman
> 
> **** example (zlog is continuous, zero is dichotomous, new_marker is
> the dichotomous new marker, and there are no missing data) ***
> 
> 
> . logistic outcome zlog zero 
> . predict p_old
> 
> . logistic outcome zlog zero new_marker 
> . predict p_new
> 
> . count if e(sample)
> . gen N=r(N)
> 
> . egen number_up_outcome=total(p_new>p_old & outcome)
> . egen number_down_outcome=total(p_new<p_old & outcome)
> 
> . egen number_up_no_outcome=total(p_new>p_old & !outcome)
> . egen number_down_no_outcome=total(p_new<p_old & !outcome)
> 
> . gen net_proportion_improved = ((number_up_outcome-number_down_outcome)+(number_down_no_outcome-number_up_no_outcome))/N

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
