From | "Clyde Schechter" <clyde.schechter@einstein.yu.edu> |
To | statalist@hsphsun2.harvard.edu |
Subject | RE: RE: st: ROC/logistic regression questions |
Date | Tue, 1 Mar 2011 07:50:11 -0800 |
Junlin Liao inquired of me: "I'm wondering if you can explain the coefficients in -rocfit- models? How do I explain it in plain English?"

The more familiar commands relating to ROC curves, -lroc-, -roctab-, and -roccomp-, are non-parametric procedures. -rocfit- takes a different approach. The starting point is the same: there is an actual dichotomous outcome, call it success vs. failure, and there is an ordinal observed variable that is being used to predict that outcome, call it the predictor. -rocfit- fits a parametric model, called the binormal model, to the data.

First, the predictor variable itself, if discrete, is assumed to arise through the application of cutpoints to an underlying continuous latent variable. (This is specified using the -continuous()- option of -rocfit-.)

Second, the predictor variable, or its underlying latent variable, is assumed to have a normal distribution among the cases with a success outcome, and a (usually different) normal distribution among the cases with a failure outcome. Each of those normal distributions is characterized in the usual way by a mean and a standard deviation. Let's call them mu_s and sd_s for the successes, and mu_f and sd_f for the failures.

Digression: If the binormal model is actually true, and if sd_s = sd_f, then it can be shown with a fairly simple calculation that the usual logistic regression equation describes exactly the relationship between the continuous (observed or latent) predictor and the probability of a success outcome. (With equal standard deviations, the log of the ratio of the two normal densities is linear in the predictor, which is exactly the logistic form.)

Third, -rocfit- estimates the parameters of those normal distributions. But instead of reporting them directly, it provides a different characterization of the distributions that is sometimes of greater interest. In particular, the slope in -rocfit-'s output estimates the ratio sd_f/sd_s, and the intercept estimates the standardized difference in means, (mu_s - mu_f)/sd_s.
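To make the parameterization concrete, here is a minimal Python sketch. It is not part of the original post and uses only the standard library. It computes the intercept and slope from the four binormal parameters as defined above, then checks the equal-standard-deviation digression numerically. The function names, the prevalence argument, and all numeric values are made up for illustration. The ROC-curve and AUC formulas are the standard consequences of the binormal model, not something taken from -rocfit-'s output.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def binormal_coefficients(mu_s, sd_s, mu_f, sd_f):
    """Return (intercept, slope) in the parameterization described in the post."""
    intercept = (mu_s - mu_f) / sd_s  # standardized difference in means
    slope = sd_f / sd_s               # relative dispersion, failures vs. successes
    return intercept, slope


def roc_sensitivity(intercept, slope, fpr):
    """Sensitivity at a false-positive rate 0 < fpr < 1 under the binormal model."""
    return STD_NORMAL.cdf(intercept + slope * STD_NORMAL.inv_cdf(fpr))


def binormal_auc(intercept, slope):
    """Area under the binormal ROC curve."""
    return STD_NORMAL.cdf(intercept / sqrt(1 + slope ** 2))


def posterior_success(x, mu_s, mu_f, sd, prevalence):
    """P(success | x) by Bayes' rule, with a common sd in both groups."""
    s = prevalence * NormalDist(mu_s, sd).pdf(x)
    f = (1 - prevalence) * NormalDist(mu_f, sd).pdf(x)
    return s / (s + f)


def logistic_success(x, mu_s, mu_f, sd, prevalence):
    """The same probability written as a logistic regression equation."""
    b1 = (mu_s - mu_f) / sd ** 2
    b0 = log(prevalence / (1 - prevalence)) - (mu_s ** 2 - mu_f ** 2) / (2 * sd ** 2)
    return 1 / (1 + exp(-(b0 + b1 * x)))


# Hypothetical parameter values, chosen only for illustration.
intercept, slope = binormal_coefficients(mu_s=3.0, sd_s=2.0, mu_f=1.0, sd_f=1.0)
print(intercept, slope)  # 1.0 0.5
print(roc_sensitivity(intercept, slope, fpr=0.10))
print(binormal_auc(intercept, slope))

# Equal-sd case: the Bayes posterior and the logistic curve should agree
# up to floating-point error at every x.
for x in (-1.0, 0.5, 3.0):
    print(x, posterior_success(x, 2.0, 0.0, 1.5, 0.3),
          logistic_success(x, 2.0, 0.0, 1.5, 0.3))
```

The last loop should print two matching probabilities per row, which is the digression's claim in numeric form. Changing sd so that the two groups differ would break that agreement, because the log density ratio then picks up a quadratic term in x.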
Thus the slope characterizes the relative dispersion of the continuous predictor among the successes and failures, and the intercept is an effect size, strongly analogous to Cohen's d.

That's about it. Frankly, I can count on the fingers of one hand the number of times I have used the binormal model approach to ROC curves in my work. In clinical work and general medical epidemiology it isn't very popular. I imagine that like other models that involve latent variables, one would find it more widely used in psychology and psychiatry, though I don't really know.

Hope this helps.

Clyde Schechter
Department of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA

Please note new e-mail address: clyde.schechter@einstein.yu.edu