Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Inconsistent results with rocfit

From	Ronan Conroy <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: Inconsistent results with rocfit
Date	Tue, 2 Mar 2010 11:09:59 +0000


On 25 Feb 2010, at 18:30, Paul Seed wrote:

Dear Statalist,

An odd problem has come up.
I have two versions of the same predictor
(as measured and logged) and one binary outcome.

When I use -roctab-, I get identical estimates of the ROC area.
When I use -rocfit-, I do not.

The problem is reproducible. Using a dataset I am currently working on and a setup similar to Paul's, with


. rocfit diagnosis logbnp1 , cont(5)

I get an ROC area of 0.738, very similar to the 0.724 obtained from -roctab-.


However,

. rocfit diagnosis bnp1, cont(5)

gives an ROC area of 0.358! -roctab- reports the same area as before, 0.724.
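Why -roctab- is unaffected: its nonparametric ROC area depends only on how cases and controls are ranked, so any strictly increasing transformation such as the log leaves it unchanged, whereas -rocfit- fits a binormal model to the grouped categories and so depends on where the group boundaries fall. The Python sketch below (not Stata; the function name and the data are made up for illustration) computes the empirical area in its Mann-Whitney form, the proportion of case-control pairs in which the case has the higher value, with ties counted as one half:

```python
import math

def empirical_auc(scores, labels):
    """Nonparametric ROC area: share of (case, control) pairs in which
    the case scores higher, counting ties as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Hypothetical, right-skewed marker values (not the data from this thread)
bnp = [12, 35, 40, 58, 90, 150, 220, 480, 900, 2500]
diagnosis = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
logbnp = [math.log(v) for v in bnp]

print(empirical_auc(bnp, diagnosis))
print(empirical_auc(logbnp, diagnosis))
```

With these values both calls return 0.84 (21 of the 25 case-control pairs are correctly ordered), because taking logs does not change the order of any pair.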

It seems to me that the problem is that the -cut- option divides the range of the data into intervals of more or less equal length, rather than into quantiles. The result is that where the variable is very skewed, the group frequencies are skewed too. Here are the frequency distributions of the variables generated by the -cut(5)- option:



-> tabulation of cut_bnp1

   cut_bnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        109       83.85       83.85
          2 |         15       11.54       95.38
          3 |          3        2.31       97.69
          4 |          2        1.54       99.23
          5 |          1        0.77      100.00
------------+-----------------------------------
      Total |        130      100.00

-> tabulation of cut_logbnp1

cut_logbnp1 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         24       18.46       18.46
          2 |         46       35.38       53.85
          3 |         54       41.54       95.38
          4 |          6        4.62      100.00
------------+-----------------------------------
      Total |        130      100.00
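The equal-length grouping described above is easy to mimic. This Python sketch uses made-up, right-skewed values; the function name is hypothetical, and the rule that the maximum goes into the top group is my assumption, not a description of Stata's internals:

```python
import math
from collections import Counter

def equal_width_groups(values, k):
    """Split [min, max] into k intervals of equal length and return the
    interval number (1..k) of each value. Assumption: the maximum is put
    in the top interval; Stata's exact boundary rule may differ."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    width = (hi - lo) / k
    # int(...) + 1 gives 1-based interval numbers; min() keeps the maximum
    # (and any floating-point overshoot) in group k
    return [min(int((v - lo) // width) + 1, k) for v in values]

# Hypothetical right-skewed marker (not the data from this thread)
bnp = [10, 12, 15, 20, 25, 30, 40, 55, 80, 120, 200, 400, 1000]
logbnp = [math.log(v) for v in bnp]

print(sorted(Counter(equal_width_groups(bnp, 5)).items()))
print(sorted(Counter(equal_width_groups(logbnp, 5)).items()))
```

On the raw scale 11 of the 13 values land in group 1 and groups 3 and 4 are empty; after taking logs, all five groups are used (5, 3, 2, 1 and 2 observations).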

As you can see, logbnp1 ended up in four groups, three of which had adequate numbers, while bnp1 had almost no observations in three of the five categories. This is what we used to call a misfeature: something that works as described in the manual, but does something that may not be in the user's best interests. I'd suggest adding a -group- option that would allow -continuous- to produce n groups of more or less equal size.
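A -group- option of the kind suggested could work along these lines: sort the observations and cut the ordered list into n groups of nearly equal size. The sketch below is hypothetical Python, not an existing Stata option; tied values are kept together in the group of their first sorted position, which is only one of several possible tie rules:

```python
import math
from collections import Counter

def equal_size_groups(values, k):
    """Hypothetical sketch: assign each value to one of k groups of nearly
    equal size based on its position in sorted order. Tied values share
    the group of their first sorted position (an assumed tie rule)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    first_pos = {}
    groups = [0] * n
    for pos, i in enumerate(order):
        v = values[i]
        if v not in first_pos:
            first_pos[v] = pos
        # map sorted positions 0..n-1 onto groups 1..k
        groups[i] = first_pos[v] * k // n + 1
    return groups

# Same hypothetical skewed values as a raw and a logged version
bnp = [10, 12, 15, 20, 25, 30, 40, 55, 80, 120, 200, 400, 1000]
logbnp = [math.log(v) for v in bnp]

print(sorted(Counter(equal_size_groups(bnp, 5)).items()))
print(equal_size_groups(bnp, 5) == equal_size_groups(logbnp, 5))
```

Because the groups depend only on the ordering, the raw and logged values give identical groupings (here 3, 3, 2, 3 and 2 observations), so a binormal fit on those groups would not depend on whether the predictor had been logged.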

The more alert readers (or anyone still reading this) will also note that -cut(5)- produced five groups in the first instance and four in the second. This seems to me like a bug.


This email has been cc'd to tech support!

Ronan Conroy
=================================

[email protected]
Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)
http://rcsi.academia.edu/RonanConroy

Before printing, think about the environment




