
Re: statalist-digest V4 #4807 (st: reliability with -icc- ) - Statistics as APPLIED science


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: statalist-digest V4 #4807 (st: reliability with -icc- ) - Statistics as APPLIED science
Date   Thu, 28 Feb 2013 11:10:40 +0000

I do agree broadly with Allan, whether or not that is surprising.

A wilder idea is that rater 4, who gave no score higher than 3, either
never knew or somehow forgot that scores could be up to 100 and just
used a 5-point scale. Even if #4 did know that, #4 is so far out of
line that including them remains dubious, although doing computations
with and without #4 remains manageable.
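(As a minimal sketch of that with-and-without comparison, assuming the
variables are named Score, Application, and Rater as in Allan's -table-
output below, and assuming -icc-'s depvar target rater order, something
like

. icc Score Application Rater, mixed consistency
. * drop rater 4 and re-estimate for comparison
. icc Score Application Rater if Rater != 4, mixed consistency

would put the two estimates side by side; substitute within-rater ranks
for Score if that is the scale of interest.)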

In any case, if the highest score is 18, then something else is going
on that needs to be spelled out, if only as context.

Only the original poster can add more context than we already have. In
any field that I know about, this dataset would be too small to be
publishable, except as a toy dataset to make points about method
(which I take it is Jay's motive here).

Nick

On Thu, Feb 28, 2013 at 10:42 AM, Allan Reese (Cefas)
<[email protected]> wrote:
> Lenny Lesser posed a problem (copied below) on Statalist and has
> received several replies about ICC methods.  Nick Cox
> commented "A scatter plot matrix is instructive" and in a follow-up
> message added, "it is well to know what patterns do or do not exist
> before you start quantifying them.  This is just to underline that
> repeating the graphs with ranks underlines how much information is
> thereby discarded."  There have been various comments on methods and
> Stata commands, including JVerkuilen's, "my gut impression is that you
> really should use the scores. The context you cite is no doubt correct,
> but for comparing raters with each other the scores they gave are
> essential."
>
> With no disrespect to anyone, I see this as a classic example of
> "Mathsworld" - a mindset where because you are presented with numbers
> you do sums, and the context is ignored.  LOOK AT THE DATA.
>
> In the first place, the score values are not skewed; they are just all
> low range.  The highest score is 18/100, adjustable to A* in GCSE but
> indicating to me a general problem with the exercise.  The "outlier"
> rater 4 has awarded values ONLY of 0/1/2/3, from which I'd guess she has
> ranked the apps anyway and never awarded scores.
>
> . table Score Rater
>
> ----------------------------------
>           |         Rater
>     Score |    1     2     3     4
> ----------+-----------------------
>         0 |                      1
>         1 |          1           8
>         2 |    1     1     1     1
>         3 |          2           1
>         5 |    1     1     1
>         6 |    2           3
>         7 |    2     1     1
>         8 |          1
>         9 |          1
>        10 |          1
>        11 |    1           1
>        12 |          2
>        13 |    1           1
>        15 |                1
>        16 |    1           1
>        17 |    1           1
>        18 |    1
> ----------------------------------
>
> Perhaps everyone noticed that, and has treated this as purely an
> exercise in modelling?  However, it is my impression that people
> sometimes love their models more than the data they "explain".
>
> Allan
>
>
> --------------------------------------------------------------------------------
> From   Lenny Lesser <[email protected]>
> To   statalist <[email protected]>
> Subject   st: reliability with -icc- and -estat icc-
> Date   Mon, 25 Feb 2013 20:56:52 -0800
>
> --------------------------------------------------------------------------------
>
> I have 4 raters that gave a score of 0-100 on 11 smartphone
> applications.
> The data is skewed right, as they all got low scores.  I'm using the
> ranks (within an individual) instead of the actual scores.  I want to
> know the correlation in ranking between the different raters.
>
> I've tried two approaches:
>
> -xtmixed rank Application || Rater: , reml
> -estat icc
>
> (icc=0.19)
>
> and
>
> -icc rank Rater Application, mixed consistency
>
> (icc=0.34)
>
> They give me two different answers. Which one is correct?
>
>
> Next, we found out that rater 4 was off the charts, and we want to
> eliminate her and rerun the analysis.  When we do this, we get wacky
> ICCs.  In the first method we get an ICC of 2e-26.  In the second
> method (-icc-), we get -.06.  Eliminating any of the other raters
> gives us ICCs close to the original ICC.  Why are we getting such a
> crazy number when we eliminate this 4th rater?
>
>
> I'm guessing this might be instability in the model, but I'm not sure
> how to get around it.
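
(Following up the scatter-plot-matrix suggestion quoted above: a
minimal sketch, assuming a long dataset holding only the variables
Score, Application, and Rater, so that -reshape- creates Score1-Score4,
one column per rater:

. * one row per application, one score variable per rater
. reshape wide Score, i(Application) j(Rater)
. graph matrix Score1-Score4, half
. * rank each rater's scores across applications, then repeat
. egen rank1 = rank(Score1)
. egen rank2 = rank(Score2)
. egen rank3 = rank(Score3)
. egen rank4 = rank(Score4)
. graph matrix rank1-rank4, half

Comparing the two matrices shows directly how much information the move
from scores to ranks discards.)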
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

