

From: jcoveney@bigplanet.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: reliability with -icc- and -estat icc-
Date: Wed, 27 Feb 2013 15:06:37 -0000

Lenny Lesser wrote:

I have 4 raters that gave a score of 0-100 on 11 smartphone applications. The data is skewed right, as they all got low scores. I'm using the ranks (within an individual) instead of the actual scores. I want to know the correlation in ranking between the different raters.

I've tried the two commands:

-xtmixed rank Application || Rater: , reml
-estat icc
(icc=0.19)

and

-icc rank Rater Application, mixed consistency
(icc=0.34)

They give me two different answers. Which one is correct?

Next, we found out that rater 4 was off the charts, and we want to eliminate her and rerun the analysis. When we do this we get wacky ICCs. In the first method we get an ICC of 2e-26. In the 2nd method (-icc), we get -.06. Eliminating any of the other raters gives us ICCs close to the original ICC. Why are we getting such a crazy number when we eliminate this 4th rater? I'm guessing this might be instability in the model, but I'm not sure how to get around it.

------------------------------------------------------------------------------

When in doubt, try going back to a reference source ( www.hongik.edu/~ym480/Shrout-Fleiss-ICC.pdf ) and manually computing the ICC. According to the source, "ICC is the correlation between one measurement . . . on a target and another measurement obtained on that target." In your case, targets are smartphone software.

By the way, Rater #4 is providing valuable information about rater reliability, and so I recommend against eliminating her scores from the ICC computation. Just by inspection, raters are not reliable: if your sample is representative, then a quarter of the population of raters disagrees dramatically with the rest; even excluding this fraction, the ICC is less than 60%. Moreover, none of the raters' scores covers anywhere near the dynamic range you and your colleagues have allocated for the measurement.

My take on all that would be that your volunteers need better training on evaluating smartphone software in the manner that you want it done. Perhaps you and your colleagues could provide more explicit instructions on what you're looking for in measuring the characteristic(s) of the software that you're trying to measure.

Joseph Coveney

version 11.2

clear *
set more off

input byte(Application Rater Score rank)
 5 1  2 1
 7 1  5 2
 2 1  6 3
 9 1  6 3
11 1  7 4
 6 1  7 4
 8 1 11 5
 3 1 13 6
 4 1 16 7
10 1 17 8
 1 1 18 9
 6 2  1 1
 5 2  2 2
11 2  3 3
 7 2  3 3
 4 2  5 4
 1 2  7 5
 8 2  8 6
 2 2  9 7
 3 2 10 8
10 2 12 9
 9 2 12 9
 5 3  2 1
 2 3  5 2
 7 3  6 3
 6 3  6 3
 9 3  6 3
11 3  7 4
 8 3 11 5
 3 3 13 6
 4 3 15 7
10 3 16 8
 1 3 17 9
 7 4  0 1
 1 4  1 2
 9 4  1 2
 6 4  1 2
 8 4  1 2
 4 4  1 2
 5 4  1 2
 3 4  1 2
11 4  1 2
 2 4  2 3
10 4  3 4
end

* ICC(2,1) from two-way ANOVA mean squares.
* Call as: icc21 <measure> <rater> <target> [if]
program define icc21
    version 11.2
    syntax varlist [if]

    quietly anova `varlist' `if'

    tempname BMS JMS EMS k n ICC
    * Term 1 is the rater (judge); term 2 is the target
    scalar define `BMS' = e(ss_2) / e(df_2)
    scalar define `JMS' = e(ss_1) / e(df_1)
    scalar define `EMS' = e(rss) / e(df_r)
    scalar define `k' = e(df_1) + 1
    scalar define `n' = e(df_2) + 1

    scalar define `ICC' = (`BMS' - `EMS') / ///
        (`BMS' + (`k' - 1) * `EMS' + (`k' * (`JMS' - `EMS') / `n'))

    display in smcl as text "ICC Type 2, single rater"
    display in smcl as text "ICC(2, 1) = " `ICC'
end

* ICC(2,1) from the variance components of the preceding -xtmixed- fit
program define iccem
    version 11.2
    syntax

    tempname sigma2_judge sigma2_target sigma2_residual ICC
    scalar define `sigma2_target' = exp(_b[lns1_1_1:_cons])^2
    scalar define `sigma2_judge' = exp(_b[lns1_2_1:_cons])^2
    scalar define `sigma2_residual' = exp(_b[lnsig_e:_cons])^2

    scalar define `ICC' = `sigma2_target' / ///
        (`sigma2_target' + `sigma2_judge' + `sigma2_residual')

    display in smcl as text "ICC Type 2, single rater"
    display in smcl as text "ICC(2, 1) = " `ICC'
end

*
* Within-rater rank-transformed scores
*
xtmixed rank || _all:R.Application || _all:R.Rater, ///
    reml nolrtest nostderr variance nolog
iccem
icc21 rank Rater Application

xtmixed rank if Rater != 4 || _all:R.Application || _all:R.Rater, ///
    reml nolrtest nostderr variance nolog
iccem
icc21 rank Rater Application if Rater != 4

*
* Original scores
*
xtmixed Score || _all:R.Application || _all:R.Rater, ///
    reml nolrtest nostderr variance nolog
iccem
icc21 Score Rater Application

xtmixed Score if Rater != 4 || _all:R.Application || _all:R.Rater, ///
    reml nolrtest nostderr variance nolog
iccem
icc21 Score Rater Application if Rater != 4

exit

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
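
For reference, the two quantities that the -icc21- and -iccem- programs in this post compute can be written out as follows. This is only a restatement of the posted code in the Shrout and Fleiss notation it appears to follow; the symbol names are taken from the scalars in the programs, with BMS the between-target (Application) mean square, JMS the between-judge (Rater) mean square, EMS the residual mean square, k the number of raters, and n the number of targets.

```latex
% ANOVA-based estimate, as computed by -icc21-
\mathrm{ICC}(2,1) =
  \frac{\mathrm{BMS} - \mathrm{EMS}}
       {\mathrm{BMS} + (k - 1)\,\mathrm{EMS}
        + \dfrac{k\,(\mathrm{JMS} - \mathrm{EMS})}{n}}

% Variance-component estimate, as computed by -iccem- after -xtmixed-
\mathrm{ICC}(2,1) =
  \frac{\sigma^2_{\mathrm{target}}}
       {\sigma^2_{\mathrm{target}} + \sigma^2_{\mathrm{judge}}
        + \sigma^2_{\mathrm{residual}}}
```

The numerator of the ANOVA form is negative whenever BMS is smaller than EMS, so that estimator can fall below zero. The variance-component form cannot, because each variance is recovered as a squared exponential of an estimated log standard deviation; a target variance estimated at essentially zero then yields an ICC that is tiny but nonnegative.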

