 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Interrater agreement: finding the problematic items

 From Nick Cox To "statalist@hsphsun2.harvard.edu" Subject Re: st: Interrater agreement: finding the problematic items Date Fri, 14 Jun 2013 18:53:17 +0100

```
Many people seem unaware of the simplicity and generality of various
measures of inequality, diversity and concentration. (There are many
other names.) They may be under the impression that they are rather
odd and ad hoc measures used by people in rather odd and ad hoc fields
such as economics, sociology or ecology.

Here are a few examples of two such measures done calculator-style.
All we are assuming is a set of categories, not even ordered, not even
numbered, just labelled.

(There are many, many others, but I like these two measures.)

For a change,

. sysuse auto, clear
(1978 Automobile Data)

. tab rep78, matcell(f_rep)

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. tab foreign, matcell(f_for)

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

The stages are

1. Copy the vectors of frequencies into vectors in Mata.
2. Scale to vectors of probabilities.

3. The sum of squared probabilities is a measure of agreement.
If everyone agrees, everyone is in one category: one probability is 1
and the others are 0, so the sum is 1. The lower limit is 0, which is
not reached in practice: with k categories the minimum is 1/k, when all
k categories are equally common.

This measure, or a relative of it, is variously named for, or
attributed to, Gini, Turing, Hirschman, Simpson, Herfindahl, Good and
no doubt others.

4. The reciprocal of this has a nice interpretation as "the equivalent
number of equally common categories".

5. The mean of the log reciprocal probabilities, weighted by the
probabilities themselves, is often known as the entropy. It is often
named for Shannon (occasionally for Weaver as well) and/or Wiener.
(Weaver and Wiener were precisely two distinct people, but under
conditions of lax spelling standards some students have been known to
attempt to merge them retrospectively.)

6. Exponentiating that gives a number with a nice interpretation as
"the equivalent number of equally common categories" (another estimate
thereof).

. mata
------------------------------------------------- mata (type end to exit) -----------
: f1 = st_matrix("f_rep")

: f1
        1
    +------+
  1 |   2  |
  2 |   8  |
  3 |  30  |
  4 |  18  |
  5 |  11  |
    +------+

: p1 = f1 :/ sum(f1)

: p1
                 1
    +---------------+
  1 |  .0289855072  |
  2 |   .115942029  |
  3 |  .4347826087  |
  4 |  .2608695652  |
  5 |  .1594202899  |
    +---------------+

: p1:^2
                 1
    +---------------+
  1 |  .0008401596  |
  2 |  .0134425541  |
  3 |  .1890359168  |
  4 |  .0680529301  |
  5 |  .0254148288  |
    +---------------+

: sum(p1:^2)
.2967863894

: 1/sum(p1:^2)
3.369426752

: sum(p1 :* ln(1:/p1))
1.357855957

: exp(sum(p1 :* ln(1:/p1)))
3.887848644

:
: f2 = st_matrix("f_for")

: f2
        1
    +------+
  1 |  52  |
  2 |  22  |
    +------+

: p2 = f2 :/ sum(f2)

: p2
                 1
    +---------------+
  1 |  .7027027027  |
  2 |  .2972972973  |
    +---------------+

: p2:^2
                 1
    +---------------+
  1 |  .4937910884  |
  2 |   .088385683  |
    +---------------+

: sum(p2:^2)
.5821767714

: 1/sum(p2:^2)
1.717691343

: sum(p2 :* ln(1:/p2))
.6085568859

: exp(sum(p2 :* ln(1:/p2)))
1.837777362

:
: end
-------------------------------------------------------------------------------------
Nick
njcoxstata@gmail.com

On 14 June 2013 16:34, Nick Cox <njcoxstata@gmail.com> wrote:
>
> Some Statalist members are well versed in psychometrics but I see no
> reason why more general statistical ideas should not be relevant too. The
> standard deviation of ratings for each item would be one measure of
> disagreement. Perhaps better ones would be the sum of squared
> probabilities or the entropy of the probability distribution for the
> rating.
> Nick
> njcoxstata@gmail.com
>
>
On 14 June 2013 16:11, Ilian, Henry (ACS) <Henry.Ilian@dfa.state.ny.us> wrote:

>> I'm doing an interrater agreement study on a case-reading instrument. There are five reviewers using an instrument with 120 items. The rating scales are ordinal with either two, three or four options. I'm less interested in reviewer tendencies than I am in problematic items, those with high levels of disagreement.
>>
>> Most of the interrater agreement/interrater reliability statistics look at reviewer tendencies. I can see two ways of getting at agreement on items. The first is to sum all the differences between all possible pairs of reviewers, and those with the highest totals are the ones to examine. The other is Cronbach's alpha. Is there any strong argument for or against either approach, and is there a different approach that would be better than these?

```
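As a cross-check outside Stata, the calculator-style steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the thread. It takes a vector of category frequencies and returns the sum of squared probabilities, its reciprocal, the entropy and its exponential. It then applies the entropy per item, as suggested in the reply, to rank items by rater disagreement. The names `agreement_measures` and `item_disagreement`, and the three-item, five-rater `ratings` data, are hypothetical illustrations.

```python
import math
from collections import Counter


def agreement_measures(counts):
    """Return (sum of p^2, 1 / sum of p^2, entropy H, exp(H)) for a
    sequence of category frequencies (hypothetical helper)."""
    total = sum(counts)
    # Zero counts are dropped: p * ln(1/p) tends to 0 as p tends to 0,
    # and ln(1/0) is undefined.
    probs = [c / total for c in counts if c > 0]
    sum_sq = sum(p * p for p in probs)                 # 1 when everyone agrees
    entropy = sum(p * math.log(1 / p) for p in probs)  # 0 when everyone agrees
    return sum_sq, 1 / sum_sq, entropy, math.exp(entropy)


# Frequencies from the tabulations of rep78 and foreign in the auto data
for name, counts in [("rep78", [2, 8, 30, 18, 11]), ("foreign", [52, 22])]:
    s, n_sq, h, n_h = agreement_measures(counts)
    print(f"{name}: sum p^2 = {s:.10f}  1/sum p^2 = {n_sq:.9f}  "
          f"H = {h:.9f}  exp(H) = {n_h:.9f}")


def item_disagreement(scores):
    """Entropy of one item's ratings across raters (higher = more disagreement).
    Hypothetical helper: categories are treated as unordered labels."""
    return agreement_measures(list(Counter(scores).values()))[2]


# Hypothetical data: five raters scoring three items on ordinal scales
ratings = {
    "item1": [1, 1, 1, 1, 1],
    "item2": [1, 2, 1, 2, 3],
    "item3": [2, 2, 3, 2, 2],
}
ranked = sorted(ratings, key=lambda k: item_disagreement(ratings[k]), reverse=True)
print(ranked)  # ['item2', 'item3', 'item1']: most problematic item first
```

Like the Mata version, these measures use only the category labels and ignore their order. For ordinal rating scales, an order-aware summary such as the standard deviation of ratings per item, mentioned in the reply, may be a useful complement.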