# Re: st: kappa

 From wgould@stata.com (William Gould) To statalist@hsphsun2.harvard.edu Subject Re: st: kappa Date Mon, 24 Jun 2002 07:32:14 -0500

```
Mary Zellmer-Bruhn <mzellmer-bruhn@csom.umn.edu> wrote,

> I am interested in calculating kappa for interrater agreement.  I have a
> dataset that contains about 350 individuals (members of about 90 teams).
> The individuals are the rows in the data.  Columns are a set of 13 items on
> which the individuals answered yes (1) or no (0).  I want to calculate the
> interrater agreement on these 13 items.  I think the kappa command should do
> this, but I am not sure about the exact programming necessary.  I'd
> appreciate any insights or comments.

If I understand the question, Mary wishes to calculate kappa when there are
more than two raters (the 350 individuals) and two possible ratings (the "yes"
or "no" answers).  Kappa will be calculated over the 19 questions.

-kappa- is the command and the syntax for kappa in this case is

kappa pos neg

where variable pos records the number of raters assessing positive ("yes")
and variable neg records the number of raters assessing negative ("no").
In Mary's case, this two-variable dataset would have 19 observations, one for
each question.

Mary's data do not look like that.  She is starting with a 350-observation
dataset that looks like,

       personid   q1   q2   ...   q19
-------------------------------------
  1.        100    1    0   ...     1
  2.        105    1    1   ...     0
  ..         ..   ..   ..   ...    ..
350.       4422    0    1   ...     1

Perhaps her dataset contains other variables as well; it does not matter.
Anyway, assuming the variables are named as I have shown them above, here is
what Mary needs to type:

. reshape long q, i(personid) j(qnum)
. sort qnum
. by qnum: gen pos = sum(q==1)
. by qnum: gen neg = sum(q==0)
. by qnum: keep if _n==_N
. kappa pos neg

The key to the solution is the -reshape- command, which allowed me to convert
Mary's wide data to the long form.  After -reshape-, Mary's data looks like:

        personid   qnum     q
-----------------------------
   1.        100      1     1   --+
   2.        100      2     0     |  this was previously obs 1
   ..         ..     ..    ..     |
  19.        100     19     1   --+

  20.        105      1     1   --+
  21.        105      2     1     |  this was previously obs 2
   ..         ..     ..    ..     |
  38.        105     19     0   --+

   ..         ..     ..    ..

6632.       4422      1     0   --+
6633.       4422      2     1     |  this was previously obs 350
   ..         ..     ..    ..     |
6650.       4422     19     1   --+

With the data in this form, I can now order it on question number, typing
-sort qnum-, to obtain

        personid   qnum     q
-----------------------------
   1.        100      1     1
   2.        105      1     1
   ..         ..     ..    ..
 350.       4422      1     0

 351.        100      2     0
 352.        105      2     1
   ..         ..     ..    ..
 700.       4422      2     1

   ..         ..     ..    ..

6301.        100     19     1
6302.        105     19     0
   ..         ..     ..    ..
6650.       4422     19     1

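Actually, the data will not be ordered by personid within qnum unless I type
-sort qnum personid-, but that does not matter.  Type that if you want.
In any case, I am now in a position to form the sums of positive and negative
responses, by question number, and produce the desired 19-observation dataset.
Because sum() in -gen- is a running sum, the last observation within each qnum
holds the totals, which is why I keep only the observations where _n==_N.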

-- Bill
wgould@stata.com
```
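
The tallying steps in the post (running sums by qnum, then keeping the last observation in each group) can also be sketched with -collapse-. This is not from the original reply. It is a minimal alternative that assumes the same wide layout, with variables named personid and q1 through q19 coded 1 for "yes" and 0 for "no".

```stata
* Sketch only: assumes the wide layout shown in the post
* (personid, q1-q19, with answers coded 1 = yes, 0 = no).
reshape long q, i(personid) j(qnum)

* Indicator variables. A missing q counts toward neither total,
* which matches sum(q==1) and sum(q==0) in the original commands.
generate byte pos = (q == 1)
generate byte neg = (q == 0)

* Reduce to one observation per question, holding the number of
* "yes" raters (pos) and "no" raters (neg).
collapse (sum) pos neg, by(qnum)

kappa pos neg
```

Either route should leave the same 19-observation, two-variable dataset for -kappa-. Note that -collapse- drops every variable other than pos, neg, and qnum, so save the original data first if the other variables are still needed.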