The Stata listserver

st: RE: Kappa programming challenge...

From   "Nick Cox" <>
To   <>
Subject   st: RE: Kappa programming challenge...
Date   Wed, 19 May 2004 13:37:54 +0100

There is a specific question here of whether this program 
can be speeded up, and a quick glance suggests "Not much". 
You can do things like evaluating an expression inline, 

	forval np = `=`ndocs' -1'(-1)1 {

as illustrated below, to answer a repeated comment
in your version, but that's trivial in speed terms. 

program stekappa
	version 8 
	display c(current_time)
	local ndocs = 10 
	local ndrugs = 6 
	local w = 0 
	local npair = 0  
	// count the distinct doctor pairs: (ndocs-1) + (ndocs-2) + ... + 1
	forval np = `=`ndocs' -1'(-1)1 {
		local npair =`npair' + `np' 
	}
	matrix rkappa = J(`npair',`ndrugs',0)
	// one row per doctor pair, one column per drug
	qui forval i = `ndocs'(-1)2 { 
		forval j = 1/`=`i'-1' {
			local ++w
			forval d = 1/`ndrugs' { 
				local di = `i' + ((`d' - 1) * `ndocs') 
				local dj = `j' + ((`d' - 1) * `ndocs') 
				kap v`di' v`dj' in 2/26, wgt(w) absolute 
				matrix rkappa[`w',`d'] = r(kappa) 
			}
		}
	}
	svmat rkappa 
	means rkappa1 - rkappa`ndrugs' 
	display c(current_time)
end
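
The pair-counting loop in the program sums (ndocs - 1) + (ndocs - 2) + ... + 1, 
which is just the number of ways to choose 2 doctors from ndocs. As a small 
sketch (not part of the original program), the count can be computed directly, 
which also shows how quickly the job grows with the number of doctors: 

```stata
* Sketch: number of doctor pairs without a loop
local ndocs = 10
local npair = `ndocs' * (`ndocs' - 1) / 2
display "pairs for `ndocs' doctors: `npair'"
* comb(n, k) is the built-in "n choose k" function
display "pairs for 100 doctors: " comb(100, 2)
```

With 10 doctors that is 45 pairs (270 kap calls over 6 drugs); with 100 
doctors it is 4950 pairs (29700 kap calls). 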

But lurking behind there is a larger question of how to 
analyse these data, which depends naturally but not completely 
on what is important to you. 

In what sense can you justify this rather ad hoc 
averaging of kappas across different weighting 
schemes? It seems to me that it is only too likely 
to obscure much of the information in these data. 
Of course, you will also be trying other analyses not 
mentioned here. 

If I were the only person on a desert island 
with Stata (internet connection allowed) and someone 
brought these data to me, I'd want to look at 
a matrix of measures of agreement between 
doctors which was for drugs X attributes. For a 
7 point ordered scale, I am not sure that kappa 
is the most useful measure. Somers' d might be 
useful (cue Roger Newson). 
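
One way to look at agreement doctor by doctor, rather than as a single 
average, is to fill a doctors X doctors matrix for one drug. The sketch 
below uses official Stata's ktau (Kendall's tau-b) purely as a stand-in 
rank-based measure; it is not Somers' d, for which Roger Newson's 
user-written somersd package would be the tool. It assumes the layout 
described in the quoted message (v1 - v10 hold the ten doctors' ratings 
of the first drug) and that every observation is an attribute. 

```stata
* Sketch only: pairwise Kendall's tau-b between doctors for the first drug
* Assumption: v1-v10 are the ten doctors' ratings of drug 1
local ndocs = 10
matrix T = J(`ndocs', `ndocs', 1)
forval i = 2/`ndocs' {
	forval j = 1/`=`i' - 1' {
		quietly ktau v`i' v`j'
		* fill both triangles so the matrix is symmetric
		matrix T[`i', `j'] = r(tau_b)
		matrix T[`j', `i'] = r(tau_b)
	}
}
matrix list T, format(%5.2f)
```

Repeating this for each drug gives one matrix per drug to compare, 
instead of a single averaged number. 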

> I have a dataset containing the results of a survey asking doctors to rate
> 6 drugs on 24 separate attributes (each doctor therefore provides
> 24*6=144 answers). All ratings are on a 1 to 7 scale with end anchors
> along the lines of 'rubbish' and 'brilliant'.
>
> I've managed to manipulate my data so that I can apply Stata's 'Kap'
> command to calculate Kappa for each drug separately across all doctors.
> I can then compare these results across the drugs to see which has
> achieved the greatest doctor consensus. My problem is that in calculating
> these kappa values (because there are more than two raters) Stata has not
> allowed me to apply the 'wgt(w)' and 'absolute' options. Consequently, in
> calculating Kappa (I believe), one doctor scoring a variable 1 and another
> doctor scoring the same variable 7 is treated no differently from the
> doctors scoring a close 6 and 7. The upshot is that, as it stands, I don't
> think it's worth reporting these Kappa scores.
>
> If I have a manageable number of doctors I could try to form every
> possible pairwise doctor combination, calculate kappa for each such
> two-way doctor combination (using the 'wgt(w)' and 'absolute' options)
> and then average these Kappa results to give a representative value.
>
> I've written the code below to accomplish this. There are ten doctors and
> 6 drugs; this makes 60 variables (v1 through v60). Observations represent
> attributes (therefore 24 observations). My code works because my data is
> such that the variables v1 - v10 hold the 10 doctors' scores on the first
> drug (one variable for each doctor); v11 - v20 hold the same doctors'
> scores on the second drug, and so on.
>
> program stekappa
>     display c(current_time)
>     local ndocs = 10 //this and next line are the only two user input lines
>     local ndrugs 6 //number of drugs
>     local ndocs1 = `ndocs'-1 //need this as forvalues loop won't accept start value `ndocs'-1
>     local w=0 //w will be row indicator used to input results in matrix
>     local npair =0 //will hold number of exhaustive paired doctor comparisons
>     forvalues np = `ndocs1'(-1)1{
>         local npair =`npair' + `np' //calculate number of exhaustive paired doctor comparisons
>     }
>     matrix rkappa=J(`npair',`ndrugs',0) //initiate matrix to hold results to zero
>     forvalues i =`ndocs'(-1)2{ //this with next forvalues loop generates all exhaustive doc pairs
>         local k=`i'-1 //need this as forvalues loop won't accept end value `i'-1
>         forvalues j=1(1)`k'{
>             local ++w
>             forvalues d=1(1)`ndrugs'{ //this used to cycle through the drugs
>                 local di = `i'+ ((`d'-1)*`ndocs') //applies given dataset variable setup see email above
>                 local dj = `j'+ ((`d'-1)*`ndocs') //second variable for use in Kap command below
>                 quietly kap v`di' v`dj' in 2/26, wgt(w) absolute //run the kap command for the paired docs
>                 matrix rkappa[`w',`d'] = r(kappa) //store the kappa value in the matrix
>             }
>         }
>     }
>     svmat rkappa //convert the columns of the matrix into data variables
>     means rkappa1 - rkappa`ndrugs' //find the means of these data variables
>     display c(current_time)
> end
>
> I'm new to programming Stata and I think my code could be written to be
> quite a bit faster. Any advice on how to speed it up would be most
> appreciated. This is important to me as I'd like it to be able to cope
> with, say, 100 doctors.
>
> What I'd really like, though, is for someone clever enough to write a good
> sampling scheme program to deal with even larger numbers of doctors, when
> computing every single paired doctor comparison isn't possible. This does
> not look easy to me: given the way the data apparently have to be set up
> to make use of the kap command (with the wgt(w) option), I need to sample
> on variables, not observations.
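
On the sampling question at the end of the quoted message: sampling on 
variables can be done by drawing random doctor indices and building the 
variable names from them, much as the full program does. What follows is 
a minimal sketch under stated assumptions, not a tested sampling scheme: 
the locals nsamp and d and the seed are hypothetical choices, the variable 
layout is the one described in the quoted message, every observation is 
taken to be an attribute, and pairs are drawn with replacement. uniform() 
is the random-number function name in Stata 8; later versions call it 
runiform(). 

```stata
* Sketch only: average weighted kappa over randomly sampled doctor pairs
* Assumptions: ndocs doctors, drug d occupies v((d-1)*ndocs + 1) onward
set seed 12345
local ndocs = 100
local nsamp = 200      // hypothetical number of pairs to draw
local d = 1            // drug to analyse
local sum = 0
local n = 0
forval s = 1/`nsamp' {
	* draw two doctor indices in 1..ndocs
	local i = 1 + int(`ndocs' * uniform())
	local j = 1 + int(`ndocs' * uniform())
	* skip draws that pair a doctor with himself or herself
	if `i' != `j' {
		local di = `i' + (`d' - 1) * `ndocs'
		local dj = `j' + (`d' - 1) * `ndocs'
		quietly kap v`di' v`dj', wgt(w) absolute
		local sum = `sum' + r(kappa)
		local ++n
	}
}
display "mean weighted kappa over `n' sampled pairs: " `sum' / `n'
```

The number of kap calls is then fixed by nsamp rather than growing with 
the square of the number of doctors. 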
