Govind Bell Acharya <ga47@cornell.edu>:

I assume you saw the response from Nick Cox. Assuming you want variables named a-f as below, and you've got an interviewer id/name variable a on some data with a variable q which is the response to an item, where missing is appropriately coded (. or .a through .z), and surveyid indexing each survey, you can

  bys a surveyid: gen fq=_n==1
  g miss=mi(q)
  bys a: g d=_N
  * d is constant within a, so take it with (first); summing it would give _N squared
  collapse (sum) b=fq c=miss (first) d, by(a)
  gen e=c/d

You can get the mean over some group of interviewers with

  egen mpcmi=mean(e), by(groupvar)

though if you want the mean over everyone, so that f is the same for everyone, you should just

  su e, meanonly
  gen f=e-r(mean)

See also -help collapse- and http://www.stata.com/support/faqs/data/weighted.html among other resources.

On 7/21/07, Govind Bell Acharya <ga47@cornell.edu> wrote:

For our research, we use telephone interviewers to conduct a number of surveys. At the moment, it is a challenge to detect whether the number of items that an interviewer coded as missing (the "don't know" or "refused" options) is above or below the overall mean of missing values. I did something like that using the proc sql command in SAS, but it is (as SAS is in general) extremely unwieldy and creates major issues such as (f) below. In any case, here is what I have in mind:

(a) Name
(b) # surveys complete
(c) # missing
(d) # questions asked
(e) (c)/(d)
(f) [sum of (c)]/[sum of (d)]
(g) [(e)-(f)]
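
The columns (a) through (g) listed above could be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the data are taken to hold one row per question asked, the variable names a, surveyid and q follow the reply's naming, and the toy data and the helper variables asked, totc and totd are hypothetical. Here (f) is the pooled ratio [sum of (c)]/[sum of (d)] exactly as in the list, which is not in general the same as the unweighted mean of (e).

```stata
* Hypothetical toy data: one row per question asked
clear
input str5 a surveyid q
"ann" 1 1
"ann" 1 .
"ann" 2 .a
"ann" 2 3
"bob" 3 2
"bob" 3 2
"bob" 3 .
"bob" 4 1
end

* Flag the first row of each survey so that summing the flag counts surveys
bysort a surveyid: gen byte fq = _n == 1
* 1 if q is . or any of .a through .z, 0 otherwise
gen byte miss = missing(q)
* Each row is one question asked (assumption about the data layout)
gen byte asked = 1

* One row per interviewer: (b) surveys, (c) missing, (d) questions asked
collapse (sum) b=fq c=miss d=asked, by(a)

* (e) each interviewer's share of missing responses
gen e = c/d

* (f) pooled share over all interviewers: [sum of (c)]/[sum of (d)]
egen double totc = total(c)
egen double totd = total(d)
gen f = totc/totd

* (g) deviation of each interviewer from the pooled share
gen g = e - f

drop totc totd
list, noobs
```

With this toy data, ann has 2 missing out of 4 questions and bob has 1 out of 4, so (f) works out to 3/8 for both rows and (g) is positive for ann and negative for bob. Counting rows after the collapse, rather than carrying _N through it, is one way to keep (d) from being inflated.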

