Mike Lacy <Michael.Lacy@colostate.edu>

statalist@hsphsun2.harvard.edu

st: Re: using Stata to detect interviewer fraud

Sat, 01 May 2010 13:10:25 -0600

>Date: Fri, 30 Apr 2010 23:16:14 -0400
>From: "Michelson, Ethan" <emichels@indiana.edu>
>Subject: st: using Stata to detect interviewer fraud

clear all
// Create some simulated questionnaire data to work on.
set obs 505
local nvars = 100   // number of variables
local ncat = 2      // number of response categories for each variable
forval i = 1/`nvars' {
    gen byte q`i' = 1 + trunc(runiform() * `ncat')
}
//

// Mata function to compute, for each pair of cases, the proportion of
// matches across a list of variables.  This is essentially a replacement
// for -matrix dissim-, which can only do matching coefficients for
// binary variables
//
mata mata clear
mata:
void mat_match(string scalar varlist,    // list of variables across which to match
               string scalar stmatname)  // name of Stata matrix for results
{

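    // Load the listed variables into X. This line is assumed: the
    // archived copy of the post uses X without ever defining it.
    X = st_data(., tokens(varlist))
    nsubj = rows(X)
    nvar = cols(X)
    M = J(nsubj, nsubj, 0)
    // Count, variable by variable, the items on which each pair of cases agrees
    for (j = 1; j <= nvar; j++) {
        for (ego = 1; ego <= nsubj; ego++) {
            for (alter = 1; alter <= nsubj; alter++) {
                if (X[ego,j] == X[alter,j]) {
                    M[ego,alter] = M[ego,alter] + 1
                }
            }
        }
    }
    M = M/nvar   // proportion
    st_matrix(stmatname, M)
}
end
//
//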
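As a quick check on what the matching proportion means, here is a minimal sketch on a made-up 3-by-4 response matrix. The toy values and the use of rowsum() are illustrative assumptions, not part of the original program. It computes, for two pairs of respondents, the share of items on which they gave the same answer, which is the quantity stored in each cell of M.

```stata
mata:
// Hypothetical toy data: 3 respondents (rows) by 4 items (columns)
X = (1, 2, 1, 2 \ 1, 2, 2, 2 \ 2, 1, 2, 1)
// Proportion of items on which respondents 1 and 2 give the same answer
rowsum(X[1,.] :== X[2,.]) / cols(X)
// Same comparison for respondents 1 and 3
rowsum(X[1,.] :== X[3,.]) / cols(X)
end
```

Respondents 1 and 2 agree on three of the four items and respondents 1 and 3 agree on none, so the two expressions should display .75 and 0.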

// Run the function, leaving the matching
// proportions in Stata matrix "M"
quiet unab varlist: q*
mata: mat_match("`varlist'", "M")
//
// Inspect the matching matrix to find excessive matches.  This could
// be included in the Mata program, but I only need the matrix.  Cases
// here are ID'd by case number, not by a true id number.
clear
svmat M
gen str HighMatch = ""
local toomuch = 0.8
foreach M of varlist M* {
    quiet replace HighMatch = HighMatch + "`M'" + " " if (`M' > `toomuch')
}
edit HighMatch

Regards,
Mike Lacy
Dept. of Sociology
Colorado State University

