Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: using Stata to detect interviewer fraud

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: using Stata to detect interviewer fraud
Date	Sat, 1 May 2010 18:49:54 -0400

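Here's a quick and simple way to do it: use -cross- to pair every
observation with every other one, then count the variables on which
each pair agrees. It does not treat missing values specially (two
missing values count as a match), but that should be easy to adjust.
If I look for cars in the auto dataset that are the same on more than
70% of the variables, I find that the Dodge Diplomat is very similar
to the Dodge Magnum.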

Hope this helps,

Robert

*--------------------------- begin example -----------------------
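version 11

* Load the example data and store the full variable list
clear all
sysuse auto
unab vlist: *

* Tag each observation with an id and save a copy
gen id1 = _n
tempfile f
qui save "`f'"

* Pair every observation (id2) with every observation (id1)
rename id1 id2
cross using "`f'"

* Within each id1 group, sort the self-pair (id1 == id2) first
gen diffid = id1 != id2
sort id1 diffid id2

* Count the variables on which each id2 record equals the id1 record
gen nmatch = 0
foreach v in `vlist' {
	qui by id1: replace nmatch = nmatch + (`v'[1] == `v')
}

* nmatch[1] belongs to the self-pair, so it equals the number of variables
by id1: gen similar = nmatch / nmatch[1] > .7
by id1: egen check = total(similar)

* List only groups with at least one similar record besides the self-pair
list id1 id2 make-foreign if check>1 & similar, noobs sepby(id1)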
*--------------------- end example --------------------------
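
One way the missing-value adjustment mentioned above might look is
sketched below. It is only a sketch, not tested on the survey data: it
reuses the same -cross- setup on the auto dataset, counts a match only
when both records are nonmissing on a variable, and divides by the
number of variables that are nonmissing in both records. The variable
names nsame, nboth and prop are made up for illustration.

```stata
version 11

clear all
sysuse auto
unab vlist: *
gen id1 = _n
tempfile f
qui save "`f'"

* Pair every observation (id2) with every observation (id1)
rename id1 id2
cross using "`f'"
gen diffid = id1 != id2
sort id1 diffid id2

* nboth: variables nonmissing in both records of the pair
* nsame: of those, variables on which the two records agree
gen nsame = 0
gen nboth = 0
foreach v of local vlist {
	qui by id1: replace nboth = nboth + (!missing(`v'[1]) & !missing(`v'))
	qui by id1: replace nsame = nsame + (`v'[1] == `v' & !missing(`v'))
}

* Proportion of jointly nonmissing variables that are identical
gen prop = nsame / nboth

* Skip self-pairs; guard against prop being missing when nboth is 0
list id1 id2 make nsame nboth prop if diffid & prop > .7 & !missing(prop), noobs sepby(id1)
```

As in the example above, each pair is listed twice (once under each
id1). For the survey, the 505 questionnaires from one site would give
505 x 505 = 255,025 pairs after -cross-, and the loop would run over
the questionnaire items rather than all variables.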


On Fri, Apr 30, 2010 at 11:16 PM, Michelson, Ethan <[email protected]> wrote:
> I'd be deeply grateful for help writing a more efficient, more parsimonious .do file to help detect interviewer fraud. After completing a survey of 2,500 households, I discovered that a few interviewers copied each others' questionnaires. I decided to write some code that calculates the proportion of all nonmissing questionnaire items that are identical across every other questionnaire. Although my .do file accomplishes this task, I strongly suspect I'm making Stata do tons of unnecessary work. It takes Stata about 12 hours to process 505 questionnaires (from a single survey site, since I can rule out the possibility that interviewers conspired across different survey sites).....
>
> In the following code, "id" is the unique questionnaire id. There are 505 questionnaires in this batch. The final command at the bottom asks Stata to list combinations of questionnaires with >80% identical content. I have no doubt there's a far more efficient way to do this. I'd really appreciate any advice anyone can offer.
>
> ********************
> sort id
> gen order=0
> gen add=-1
> replace order=1 if _n==1
> levels id, local(levels)
> foreach l of local levels {
>    gen same_`l'=0
>    gen all_`l'=0
> }
> forv n = 1(1)504 {
>    foreach l of local levels {
>       foreach var of varlist a1* a2* a3* b* d* c1 c12 c23 c34 c44 c55 c67 c77 c88 c100 c107 c116 c126 c136 c144 c155 c165 c176 c185 c195 {
>          quietly replace same_`l'=same_`l'+1 if `var'==`var'[_n+`n']&`var'~=.&id[_n+`n']==`l'
>          quietly replace all_`l'=all_`l'+1 if `var'~=.&`var'[_n+`n']~=.&id[_n+`n']==`l'
>          display "`l' `n'"
>      }
>    }
>    quietly replace order=add if order==1
>    quietly replace add=add-1
>    gsort -order id
>    quietly replace order=1 if _n==1
> }
> foreach l of local levels {
>    gen prop_`l'=same_`l'/all_`l'*100
> }
> foreach l of local levels {
>    list id prop_`l' same_`l' all_`l' if prop_`l'>80&prop_`l'<.
> }
>
> ******************
>
> Ethan Michelson
> Departments of Sociology and East Asian Languages & Cultures, Associate Professor
> Maurer School of Law, Associate Professor of Sociology and Law
> mail address:
> Department of Sociology
> Indiana University
> 744 Ballantine Hall
> 1020 E. Kirkwood Ave.
> Bloomington, IN 47405
> Phone: (812) 856-1521
> Fax: (812) 855-0781
> Email: [email protected]
> URL: http://www.indiana.edu/~emsoc/
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
