

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: using Stata to detect interviewer fraud
Date: Sat, 1 May 2010 10:38:31 -0500

On Fri, Apr 30, 2010 at 10:16 PM, Michelson, Ethan <emichels@indiana.edu> wrote:
> I'd be deeply grateful for help writing a more efficient, more parsimonious .do file to help detect interviewer fraud. After completing a survey of 2,500 households, I discovered that a few interviewers copied each other's questionnaires. I decided to write some code that calculates the proportion of all nonmissing questionnaire items that are identical across every other questionnaire. Although my .do file accomplishes this task, I strongly suspect I'm making Stata do tons of unnecessary work. It takes Stata about 12 hours to process 505 questionnaires (from a single survey site, since I can rule out the possibility that interviewers conspired across different survey sites).....

I imagine that tasks like comparing the data between rows are better accomplished by the -cluster- routines. They aren't lightning fast either, but I believe they are better optimized for speed than your code, while still doing the same job of determining how "close" two sets of responses are. -cluster- works great if you have either all continuous or all discrete data; if you have a mix, or if you have missing data, the choices are more limited, and you'd need to come up with more inventive ways of computing the differences between the completed questionnaires.

That'll make a terrific Stata conference talk, you know :)). I don't remember seeing anything like that at the US meetings.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
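For illustration only, here is a minimal sketch of the pairwise comparison described in the quoted question: the share of items, nonmissing in both questionnaires, that carry identical answers. It is written in Python rather than Stata, it is neither the poster's .do file nor what -cluster- does internally, and the names `match_share`, `suspicious_pairs`, the 0.9 threshold, and the toy `responses` data are all assumptions made up for the example.

```python
from itertools import combinations


def match_share(a, b):
    """Share of items answered in BOTH questionnaires that are identical.

    Missing answers are coded as None (an assumption of this sketch).
    Returns None when the two questionnaires share no answered item.
    """
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not both:
        return None
    return sum(x == y for x, y in both) / len(both)


def suspicious_pairs(responses, threshold=0.9):
    """Return (id_a, id_b, share) for pairs at or above the threshold.

    Each unordered pair is compared exactly once, so n questionnaires
    need n * (n - 1) / 2 comparisons rather than n * n.
    """
    flagged = []
    for (id_a, a), (id_b, b) in combinations(responses.items(), 2):
        share = match_share(a, b)
        if share is not None and share >= threshold:
            flagged.append((id_a, id_b, share))
    # Most similar pairs first
    return sorted(flagged, key=lambda t: t[2], reverse=True)


# Hypothetical toy data: questionnaire id -> answers, None = missing
responses = {
    "q1": [1, 3, 2, None, 5, 1],
    "q2": [1, 3, 2, 4, 5, 1],
    "q3": [2, 1, 2, 4, None, 3],
}

for id_a, id_b, share in suspicious_pairs(responses):
    print(id_a, id_b, round(share, 2))
```

With 505 questionnaires this comes to 127,260 pairs, since each pair is visited once instead of comparing every questionnaire against every other in both directions.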
