
Re: st: Fraud methods in Stata

From   Nick Cox
Subject   Re: st: Fraud methods in Stata
Date   Fri, 26 Sep 2008 11:30:44 -0500

I would search for publications of longstanding Stata user Stephen Evans in this area. He has done very serious work on (possibly fraudulent) medical data.

That said, I remain puzzled by the implication that outliers are prima facie evidence of fraud. My own impression is that fraudulent people wish to create datasets that look genuine and that they are thus unlikely to add or manufacture outliers, unless those outliers serve their purpose somehow, but that's just a guess. The main ways I can think of for identifying fraudulent data are noticing agreement that is "too good to be true" and looking at the patterns of first and last digits in the data. Another obviously related issue is plagiarism of published data.
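
To make the last-digit idea concrete, here is a minimal sketch in Python (not Stata, and not anyone's published method). It tabulates the final recorded digit of integer-valued readings and computes a Pearson chi-square statistic against equal frequencies. The function names and the readings are hypothetical, invented purely for illustration. Digit preference also arises innocently when people round, so a large statistic is a prompt for a closer look, not evidence of fraud in itself.

```python
from collections import Counter


def last_digit_counts(values):
    """Return counts of the final digit (0-9) of integer-valued readings."""
    counts = Counter(abs(int(v)) % 10 for v in values)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected frequencies."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical readings (say, blood pressure in mmHg), made up for this sketch.
readings = [120, 130, 125, 140, 135, 120, 110, 130, 145, 120,
            150, 125, 130, 140, 120, 135, 128, 130, 120, 140]

counts = last_digit_counts(readings)
stat = chi_square_uniform(counts)

# Compare stat with a chi-square distribution on 9 degrees of freedom
# (5% critical value about 16.92). With only 20 readings the expected
# counts are small, so the approximation is rough at best.
print(counts, stat)
```

With a few patients per centre, it may be more sensible to pool digits across variables within a centre, or across centres, before comparing; that choice is a judgment call rather than something this sketch settles.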


Williams, Rachael wrote:

I am considering methods of detecting fraud in a hypothetical clinical
trial with a large number of centres, but only a few patients per
In addition, many variables will be binary.

Would Cook's D be appropriate here?
Is it possible to calculate Mahalanobis' distance in Stata in order to
detect (possibly fraudulent) inliers, outliers and near duplicates in a

If anyone has any ideas of other ways to detect possible fraud I would
love to hear from you too!