David

At 3:32 PM +0100 10/8/08, Nick Cox wrote:

I too have wanted to find a theory of data cleaning, but in practice it's mightily elusive. I think this is the most bottom-up part of statistical science, one in which at best you have rules that work most of the time for your kind of data. A colleague worked with records on glaciers that had supposedly been reviewed very carefully. He found many things that the quality control had missed, including glaciers that were simply in the wrong places, as a scatter plot of latitude and longitude showed; glaciers reported twice, by different countries; and many Russian glaciers reported to face East when they faced West, and vice versa. (Apparently, Sergiy, that was a transliteration/translation problem.) He found these things by slow scrutiny and started building up, ad hoc, a list of things that could be wrong.
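An ad hoc list like that can be turned into rules that are rerun whenever the data change. What follows is only a minimal sketch, not the colleague's actual procedure. The record layout (`id`, `country`, `lat`, `lon`, `aspect`), the sample values, and the helper names are all hypothetical. The sketch flags coordinates outside valid ranges, aspect codes outside an allowed set, and records from different countries that share rounded coordinates.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
SAMPLE = [
    {"id": 1, "country": "A", "lat": 46.50, "lon": 8.00, "aspect": "N"},
    {"id": 2, "country": "B", "lat": 46.50, "lon": 8.00, "aspect": "N"},
    {"id": 3, "country": "A", "lat": 95.0, "lon": 10.0, "aspect": "W"},
    {"id": 4, "country": "C", "lat": 61.2, "lon": 7.1, "aspect": "West"},
]

VALID_ASPECTS = {"N", "NE", "E", "SE", "S", "SW", "W", "NW"}


def bad_coordinates(rec):
    """Flag latitude or longitude outside the physically valid ranges."""
    return not (-90 <= rec["lat"] <= 90 and -180 <= rec["lon"] <= 180)


def bad_aspect(rec):
    """Flag aspect codes that are not one of the eight compass points."""
    return rec["aspect"] not in VALID_ASPECTS


# The growing "list of things that could be wrong": (name, per-record check).
ROW_CHECKS = [("bad_coordinates", bad_coordinates), ("bad_aspect", bad_aspect)]


def possible_duplicates(records, digits=2):
    """Group records whose rounded coordinates coincide but whose reporting
    countries differ, as candidates for the same glacier reported twice."""
    groups = defaultdict(list)
    for rec in records:
        key = (round(rec["lat"], digits), round(rec["lon"], digits))
        groups[key].append(rec)
    return [grp for grp in groups.values() if len({r["country"] for r in grp}) > 1]


def run_checks(records):
    """Return (id, problem) pairs: per-record checks first, then duplicates."""
    problems = []
    for rec in records:
        for name, check in ROW_CHECKS:
            if check(rec):
                problems.append((rec["id"], name))
    for grp in possible_duplicates(records):
        for rec in grp:
            problems.append((rec["id"], "possible_duplicate"))
    return problems


if __name__ == "__main__":
    for rec_id, problem in run_checks(SAMPLE):
        print(rec_id, problem)
```

Note that a glacier recorded as facing "E" when it really faces "W" passes every rule here, because both codes are valid. Errors of that kind still need slow scrutiny or comparison with an independent source, and each one found becomes a candidate for a new rule.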

