
From: David Radwin <radwin@berkeley.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: RE: Data consistency heuristics
Date: Wed, 8 Oct 2008 09:14:31 -0700

David

At 3:32 PM +0100 10/8/08, Nick Cox wrote:

I too have wanted to find a theory of data cleaning, but in practice it's mightily elusive. I think this is the most bottom-up part of statistical science, in which at best you have rules that work most of the time for your kind of data. A colleague worked with records on glaciers that had supposedly been reviewed very carefully. He found many things that the quality control had missed, including glaciers that were simply in the wrong places, as shown by a scatter plot of latitude and longitude; glaciers reported twice, by different countries; and many Russian glaciers reported to face East when they faced West and vice versa. (Apparently, Sergiy, that was a transliteration/translation problem.) He found these things by slow scrutiny and started building up, ad hoc, a list of things that could be wrong.
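The ad hoc list of checks described here can be sketched in a few lines of Python. This is a minimal sketch, not the colleague's actual procedure: the record layout (`Record` with `rid`, `country`, `lat`, `lon`), the per-country bounding boxes, the 0.01-degree tolerance, and the country codes are all hypothetical. Two rules mirror two of the problems described (glaciers in the wrong place, and glaciers reported twice by different countries); the East/West swap would need extra information, such as independent aspect data, and is left out.

```python
from dataclasses import dataclass

# Default "box" covering the whole globe, used when no country box is supplied.
WORLD = (-90.0, 90.0, -180.0, 180.0)


@dataclass
class Record:
    # Hypothetical layout; the real glacier inventory's fields are not described.
    rid: str
    country: str
    lat: float
    lon: float


def outside_country_box(rows, boxes):
    """Flag records outside their country's (lat_min, lat_max, lon_min, lon_max) box."""
    flagged = []
    for r in rows:
        lat_min, lat_max, lon_min, lon_max = boxes.get(r.country, WORLD)
        if not (lat_min <= r.lat <= lat_max and lon_min <= r.lon <= lon_max):
            flagged.append(r.rid)
    return flagged


def reported_twice(rows, tol=0.01):
    """Flag pairs from different countries whose coordinates agree within tol degrees."""
    flagged = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            if (a.country != b.country
                    and abs(a.lat - b.lat) <= tol
                    and abs(a.lon - b.lon) <= tol):
                flagged.append((a.rid, b.rid))
    return flagged


def run_checks(rows, boxes):
    """Apply every rule collected so far; extend this dict as new problems turn up."""
    checks = {
        "outside_country_box": lambda: outside_country_box(rows, boxes),
        "reported_twice": lambda: reported_twice(rows),
    }
    return {name: check() for name, check in checks.items()}


# Toy data with made-up country codes and a made-up box for country "A".
rows = [
    Record("a1", "A", 61.5, 7.0),
    Record("a2", "B", 61.505, 7.004),  # same glacier, also reported by country B
    Record("a3", "A", 10.0, 7.0),      # implausible latitude for country A
]
boxes = {"A": (57.0, 72.0, 4.0, 32.0)}
result = run_checks(rows, boxes)
print(result)
# → {'outside_country_box': ['a3'], 'reported_twice': [('a1', 'a2')]}
```

Each rule is deliberately crude: a bounding box or a coordinate tolerance will both miss problems and raise false alarms, which fits the point that such rules work most of the time for one kind of data rather than following from a general theory.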

-- 
David Radwin // radwin@berkeley.edu
Office of Student Research, University of California, Berkeley

**References**:
- **st: Data consistency heuristics**, from Sergiy Radyakin <serjradyakin@gmail.com>
- **st: RE: Data consistency heuristics**, from Nick Cox <n.j.cox@durham.ac.uk>

