Statalist



st: Data consistency heuristics


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Data consistency heuristics
Date   Tue, 7 Oct 2008 17:02:21 -0400

Hello All,

This is a more or less general question, related not to Stata itself
but to data processing. I wonder if anyone could point me to a good
source of heuristics/rules for checking data for
consistency/plausibility. I am looking for something like:

* age of a person must be within the range 0-120
* gender must have no more than 2 unique values
* person younger than NNN years may not be a mother
* if a person is reporting not working, wage must be missing/zero
* if a person is attending primary school, occupation may not be "manager"
* if a person is attending university, [s]he may not report being illiterate
etc

Note that these are more or less flexible rules, and there may be
exceptions. But if a rule holds in 99% of cases, it is what I am
looking for.
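
To make the idea concrete, here is a minimal sketch of how a few of the
rules above might be written as checks that flag suspicious records
rather than reject them, since exceptions are possible. It is in plain
Python only for illustration; the record layout, the variable names
(age, working, wage, school, occupation, literate, gender) and the two
function names are all hypothetical. The rule about mothers is left out
because its age threshold (NNN) is not specified.

```python
# Hypothetical record layout: one dict per person; None means "missing".

def check_person(p):
    """Return the names of the rules that record p violates (flags, not errors)."""
    flags = []
    age = p.get("age")
    # Age must be within 0-120 when it is reported.
    if age is not None and not (0 <= age <= 120):
        flags.append("age_out_of_range")
    # A person reporting not working should have a missing or zero wage.
    if p.get("working") is False and p.get("wage") not in (None, 0):
        flags.append("wage_without_work")
    # A primary school pupil should not report the occupation "manager".
    if p.get("school") == "primary" and p.get("occupation") == "manager":
        flags.append("primary_school_manager")
    # A university student should not report being illiterate.
    if p.get("school") == "university" and p.get("literate") is False:
        flags.append("illiterate_university_student")
    return flags


def too_many_categories(records, var, max_unique=2):
    """Dataset-level check: True if var takes more than max_unique non-missing values."""
    values = {r.get(var) for r in records if r.get(var) is not None}
    return len(values) > max_unique


# Made-up example records, purely for illustration.
people = [
    {"age": 34, "gender": "F", "working": True, "wage": 1200,
     "school": None, "literate": True},
    {"age": 130, "gender": "M", "working": False, "wage": 500,
     "school": None, "literate": True},
    {"age": 9, "gender": "F", "working": False, "wage": None,
     "school": "primary", "occupation": "manager", "literate": True},
]

for i, person in enumerate(people):
    print(i, check_person(person))
print("gender has too many values:", too_many_categories(people, "gender"))
```

Each check returns a flag instead of raising an error, so a record that
is a legitimate exception can be reviewed by hand and kept.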

The topics of interest include economics
(employment/earnings/wages/sector/hours of work, etc.), education (years
of education/enrollment/completion), family structure and composition, and
other related topics commonly found in family, household, or labor
force surveys.

I believe a significant number of such checks are done by data
collectors before releasing the data to the public, and I wouldn't want to
reinvent the wheel here.

Thank you, Sergiy Radyakin