-Steve

On Oct 7, 2008, at 6:05 PM, Maarten buis wrote:

An alternative or complementary approach would be to use -assert-, as is advocated in (Gould 2003).

William Gould (2003) Stata tip 3: How to be assertive

--- "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu> wrote:
A simple do file should work.

list caseno age if age<0 | (age>120 & age<.) // may want to print missing ages
list caseno gender if gender~=a & gender~=b // a and b are the unique values (could be strings so you'd want to fix that up)
list caseno if age<NNN & mother==1 // mother is an indicator
etc.

An interesting question is whether you want to correct these - e.g. convert them to missing or an error code (I first typed coed - but that's NOT what I meant!)

In a study earlier this summer I did just this. Initially I printed all the missing value cases, but the data came from medical records and about half of 2000 cases were missing, so I simply didn't print, but gave a count for each variable. Some of the variables had many possible legal values (e.g., which of 30 drugs were being taken), so the checking became very complicated - especially when the dosage and schedule were being checked.

Svend Juul has a nice chapter on this in his book.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy Radyakin
Sent: Tuesday, October 07, 2008 2:02 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Data consistency heuristics

Hello All,

this is a more or less general question, not related to Stata itself, but to data processing. I wonder if anyone could point me to a good source of heuristics/rules on checking the data for consistency/plausibility.
I am looking for something like:

* age of a person must be within the range 0-120
* gender must have no more than 2 unique values
* person younger than NNN years may not be a mother
* if a person is reporting not working, wage must be missing/zero
* if a person is attending primary school, occupation may not be "manager"
* if a person is attending university, [s]he may not report being illiterate
etc

Note that these are more or less flexible rules and there might be exceptions. But if it is valid for 99% of cases - it's what I am looking for.

The context topics include economics (employment/earnings/wages/sector/hours of work etc), education (years of educ/enrollment/completion), family structure and composition, and other related topics commonly found in family, household or labor force surveys.

I believe a significant amount of such checks is being done by data collectors before releasing the data to public, and I wouldn't want to reinvent the wheel here.

Thank you,
Sergiy Radyakin

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
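
For illustration only, a few of the rules from the original question could be written with -assert-, as Maarten suggests, and with -count-/-list- for the softer rules. This is a rough sketch, not anyone's tested do-file: the variable names (caseno, age, gender, working, wage) and the 1/2 coding of gender are assumptions, so adjust them to the data at hand.

```stata
* Hypothetical variables: caseno, age, gender (assumed coded 1/2), working, wage

* Hard rule: non-missing ages must lie in 0-120; the do-file stops on a violation
assert inrange(age, 0, 120) if !missing(age)

* Same idea, but rc0 reports contradictions without halting the do-file
assert inlist(gender, 1, 2) if !missing(gender), rc0

* Soft rule (exceptions allowed): count and inspect violations instead
count if working == 0 & wage > 0 & !missing(wage)
list caseno working wage if working == 0 & wage > 0 & !missing(wage)
```

-assert- suits rules that should never fail, since it halts execution; for rules that hold in "99% of cases", counting and listing the exceptions is probably the more practical choice.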

