Statalist



Re: st: RE: Data consistency heuristics


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Data consistency heuristics
Date   Tue, 7 Oct 2008 23:06:28 +0100 (BST)

An alternative or complementary approach would be to use -assert-, as
advocated by Gould (2003):

William Gould (2003) Stata tip 3: How to be assertive. The Stata
Journal, 3(4): 448. 
http://www.stata-journal.com/article.html?article=dm0003
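
As a rough illustration of the -assert- approach, here is a minimal
sketch applied to the first three rules from the original question.
The toy data, the variable names (caseno, age, gender, mother), the
gender codes 1 and 2, and the cutoff of 12 standing in for NNN are all
assumptions made for the example, not part of anyone's actual data.

```stata
* Hypothetical toy data; variable names and codes are assumptions.
clear
input caseno age gender mother
1 34 1 1
2  7 2 0
3 61 2 1
4  . 1 0
end

* Rule 1: age must lie within 0-120, or be missing.
* inrange() returns 0 for a missing age, hence the explicit missing().
assert inrange(age, 0, 120) | missing(age)

* Rule 2: gender may take only the two legal codes (assumed 1 and 2).
assert inlist(gender, 1, 2)

* Rule 3: nobody younger than the cutoff (assumed 12) may be a mother.
* Stata treats missing as larger than any number, so a missing age
* never satisfies age < 12 and cannot trigger this check.
assert !(age < 12 & mother == 1)
```

When an assertion is false, -assert- reports the number of
contradictions and stops the do-file with error r(9), so bad data
cannot slip through unnoticed. For rules that are only expected to
hold for most cases, the -rc0- option (or -capture noisily-) lets the
do-file continue, and -list ... if- with the negated condition shows
the offending cases.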

Hope this helps,
Maarten

--- "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu> wrote:

> A simple do file should work.
> 
> // may want to print missing ages as well
> list caseno age if (age<0 | age>120) & age<.
> // a and b are the unique legal values (could be strings, so you'd
> // want to fix that up)
> list caseno gender if gender~=a & gender~=b
> // mother is an indicator
> list caseno if age<NNN & mother==1
> etc.
> 
> An interesting question is whether you want to correct these - e.g.
> convert them to missing or an error code  (I first typed coed - but
> that's NOT what I meant!)
> In a study earlier this summer I did just this.  Initially I printed
> all the missing value cases, but the data came from medical records
> and about half of 2000 cases were missing, so I simply didn't print,
> but gave a count for each variable.
> Some of the variables had many possible legal values (e.g., which of
> 30 drugs were being taken), so the checking became very complicated -
> especially when the dosage and schedule were being checked.
> 
> Svend Juul has a nice chapter on this in his book.
> 
> Tony
> 
> Peter A. Lachenbruch
> Department of Public Health
> Oregon State University
> Corvallis, OR 97330
> Phone: 541-737-3832
> FAX: 541-737-4001
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Sergiy
> Radyakin
> Sent: Tuesday, October 07, 2008 2:02 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Data consistency heuristics
> 
> Hello All,
> 
> This is a more or less general question, not related to Stata itself,
> but to data processing. I wonder if anyone could point me to a good
> source of heuristics/rules on checking the data for
> consistency/plausibility. I am looking for something like:
> 
> * age of a person must be within the range 0-120
> * gender must have no more than 2 unique values
> * person younger than NNN years may not be a mother
> * if a person is reporting not working, wage must be missing/zero
> * if a person is attending primary school, occupation may not be
> "manager"
> * if a person is attending university, [s]he may not report being
> illiterate
> etc
> 
> Note that these are more or less flexible rules and there might be
> exceptions. But if it is valid for 99% of cases - it's what I am
> looking for.
> 
> The context topics include economics
> (employment/earnings/wages/sector/hours of work etc), education(years
> of educ/enrollment/completion), family structure and composition, and
> other related topics commonly found in family, household or labor
> force surveys.
> 
> I believe a significant number of such checks are done by data
> collectors before releasing the data to the public, and I wouldn't
> want to reinvent the wheel here.
> 
> Thank you, Sergiy Radyakin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      


