Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Rebecca Pope <rebecca.a.pope@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: correcting data inconsistencies
Date: Mon, 11 Mar 2013 09:35:47 -0500

Nick has given you a solution for identifying your problem panels. I think this is a good first step in the process, so that you have an idea of how widespread your problem is. There isn't a simple solution to "correcting" the values, though, because you don't know what is correct. You really have to make a judgement call and then document your decision rules.

For example, you could decide on the modal value:

    bys personid: egen myeduc = mode(educ), minmode

Alternatively, whether the individual reported 11, 12, or 13, they definitely had 11 (erring on the lower side):

    bys personid: egen myeduc = min(educ)

You could also "carry forward" the higher values. That is, once a person has attained 12 years, (s)he has 12 years until acquiring the next year of education. One has to wonder how valuable years 12 & 13 were if they are forgotten so quickly though. :-)

    bys personid educ (year): gen change = (_n==1)
    bys personid (year): replace change = 0 if educ < educ[_n-1]
    bys personid (year): gen myedu = sum(cond(_n==1,educ,change))

These are just a few of the rules I can think of off the top of my head. I'd certainly check to see if there is a common approach in the educational research literature (or wherever you intend to publish), if for no other reason than that you're less likely to get slammed by a reviewer.

Regards,
Rebecca

On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The simple program is called Stata....
>
> However, you have to tell it what you regard as inconsistent.
>
> In this case, you could flag any observation that doesn't have a higher
> -education- value than the previous observation in the same panel.
>
> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
> by personid : gen flag2 = (_n > 1) & (educ <= educ[_n-1])
>
> list if flag1 | flag2
>
> You could also flag panels, like this
>
> gen problem = 0
> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
> by personid : replace problem = problem[_N]
>
> edit if problem
>
> Fluency with -by:- gets you a long way.
>
> SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
> Q1/02 SJ 2(1):86--102 (no commands)
> explains the use of the by varlist : construct to tackle
> a variety of problems with group structure, ranging from
> simple calculations for each of several groups to more
> advanced manipulations that use the built-in _n and _N
>
> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>
> Nick
>
> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <davidjosework@gmail.com> wrote:
>
>> I would like to correct self-reported data inconsistencies in a panel
>> data set. For example, if there is an education variable, which is
>> reported 5 times, say as follows:
>>
>> year educ
>>
>> 2000 12
>>
>> 2002 11
>>
>> 2004 13
>>
>> 2006 12
>>
>> 2008 11
>>
>> I wonder if anyone has a simple program that can be implemented to
>> correct such inconsistencies. Thanks in advance.
>>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/