

From: Rebecca Pope <rebecca.a.pope@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: correcting data inconsistencies
Date: Mon, 11 Mar 2013 09:35:47 -0500

Nick has given you a solution for identifying your problem panels. I think this is a good first step in the process, so you have an idea of how widespread your problem is. There isn't a simple solution to "correcting" the values, though, because you don't know which value is correct. You really have to make a judgement call and then document your decision rules.

For example, you could decide on the modal value:

    bys personid: egen myeduc = mode(educ), minmode

Alternatively, whether the individual reported 11, 12, or 13, they definitely had 11 (erring on the lower side):

    bys personid: egen myeduc = min(educ)

You could also "carry forward" the higher values, i.e., once a person has attained 12 years, (s)he has 12 years until acquiring the next year of education. One has to wonder how valuable years 12 & 13 were if they are forgotten so quickly though. :-)

    bys personid educ (year): gen change = (_n==1)
    bys personid (year): replace change = 0 if educ < educ[_n-1]
    bys personid (year): gen myedu = sum(cond(_n==1,educ,change))

These are just a few of the rules I can think of off the top of my head. I'd certainly check to see if there is a common approach in the educational research literature (or wherever you intend to publish), if for no other reason than that you're less likely to get slammed by a reviewer.

Regards,
Rebecca

On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The simple program is called Stata....
>
> However, you have to tell it what you regard as inconsistent.
>
> In the case, you could flag any observation that doesn't have a higher
> -education- value than the previous observation in the same panel.
>
> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
> by personid : gen flag2 = educ <= educ[_n-1]
>
> list if flag1 | flag2
>
> You could also flag panels, like that
>
> gen problem = 0
> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
> by personid : replace problem = problem[_N]
>
> edit if problem
>
> Fluency with -by:- gets you a long way.
>
> SJ-2-1  pr0004  . . . . . . . . . .  Speaking Stata: How to move step by: step
>         . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q1/02   SJ 2(1):86--102                                 (no commands)
>         explains the use of the by varlist : construct to tackle
>         a variety of problems with group structure, ranging from
>         simple calculations for each of several groups to more
>         advanced manipulations that use the built-in _n and _N
>
> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>
> Nick
>
> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <davidjosework@gmail.com> wrote:
>
>> I would like to correct self-reported data inconsistencies in a panel
>> data set. For example, if there is an education variable, which is
>> reported 5 times, say as follows:
>>
>> year    educ
>> 2000    12
>> 2002    11
>> 2004    13
>> 2006    12
>> 2008    11
>>
>> I wonder if anyone has a simple program that can be implemented to
>> correct such inconsistencies. Thanks in advance.
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
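To compare the decision rules discussed in this message side by side, here is a minimal sketch run on David's five example observations. It is not code from the thread: the value `personid = 1` and the variable names `educ_mode`, `educ_min`, and `educ_cummax` are made up for illustration. The third rule is written as a running maximum, which is a swapped-in variant of the "carry forward" idea rather than the code posted above; it can give different answers from the posted version, for instance when education jumps by more than one year between reports.

```stata
* Toy panel built from the example in the original question.
* personid = 1 is a hypothetical identifier, not from the thread.
clear
input personid year educ
1 2000 12
1 2002 11
1 2004 13
1 2006 12
1 2008 11
end

* Rule 1: modal value; minmode picks the lowest value when modes tie
bysort personid: egen educ_mode = mode(educ), minmode

* Rule 2: the lowest value ever reported
bysort personid: egen educ_min = min(educ)

* Rule 3 (variant): running maximum within person, in year order.
* -replace- works through observations in order, so educ_cummax[_n-1]
* already holds the maximum up to the previous year.
bysort personid (year): gen educ_cummax = educ
by personid: replace educ_cummax = max(educ_cummax, educ_cummax[_n-1]) if _n > 1

list, sepby(personid) noobs
```

On this example the modal and minimum rules both give 11 (11 and 12 tie as modes, and `minmode` takes the lower), while the running maximum gives 12, 12, 13, 13, 13. Whichever rule is chosen, it is probably worth keeping the original `educ` alongside the cleaned variable so the decision stays documented and reversible.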

