
# Re: st: Re: correcting data inconsistencies

| From | To | Subject | Date |
|---|---|---|---|
| Rebecca Pope | statalist@hsphsun2.harvard.edu | Re: st: Re: correcting data inconsistencies | Mon, 11 Mar 2013 09:35:47 -0500 |

```Nick has given you a solution to identifying your problem panels. I
think this is a good first step in the process so you have an idea of
the extent of the problem. I would be wary of talking about
"correcting" the values, though, because you don't know what is correct.
You really have to make a judgement call and then document your
decision rules. For example, you could decide on the modal value:

bys personid: egen myeduc = mode(educ), minmode

Alternatively, you could take the minimum: whether the individual
reported 11, 12, or 13, they definitely had at least 11 (erring on the
low side):

bys personid: egen myeduc = min(educ)

You could also "carry forward" the higher values. I.e. once a person
has attained 12 years, (s)he has 12 years until acquiring the next
year of education. One has to wonder how valuable years 12 & 13 were
if they are forgotten so quickly though. :-)

bys personid educ (year): gen change = (_n==1)
bys personid (year): replace change= 0 if educ < educ[_n-1]
bys personid (year): gen myedu = sum(cond(_n==1,educ,change))

These are just a few of the rules I can think of off the top of my
head. I'd certainly check to see if there is a common approach in the
educational research literature (or wherever you intend to publish) if
for no other reason than that you're less likely to get slammed by a
reviewer.

Regards,
Rebecca

On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The simple program is called Stata....
>
> However, you have to tell it what you regard as inconsistent.
>
> In this case, you could flag any observation that doesn't have a higher
> -education- value than the previous observation in the same panel.
>
> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
> by personid : gen flag2 = (_n > 1) & (educ <= educ[_n-1])
>
> list if flag1 | flag2
>
> You could also flag panels, like this:
>
> gen problem = 0
> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
> by personid : replace problem = problem[_N]
>
> edit if problem
>
> Fluency with -by:- gets you a long way.
>
> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q1/02   SJ 2(1):86--102                                  (no commands)
>         explains the use of the by varlist : construct to tackle
>         a variety of problems with group structure, ranging from
>         simple calculations for each of several groups to more
>         advanced manipulations that use the built-in _n and _N
>
> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>
> Nick
>
> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <davidjosework@gmail.com> wrote:
>
>> I would like to correct self-reported data inconsistencies in a panel
>> data set. For example, if there is an education variable, which is
>> reported 5 times, say as follows:
>>
>> year     educ
>> 2000     12
>> 2002     11
>> 2004     13
>> 2006     12
>> 2008     11
>>
>> I wonder if anyone has a simple program that can be implemented to
>> correct such inconsistencies. Thanks in advance.
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
```
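
The decision rules discussed in the message can be compared side by side on the example from the original question. What follows is a minimal sketch, not code from the thread: the single-person dataset with `personid` 1 and the variable names `educ_mode`, `educ_min`, and `educ_runmax` are assumptions made for illustration, and the running maximum is a swapped-in variant of the "carry forward" idea rather than the three-line version in the message.

```stata
* Hypothetical toy data: one person (personid 1) with the values
* from the original question.
clear
input personid year educ
1 2000 12
1 2002 11
1 2004 13
1 2006 12
1 2008 11
end

* Rule 1: modal value; minmode breaks ties toward the smallest mode.
bysort personid: egen educ_mode = mode(educ), minmode

* Rule 2: minimum reported value.
bysort personid: egen educ_min = min(educ)

* Rule 3 (assumed variant): running maximum within person, so the
* value never falls from one wave to the next. replace works through
* the observations in order, so educ_runmax[_n-1] is already updated.
bysort personid (year): gen educ_runmax = educ
by personid: replace educ_runmax = max(educ_runmax, educ_runmax[_n-1]) if _n > 1

list, sepby(personid) noobs
```

On this toy panel the modal and minimum rules both give 11 in every year (11 and 12 are tied as modes, and `minmode` picks the smaller), while the running maximum gives 12, 12, 13, 13, 13. A running maximum also follows jumps of more than one year between waves, which may matter when interviews are two years apart. Whichever rule is chosen, it should be documented as a decision rule, since none of them recovers the true value.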