



Re: st: Re: correcting data inconsistencies


From   njcoxstata@gmail.com
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Re: correcting data inconsistencies
Date   Mon, 11 Mar 2013 14:45:36 +0000

Good advice.

David: A quite separate detail in this example is the presumption that we all understand what the codes mean. I was guessing at some system in which increasing integers imply annual progress, but it's best not to presume on an international list that every country uses the same codes as yours.

Nick
njcoxstata@gmail.com

On 11 Mar 2013, at 14:35, Rebecca Pope <rebecca.a.pope@gmail.com> wrote:

> Nick has given you a solution to identifying your problem panels. I
> think this is a good first step in the process, so that you have an
> idea of how widespread your problem is. There isn't a simple solution
> to "correcting" the values, though, because you don't know what is
> correct. You really have to make a judgement call and then document
> your decision rules. For example, you could decide on the modal value
> (with -minmode- taking the lowest mode when there are ties):
> 
> bys personid: egen myeduc = mode(educ), minmode
> 
> Alternatively, whether the individual reported 11, 12, or 13, they
> definitely had at least 11 (erring on the lower side):
> 
> bys personid: egen myeduc = min(educ)
> 
> You could also "carry forward" the higher values. I.e. once a person
> has attained 12 years, (s)he has 12 years until acquiring the next
> year of education. One has to wonder how valuable years 12 & 13 were
> if they are forgotten so quickly though. :-)
> 
> bys personid educ (year): gen change = (_n==1)
> bys personid (year): replace change= 0 if educ < educ[_n-1]
> bys personid (year): gen myedu = sum(cond(_n==1,educ,change))
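> 
> A possible alternative for the carry-forward rule, offered as a sketch
> rather than a definitive fix: take a running maximum of educ within
> each person. The three lines above add at most one year per change, so
> a jump such as 11 to 13, or a sequence such as 13, 11, 12, can give a
> result that differs from the highest value reported so far. The
> variable name myedu2 is made up for illustration, and the sketch
> assumes the variables are named as in the example (personid, year,
> educ):
> 
> ```stata
> * running maximum of educ within person, in year order
> bys personid (year): gen myedu2 = educ
> * max() ignores a missing argument, so a missing educ inherits the previous value
> by personid: replace myedu2 = max(myedu2, myedu2[_n-1]) if _n > 1
> ```
> 
> For the sequence 12, 11, 13, 12, 11 this gives 12, 12, 13, 13, 13.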
> 
> These are just a few of the rules I can think of off the top of my
> head. I'd certainly check to see if there is a common approach in the
> educational research literature (or wherever you intend to publish) if
> for no other reason than that you're less likely to get slammed by a
> reviewer.
> 
> Regards,
> Rebecca
> 
> On Mon, Mar 11, 2013 at 8:43 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> The simple program is called Stata....
>> 
>> However, you have to tell it what you regard as inconsistent.
>> 
>> In this case, you could flag any observation that doesn't have a higher
>> -education- value than the previous observation in the same panel.
>> 
>> bysort personid (year) : gen flag1 = educ[_n+1] <= educ
>> by personid : gen flag2 = _n > 1 & educ <= educ[_n-1]
>> 
>> list if flag1 | flag2
>> 
>> You could also flag panels, like this:
>> 
>> gen problem = 0
>> bysort personid (year) : replace problem = sum(educ <= educ[_n-1]) if _n > 1
>> by personid : replace problem = problem[_N]
>> 
>> edit if problem
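>> 
>> As a hedged illustration of the panel flag, here is a small made-up
>> data set: person 1 carries the values from the question, and person 2
>> is a hypothetical, fully consistent panel added for contrast. The
>> count is written with the _n > 1 condition inside sum(), a variant of
>> the code above that avoids comparing the first observation with a
>> missing previous value (Stata treats missing as larger than any
>> number):
>> 
>> ```stata
>> clear
>> input personid year educ
>> 1 2000 12
>> 1 2002 11
>> 1 2004 13
>> 1 2006 12
>> 1 2008 11
>> 2 2000 10
>> 2 2002 11
>> 2 2004 12
>> end
>> * running count, within person, of observations that do not exceed the previous one
>> bysort personid (year) : gen problem = sum(_n > 1 & educ <= educ[_n-1])
>> * copy the panel total to every observation in the panel
>> by personid : replace problem = problem[_N]
>> list, sepby(personid)
>> ```
>> 
>> Person 1 should end up with problem equal to 3 (the 2002, 2006, and
>> 2008 observations) and person 2 with 0.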
>> 
>> Fluency with -by:- gets you a long way.
>> 
>> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>        Q1/02   SJ 2(1):86--102                                  (no commands)
>>        explains the use of the by varlist : construct to tackle
>>        a variety of problems with group structure, ranging from
>>        simple calculations for each of several groups to more
>>        advanced manipulations that use the built-in _n and _N
>> 
>> http://www.stata-journal.com/article.html?article=pr0004 leads to a .pdf.
>> 
>> Nick
>> 
>> On Mon, Mar 11, 2013 at 1:31 PM, David Jose <davidjosework@gmail.com> wrote:
>> 
>>> I would like to correct self-reported data inconsistencies in a panel
>>> data set. For example, if there is an education variable, which is
>>> reported 5 times, say as follows:
>>> 
>>> year     educ
>>> 2000     12
>>> 2002     11
>>> 2004     13
>>> 2006     12
>>> 2008     11
>>> 
>>> I wonder if anyone has a simple program that can be implemented to
>>> correct such inconsistencies. Thanks in advance.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

