Statalist



Re: st: complex data cleaning issue (well, complex for me)


From   "Eva Poen" <eva.poen@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: complex data cleaning issue (well, complex for me)
Date   Tue, 29 Apr 2008 09:36:26 +0100

Stephen,

The following assumes that you have already dealt manually with the
cases where a correction is spread over multiple records.

2008/4/29 Stephen Cox <sd.cox@qut.edu.au>:
>  EXAMPLE A.
>
>  employee#    startdate    startday    enddate    endday    hours    days
>  109123          07Aug07    Monday    09Aug07    Wednesday    21    3
>  109123          07Aug07    Monday    09Aug07    Wednesday    -21    -3
...
>  EXAMPLE B.
>
>  employee#    startdate    startday    enddate    endday    hours    days
>  109123           07Aug07    Monday    09Aug07    Wednesday    21    3
>  109123           07Aug07    Monday    09Aug07    Wednesday    -21    -3
>  109123           07Aug07    Monday    09Aug07    Wednesday    21    3

****
gen correction = (sign(days) == -1)
replace correction = 1 if sign(hours) == -1

replace hours = abs(hours)
replace days = abs(days)

* employee is assumed to be the name of the variable holding employee#
duplicates tag employee startdate startday enddate endday hours days, gen(tag)
****

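The first bit keeps track of which entries have negative values
for hours or days, since that information is lost once abs() is applied.
The variable tag holds, for each observation, the number of other
observations that are identical on the listed variables. If you only
have example A and B cases left, this variable should take on the
values 0 (no duplicate), 1 (a pair, as in example A) and 2 (a triple,
as in example B). You could then do something like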

***
tab tag
drop if tag==1 /* this should get rid of all example A cases */
drop if tag==2 & correction == 0 /* this leaves you with one
observation if there are three identical entries, as in example B;
the survivor is the record flagged correction==1, which by now has
positive hours and days */
***
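
To see the two steps work together, here is a minimal sketch on toy
data. Employee 109123 is example B from the quoted message; employees
109124 (an example A pair) and 109125 (a single record with no
correction) are made up for illustration, and the dates are kept as
strings only to keep the sketch short.

```stata
* Toy data: 109123 reproduces example B; 109124 and 109125 are hypothetical
clear
input long employee str7 startdate str9 startday str7 enddate str9 endday hours days
109123 "07Aug07" "Monday" "09Aug07" "Wednesday"  21  3
109123 "07Aug07" "Monday" "09Aug07" "Wednesday" -21 -3
109123 "07Aug07" "Monday" "09Aug07" "Wednesday"  21  3
109124 "07Aug07" "Monday" "09Aug07" "Wednesday"  21  3
109124 "07Aug07" "Monday" "09Aug07" "Wednesday" -21 -3
109125 "14Aug07" "Monday" "14Aug07" "Monday"      7  1
end

* Flag corrections, then strip the signs so originals and corrections match
gen correction = (sign(days) == -1)
replace correction = 1 if sign(hours) == -1
replace hours = abs(hours)
replace days = abs(days)

* tag = number of other observations identical on these variables
duplicates tag employee startdate startday enddate endday hours days, gen(tag)
tab tag

* Pairs (example A) cancel out completely; triples (example B) keep one record
drop if tag == 1
drop if tag == 2 & correction == 0

list employee startdate enddate hours days correction tag
```

On this toy data, one record should remain for 109123 and the lone
record for 109125, while both records for 109124 are dropped.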

Note that this solution cannot be applied before you have dealt with
the cases where a correction occurred in multiple entries, since these
cases would show up as duplicates without being related to the
original entry. Maybe someone else who is more familiar with this kind
of data will come up with an idea of how to find and eliminate those.
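
In the meantime, a rough check along the following lines might help to
confirm that only example A and B patterns are left before anything is
dropped. It is only a sketch: ncorr is a helper variable invented here,
and the check assumes that every pair or triple should contain exactly
one flagged correction. It lists suspicious groups for manual
inspection and does not resolve them.

```stata
* Run after -duplicates tag- and before the -drop- commands.
* ncorr (hypothetical helper): number of flagged corrections within each
* group of identical records
bysort employee startdate startday enddate endday hours days: egen ncorr = total(correction)

* Only singles, pairs and triples are expected; this stops with an error otherwise
assert inlist(tag, 0, 1, 2)

* Pairs or triples that do not contain exactly one correction
list employee startdate enddate hours days correction tag if tag > 0 & ncorr != 1

* Corrections that match no other record at all
list employee startdate enddate hours days if tag == 0 & correction == 1
```

If both lists come back empty, the two drop commands above should
behave as described; anything listed is a candidate for a manual look.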

HTH,
Eva


