  | 
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
st: re: complex data cleaning issue (well, complex for me)
Stephen said
For each employee, I have employee number (e.g., 109123) the start  
date (e.g., 07Aug07), start day (e.g., Monday) , end date (e.g.,  
09Aug07), end day (e.g., Wednesday), number of hours (e.g., 21), and  
number of days (e.g., 3). There is other info as well. Each case of  
absenteeism is on one line, with some employees being represented on  
one line only (only one recorded absenteeism entry), and other  
employees with multiple records of absenteeism, for example being  
absent on several different days across the year.
Unfortunately, for many records, there are then 'corrections'. (A  
correction could occur for several reasons,  such as that entry no  
longer being regarded as absenteeism but sick leave). A correction  
shows up as a re-entry of that data with negative values for hours  
and days. Below is an example of the original entry, plus a correction.
(snip)
I don't see anything complex here. If you -collapse- on employee#,  
start date and end date, computing sums of the hours and days field,  
everything should work properly.
In Example A, you will get hours = days = 0, and you can screen out  
records that have hours = days = 0
In Example B, the first two records will negate each other, and you  
will be left with the third.
In Example C, you will again be left with hours = days = 0.
I don't think you need any -duplicates- logic to handle this.
You also don't want to do any reshaping. You can usie mofd() to  
generate a month variable, and then just -collapse- on that variable,  
generating sums, to make what you have into a monthly time series per  
employee. Just two applications of -collapse-. In fact you could  
probably get by with one.
One possible wrinkle: the start and end dates might span a month-end,  
so you might need to be somewhat canny about that in terms of  
classifying each record as belonging to a particular month.
Kit
Kit Baum, Boston College Economics and DIW Berlin
http://ideas.repec.org/e/pba1.html
An Introduction to Modern Econometrics Using Stata:
http://www.stata-press.com/books/imeus.html
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/