
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Data Manipulation Question


From   Alex Warofka <[email protected]>
To   statalist <[email protected]>
Subject   st: Data Manipulation Question
Date   Thu, 19 Sep 2013 11:24:43 -0400

Hi,

I'm trying to perform what seems like a relatively simple data
manipulation task on a very large dataset (~20GB with 20 million
observations), but I'm having trouble working out the best way to do
so in Stata.

I have four variables, including individual ID (not unique, because one
individual can work for multiple employers in the same period),
employer EIN, and quarter. I am attempting to flag events where both of
the following hold: (a) >=80% of the employees working at a given EIN
in quarter 1 move to one and the same other EIN in quarter 2, and
(b) >=80% of the employees working at that second EIN in quarter 2 came
from the first EIN in quarter 1. In essence, the goal is to flag
spurious transition events where an employee's employer appears to
change but in fact only the employer's EIN has changed. This is the
same procedure used in building the successor-predecessor file for the
QWI and described in Census technical paper TP-2006-01.
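
To make the two-sided 80% rule concrete, here is a minimal sketch on
toy data. It is written in Python purely as a language-neutral
illustration, not as a Stata solution and not as the Census
implementation. The function name flag_ein_changes, the
(person_id, ein, quarter) tuple layout, and the choice to count each
worker once per EIN per quarter are all assumptions made for the
example.

```python
from collections import Counter, defaultdict


def flag_ein_changes(records, q1, q2, threshold=0.8):
    """Return sorted (old_ein, new_ein) pairs passing the two-sided test.

    records: iterable of (person_id, ein, quarter) tuples (assumed layout).
    A person may appear under several EINs in the same quarter.
    """
    eins_q1 = defaultdict(set)  # person -> EINs worked at in q1
    eins_q2 = defaultdict(set)  # person -> EINs worked at in q2
    for person_id, ein, quarter in records:
        if quarter == q1:
            eins_q1[person_id].add(ein)
        elif quarter == q2:
            eins_q2[person_id].add(ein)

    # Distinct head count per EIN in each quarter (the two denominators).
    size_q1 = Counter(e for eins in eins_q1.values() for e in eins)
    size_q2 = Counter(e for eins in eins_q2.values() for e in eins)

    # Number of distinct workers moving from each old EIN to each other EIN.
    flows = Counter()
    for person_id, old_eins in eins_q1.items():
        for old in old_eins:
            for new in eins_q2.get(person_id, ()):
                if old != new:
                    flows[(old, new)] += 1

    return sorted(
        pair for pair, n in flows.items()
        if n >= threshold * size_q1[pair[0]]
        and n >= threshold * size_q2[pair[1]]
    )


# Toy data: 4 of A's 5 workers move to B, and they are 4 of B's 5 workers.
toy = [
    (1, "A", 1), (2, "A", 1), (3, "A", 1), (4, "A", 1), (5, "A", 1),
    (6, "D", 1), (7, "E", 1), (8, "E", 1),
    (1, "B", 2), (2, "B", 2), (3, "B", 2), (4, "B", 2), (5, "C", 2),
    (6, "B", 2), (7, "E", 2), (8, "E", 2),
]
print(flag_ein_changes(toy, q1=1, q2=2))  # [('A', 'B')]
```

Indexing by person rather than looping over EINs means only pairs of
EINs that actually share a worker are ever counted, and non-unique IDs
are handled by pairing every quarter 1 employer of a person with every
quarter 2 employer of that person.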

My initial thought was to use levelsof and loop over EINs, pulling a
local macro containing the IDs of employees for each EIN, then looping
through these employees to see where they are working in Q2 and so on.
This doesn't work because I run into the 67,784 character macro length
limit. Splitting the dataset by quarter, merging, and then using
_merge to track individual movements between firms doesn't work
either, because my IDs are not unique within a quarter.

Does anyone have any recommendations for handling this in Stata? At
this point, I'm tempted to just write a Ruby script to do this, but I
would be thrilled to discover that it is possible in Stata.

Thanks,

-- 
Alex Warofka
Research Associate | California Center for Population Research, UCLA
[email protected] | nomad.cm | @AlexWarofka


