How do you define group characteristics in your data in order
to create subsets?
Title: Efficiently defining group characteristics to create subsets
Author: Christopher F. Baum, Boston College
Date: December 2001; minor revisions July 2011
Say that your cross-sectional dataset contains microdata—a record for
each employee, for instance—and you want to associate each employee's
workplace with an industry code. That information is not on the record but
is available to you. How do you get this associated information (which might
also be, e.g., the code for a specific pension plan or the state) on the
record without manual editing or a long sequence of statements with
if clauses? The latter method is perhaps familiar to users of other
statistical packages, but there is a better way.
Let us presume that we have a Stata dataset, employee, containing the
individual-specific measurements as well as wpid, the workplace ID.
Assume that wpid can be treated as an integer; a string code
could be handled just as easily.
Create a text file containing two columns: the workplace ID (wpid)
and the industry code (indcod). For instance,
12367 321
12467 313
13211 321
... ...
23435 371
32156 341
Read the file into Stata with infile wpid indcod using followed by the
name of the text file, run sort wpid, and save the result as the Stata
dataset wpchar.
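As a minimal sketch of this step, assume the two-column text file is saved
as wpchar.txt (a hypothetical name; the file is not named above). The isid
line is an optional extra check that each workplace appears only once, which
a many-to-one merge on wpid requires:
. * wpchar.txt is an assumed filename
. infile wpid indcod using wpchar.txt, clear
. * stop with an error if any wpid is repeated
. isid wpid
. sort wpid
. save wpchar, replace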
Now use the employee file and give the commands
. sort wpid
. merge m:1 wpid using wpchar
. tab _merge
You should find that all employees now have an indcod variable
defined; in the tabulation of _merge, every employee record should appear
as matched. If you believe that you have industry codes for all workplaces
but indcod contains missing values, list the wpids for which indcod is
missing to find the workplaces absent from wpchar. When you are satisfied
that the merge has worked properly, type
. drop _merge
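The checks described above, which must be run before _merge is dropped,
might be sketched as follows. The sketch relies on the standard coding of
_merge, in which 2 marks records found only in the using file (here,
workplaces in wpchar with no employees). Dropping those records is a
suggestion, not a required step:
. * workplaces whose employees received no industry code
. tabulate wpid if missing(indcod)
. * workplaces in wpchar that matched no employee record
. list wpid indcod if _merge == 2
. * optional: remove those extra, employee-less records
. drop if _merge == 2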
This is a good example of the power and flexibility of Stata’s
merge command. The merge facility does not perform just
one-to-one merges; in this example, it performs a many-to-one merge (m:1),
associating each of the many employees at a workplace with that
workplace's single record in wpchar. A
clear advantage of this technique appears when you have more than one
characteristic to be added to each employee record, for instance, an
industry code and the number of employees of the firm, the total sales of
the firm, etc. Any number of such firm-level variables could be added to the
records in the wpchar file and merged onto the employee file with the
same command.
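For instance, suppose the text file gained two further columns, given here
the hypothetical names nemp (number of employees) and sales (total sales),
and is saved as wpchar.txt (also an assumed name). Only the step that builds
wpchar would change; the merge itself is typed exactly as before:
. * nemp, sales, and wpchar.txt are illustrative names
. infile wpid indcod nemp sales using wpchar.txt, clear
. sort wpid
. save wpchar, replace
. use employee, clear
. merge m:1 wpid using wpchar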
Unlike an approach depending on a long list of conditional statements, such
as replace indcod=321 if inlist(wpid,12367,13211,...), this approach
provides a Stata dataset containing your workplace ID numbers, so that you
can easily see whether a particular workplace is in your list. It is
especially useful when you revise the list for a new set of workplaces:
you edit the text file and rerun the merge rather than rewriting the
conditions.
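Because the lookup table is itself a Stata dataset, ordinary commands can be
used to inspect it. A brief sketch, using one of the workplace IDs from the
example file above:
. use wpchar, clear
. * is this workplace in the list, and with which industry code?
. list wpid indcod if wpid == 13211
. * how many workplaces fall under each industry code?
. tabulate indcod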