Statalist The Stata Listserver


RE: st: [merging US industry level data]

From   "White, Justin" <[email protected]>
To   <[email protected]>
Subject   RE: st: [merging US industry level data]
Date   Fri, 29 Sep 2006 08:45:50 -0400

I would not use a merge. Merge requires that there be a common variable
between data sets. I use these same data a lot. Since 3-digit and
4-digit NAICS data are not directly comparable (they are at different
levels of aggregation), I would append the 3-digit data to the 4-digit
data and create a dummy variable indicating whether each observation
comes from the 3-digit or the 4-digit data.
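A minimal Stata sketch of that append approach, using tiny made-up data in place of the real NAICS files. The variable names `naics` and `is_4digit` and all values are hypothetical:

```stata
* Hypothetical 3-digit data: one row per year x 3-digit NAICS code
clear
input int year int naics
2005 311
2005 312
end
gen byte is_4digit = 0    // dummy: 0 = observation from the 3-digit data
tempfile naics3
save `naics3'

* Hypothetical 4-digit data
clear
input int year int naics
2005 3111
2005 3112
2005 3121
end
gen byte is_4digit = 1    // dummy: 1 = observation from the 4-digit data

* Stack the 3-digit rows underneath the 4-digit rows
append using `naics3'
list, sepby(is_4digit)
```

Analyses can then be restricted with `if is_4digit == 1` or `if is_4digit == 0`. In Stata 11 and later, `append` also has a `generate()` option that creates a source indicator automatically.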

Hope this helps.

Justin White

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Philipp Rehm
Sent: Friday, September 29, 2006 8:41 AM
To: [email protected]
Subject: Re: st: [merging US industry level data]

You don't give a whole lot of information about your data-set, but there
are a few things that can be said.

1) You need to generate the same industry-level variable in both
data-sets, i.e. you need to generate a 3-digit industry code
inside the data-set that contains the 4-digit data (let's call this
data-set the 'master' data-set).
It is not clear how the 4-digit and the 3-digit industry variables
relate to each other, but let's assume that you can simply cut off the
last digit of the 4-digit variable to derive the 3-digit variable (e.g.,
codes 1230 to 1239 at the 4-digit level correspond to code 123 at the
3-digit level).

Assuming this, and assuming that your 4-digit industry variable is
numeric, stored as integers, and called industry_4d, you could get the
3-digit variable with something like this:

gen int industry_3d = real(substr(string(industry_4d),1,3))
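When every code has exactly four digits, the same truncation can also be done arithmetically with `floor(industry_4d/10)`, which avoids the round trip through a string. A minimal sketch on made-up codes that also checks that both versions agree (`check_3d` is a hypothetical helper variable):

```stata
* Hypothetical 4-digit codes
clear
input int industry_4d
1230
1235
1239
3111
end

* Drop the last digit by integer division (assumes exactly 4 digits)
gen int industry_3d = floor(industry_4d/10)

* Cross-check against the substr()-based version
gen int check_3d = real(substr(string(industry_4d),1,3))
assert industry_3d == check_3d
list
```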

In your other data-set, you also need to have a variable that is called
"industry_3d" (and, of course, you need to make sure that it is coded
the same way, which I assumed above).

2) Depending on what type of merge you want to do, you probably need to
sort both data-sets by the identifier variables (the variables you want
to merge on). Assuming you want to merge on, say, "year" and
"industry_3d", you would need to sort both data-sets by "year" and
"industry_3d".
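Before sorting, it can be worth confirming that the 3-digit data-set has only one observation per combination of merge keys, since several 4-digit rows will be matched to each 3-digit row. A minimal sketch with made-up values:

```stata
* Hypothetical 3-digit data, deliberately out of order
clear
input int year int industry_3d
2006 311
2005 311
2005 123
end

* isid stops with an error unless year x industry_3d is unique
isid year industry_3d

* Sort on the merge keys (needed by the older merge syntax)
sort year industry_3d
list
```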

3) Then you can merge, along the following lines:
use master.dta, clear
merge year industry_3d using using.dta

(where the data-set with the original 3-digit industry variable is
called "using.dta").
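The `merge varlist using filename` form above is the syntax of the Stata versions current in 2006. In Stata 11 and later the same match is usually written with an explicit match type; because many 4-digit rows share one 3-digit row, that is `m:1`. A self-contained sketch with made-up data (the variables `emp3` and `emp4` and all values are hypothetical):

```stata
* Hypothetical "using" data: one row per year x 3-digit industry
clear
input int year int industry_3d double emp3
2005 123 500
2005 311 900
end
tempfile using3
save `using3'

* Hypothetical "master" data at the 4-digit level
clear
input int year int industry_4d double emp4
2005 1230 200
2005 1235 300
2005 3111 900
end
gen int industry_3d = floor(industry_4d/10)

* Many 4-digit rows match one 3-digit row
merge m:1 year industry_3d using `using3'
tab _merge
```

Unlike the older syntax, this form does not require either data-set to be sorted first, and it stops with an error if year and industry_3d do not uniquely identify observations in the using data.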


Rohit wrote:
> Hi there,
> Mine is a very preliminary question. I am working with US industry
> data, and I want to merge the variables of 4-digit level industries to
> and also create a variable for 3-digit.
> Could anybody help me with that?
> Thanks,
> Rohit
*   For searches and help try:


© Copyright 1996–2024 StataCorp LLC