I would not use a merge. A merge requires a common variable between the
datasets. I use this same data a lot. Since 3-digit and
4-digit NAICS data are not "similar", I would append the 3-digit data to
the 4-digit data and create a dummy variable indicating whether the
observation is associated with 3-digit or 4-digit.
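A minimal sketch of the append approach, assuming the two files are
called naics4.dta and naics3.dta and the dummy is called is_3digit
(all three names are made up for illustration), and that neither file
already contains a variable with that name:

```stata
* Hypothetical file and variable names; adjust to your own data
use "naics4.dta", clear
* Flag the 4-digit observations first
gen byte is_3digit = 0
* Rows appended from the 3-digit file get is_3digit == . (missing)
append using "naics3.dta"
replace is_3digit = 1 if missing(is_3digit)
* Quick count of how many rows came from each file
tabulate is_3digit
```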
Hope this helps.
Justin White
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Philipp Rehm
Sent: Friday, September 29, 2006 8:41 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: [merging US industry level data]
You don't give a whole lot of information about your data-set, but there
are a few things that can be said.
1) You need to generate the same industry level variable in both
data-sets, i.e. you need to generate a 3-digit level industry code
inside the data-set with the 4-digit level data (let's call this data-set
the 'master' data-set).
It is not clear how the 4-digit and the 3-digit industry variables
relate to each other, but let's assume that you can simply cut off the
last digit of the 4-digit variable to derive the 3-digit variable (e.g.,
codes 1230 to 1239 at the 4-digit level correspond with code 123 at the
3-digit level).
Assuming this, as well as that your 4-digit level industry variable is
coded in integers (and called industry_4d), you could get the 3-digit
level variable with something like this:
gen int industry_3d = real(substr(string(industry_4d),1,3))
In your other data-set, you also need to have a variable that is called
"industry_3d" (and you need to make sure that it is equivalently coded,
of course - which I assumed above).
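An arithmetic route gives the same result under the same assumptions
(industry_4d is an integer with exactly four digits): dividing by 10
and rounding down drops the last digit. The sketch below stores it
under a made-up name, industry_3d_alt, and checks it against the
string-based expression. Note that the two differ for codes with fewer
than four digits (311 becomes 31 arithmetically, but stays 311 with
substr), so the assert also serves as a check on the coding.

```stata
* Assumes industry_4d is an integer between 1000 and 9999
gen int industry_3d_alt = floor(industry_4d/10)
* Stops with an error if any code is not a four-digit integer
assert industry_3d_alt == real(substr(string(industry_4d),1,3))
```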
2) Depending on what type of merge you want to do, you probably need to
sort both data-sets by the identifier variables (the variables you want
to merge on). Assuming you want to merge on, say, "year" and
"industry_3d", you would need to sort both data-sets by "year
industry_3d".
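Roughly, the sorting step could look like this, using the file names
from step 3 (master.dta and using.dta). The isid line is my own
addition and assumes the 3-digit data-set is supposed to have exactly
one row per year and industry; drop it if that is not true of your data.

```stata
* Sort and re-save the 3-digit data-set
use "using.dta", clear
* Errors out if year and industry_3d do not uniquely identify rows
isid year industry_3d
sort year industry_3d
save "using.dta", replace

* Sort and re-save the 4-digit data-set
use "master.dta", clear
sort year industry_3d
save "master.dta", replace
```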
3) Then you can merge, along the following lines:
use master.dta, clear
merge year industry_3d using using.dta
(where the data-set with the original 3-digit level industry level
variable is called "using.dta").
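For anyone on Stata 11 or later, the merge syntax changed and the match
type has to be stated explicitly. A sketch under the assumption that
many 4-digit rows in the master data-set match a single 3-digit row per
year in the using data-set (hence m:1); if your data are structured
differently, the match type will differ too. As far as I know, the
newer merge also sorts the data itself when needed.

```stata
* Stata 11+ syntax; m:1 is an assumption about the data structure
use "master.dta", clear
merge m:1 year industry_3d using "using.dta"
* Inspect how many rows matched (3), or came from only one file (1, 2)
tabulate _merge
```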
HTH,
Philipp
Rohit wrote:
> hi there,
> mine is a very preliminary question. i am working with the US industry level
> data and i want to merge the variables of 4-digit level industries to 3-digit
> and also create a variable for 3-digit.
> could anybody help me with that?
> thanks
> rohit
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/