Statalist The Stata Listserver



RE: st: [merging US industry level data]


From   "White, Justin" <JWhite@yesvirginia.org>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: [merging US industry level data]
Date   Fri, 29 Sep 2006 09:17:15 -0400

In this case, this is what I would do:

Create a variable in the Master that combines State and Year
(AL1970, ..., AL2000).

Create the same variable in the Using: State+Year

Be sure to sort each data set by the new variable.

Then merge the two data sets using this new variable.

This should work.  You are correct that if you were to append the data,
it would just attach the using data to the bottom of the Master data.
This data is structurally different from the NAICS data we were
discussing earlier.
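
The steps above can be sketched roughly as follows.  This is only a
sketch under assumptions not stated in the thread: the file names
cells.dta (master) and health.dta (using) are hypothetical, state is
assumed to be a string variable and year a numeric one in both files,
and the m:1 syntax requires Stata 11 or later (older versions use
-merge varlist using filename- on data sorted by the key).

```stata
* Sketch only: cells.dta and health.dta are hypothetical file names.
* Assumes state is a string variable and year is numeric in both files.

* Build the State+Year key in the using data and save a keyed copy
use health.dta, clear
generate stateyear = state + string(year)
sort stateyear
save health_keyed.dta, replace

* Build the same key in the master data
use cells.dta, clear
generate stateyear = state + string(year)
sort stateyear

* Many master rows (race/sex/age cells) match one using row per key
merge m:1 stateyear using health_keyed.dta

* _merge==1 flags master-only rows, which are kept with vbl2 missing
tabulate year _merge
```

-merge- also accepts several key variables, so merging on state and
year directly should give the same match without the combined key.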

Hope this helps.


Justin White

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Scott
Cunningham
Sent: Friday, September 29, 2006 9:08 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: [merging US industry level data]

> I would not use a merge.  Merge requires there be a common variable
> between datasets.  I use this same data a lot.  Since 3-digit and
> 4-digit NAICS data are not "similar", I would append the 3-digit data
> to the 4-digit data and create a dummy variable indicating whether the
> observation is associated with 3-digit or 4-digit.

Justin,

Not to butt in, but can you elaborate?  I have two datasets  
currently:  one on health outcomes where the panel identifier is a  
state variable which varies over time, and another dataset (the  
master dataset) where the panel identifier is a state, race, age, and  
sex specific cell that varies over time.  The health data is from  
1980-2000, while the master dataset is from 1970-2000.  Originally I  
was using -joinby- but it was causing the master data to drop the  
1970-1979 years.  So I was going back to -merge-, and had planned to
-reshape- the data down to a level where the merge could occur between
using and master datasets.  But are you saying here that -append-
might be better, with a dummy variable distinguishing the using data
from the master data?  But won't this just extend the length of the
master data?  For instance, say the data is:

MASTER

race   sex  age  state  year  vbl1
Black  M    15   AL     1970  14.4
Black  F    15   AL     1970  4.4
White  M    15   AL     1970  .03
White  F    15   AL     1970  3.3

...

Black  M    15   AL     2000  1.2
Black  F    15   AL     2000  11
White  M    15   AL     2000  .91
White  F    15   AL     2000  12.1


USING

state  year  vbl2
AL     1980  11
AL     1981  12
...
AL     2000  14.5


My thought was to reshape vbl1 by sex, age and race, as I was saying,
so as to create a single observation per state and year, and then
merge on state and year with the using data.  But are you saying that
it's easier to use append?  Wouldn't it just add the data to the
bottom of the master data?
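
For concreteness, the reshape-then-merge plan described above might
look something like this.  It is only a sketch: the file names
cells.dta and health.dta and the variable name cell are hypothetical,
the master data is assumed to hold only the variables shown in the
example, and the 1:1 syntax requires Stata 11 or later.

```stata
* Sketch only: cells.dta, health.dta and the variable cell are hypothetical.
use cells.dta, clear

* One numeric id per race-sex-age combination
egen cell = group(race sex age)

* These vary within state-year, so reshape wide cannot carry them along
drop race sex age

* One row per state-year, with vbl1 spread across vbl11, vbl12, ...
reshape wide vbl1, i(state year) j(cell)

* Now both files have one row per state-year
merge 1:1 state year using health.dta
```

The cost of this route is that the cell labels are lost in the wide
variable names, so the mapping from cell number to race, sex and age
would need to be recorded separately.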


scott
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




