Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Merge Panel Datasets
From 
 
Phil Schumm <[email protected]> 
To 
 
[email protected] 
Subject 
 
Re: st: Merge Panel Datasets 
Date 
 
Sat, 18 Jun 2011 19:47:26 -0500 
On Jun 18, 2011, at 5:13 PM, Diana Beketova wrote:
I have a problem merging two panel datasets. In one file I have data  
about companies' balance sheet and P&L sorted by company's ID- 
number. It looks like this:
Obs  ID-number   year    Total Assets Operat. Revenue
1        123     2002        500         100
2        123     2005        505         110
3        123     2006        600         120
4        789     2001        550         340
5        789     2005        670         560
So there are missing values within the years, and ID-number repeats  
itself for a group of observations. So, I think in this case it  
can't work as an unique identifier.
The next file contains ownership information, also a panel dataset.
Obs ID-number year    Ownership % Country of origin
1     123     2002      20%         DE
2     123     2002      50%         FR
3     123     2002      30%         UK
4     123     2005      30%         DE
5     123     2005      40%         FR
6     123     2005      30%         UK
7     789     2001      50%         CN
8     789     2001      50%         US
9     789     2003      70%         CN
10    789     2003      30%         US
Here, ownership information changes over years, but it can also  
happen that panel data contains missing values in years.
Is there any possibility to merge these two files together?
That it looks like this:
Obs ID-number year Total Assets Operat. Revenue Ownership% Country  
of origin
1     123     2002     500             100          20%           DE
2     123     2002     500             100          50%           FR
3     123     2002     500             100          30%           UK
4     123     2005     505             110          30%           DE
5     123     2005     505             110          40%           FR
6     123     2005     505             110          30%           UK
7     123     2006     600             120
8     789     2001     550             340          50%           CN
9     789     2001     550             340          50%           US
10    789     2003                                  70%           CN
11    789     2003                                  30%           US
5     789     2005     670             560
Is it enough to sort the datasets by ID-number and year and then  
merge them using these two as unique identifier? In my case I get so  
little observations that match after the merge that I think that I  
am doing something wrong.
There isn't any problem with -merge- here; you did
    merge 1:m id_number year using file2
to generate the result you show above.  The question is: What do you  
want?  In the first dataset, you have what appear to be yearly assets  
and operating revenues for each firm (i.e., a maximum of one  
observation per firm per year; note that the fact that some firms  
might not have data available for a given year is irrelevant for this  
discussion).  In the second dataset, however, you have multiple  
observations per year for each firm.  I assume that this means, for  
example, that in 2002, 20% of firm 123 was owned by German investors,  
50% by French investors, and 30% by British investors?  Regardless,  
you have to figure out what you want for a result before you can  
determine the appropriate way to use -merge-, and that will probably  
depend on what analysis you are going to perform.  If your only goal  
is to combine the data in the two files into one file without any loss  
of information, then what you've done above may be adequate.  It's  
clearly not an efficient way to store the data (because of the  
duplication), but if the dataset isn't too large, that might not matter.
-- Phil
P.S. It's no longer necessary to sort your data before merging -- the - 
merge- command will now take care of that for you.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/