Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Merge Panel Datasets


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Merge Panel Datasets
Date   Sat, 18 Jun 2011 19:47:26 -0500

On Jun 18, 2011, at 5:13 PM, Diana Beketova wrote:
I have a problem merging two panel datasets. In one file I have data about companies' balance sheet and P&L sorted by company's ID- number. It looks like this:

Obs  ID-number   year    Total Assets Operat. Revenue
1        123     2002        500         100
2        123     2005        505         110
3        123     2006        600         120
4        789     2001        550         340
5        789     2005        670         560

So there are missing values within the years, and ID-number repeats itself for a group of observations. So, I think in this case it can't work as an unique identifier.

The next file contains ownership information, also a panel dataset.

Obs ID-number year    Ownership % Country of origin
1     123     2002      20%         DE
2     123     2002      50%         FR
3     123     2002      30%         UK
4     123     2005      30%         DE
5     123     2005      40%         FR
6     123     2005      30%         UK
7     789     2001      50%         CN
8     789     2001      50%         US
9     789     2003      70%         CN
10    789     2003      30%         US

Here, ownership information changes over years, but it can also happen that panel data contains missing values in years.

Is there any possibility to merge these two files together?

That it looks like this:

Obs ID-number year Total Assets Operat. Revenue Ownership% Country of origin
1     123     2002     500             100          20%           DE
2     123     2002     500             100          50%           FR
3     123     2002     500             100          30%           UK
4     123     2005     505             110          30%           DE
5     123     2005     505             110          40%           FR
6     123     2005     505             110          30%           UK
7     123     2006     600             120
8     789     2001     550             340          50%           CN
9     789     2001     550             340          50%           US
10    789     2003                                  70%           CN
11    789     2003                                  30%           US
5     789     2005     670             560

Is it enough to sort the datasets by ID-number and year and then merge them using these two as unique identifier? In my case I get so little observations that match after the merge that I think that I am doing something wrong.


There isn't any problem with -merge- here; you did

    merge 1:m id_number year using file2

to generate the result you show above. The question is: What do you want? In the first dataset, you have what appear to be yearly assets and operating revenues for each firm (i.e., a maximum of one observation per firm per year; note that the fact that some firms might not have data available for a given year is irrelevant for this discussion). In the second dataset, however, you have multiple observations per year for each firm. I assume that this means, for example, that in 2002, 20% of firm 123 was owned by German investors, 50% by French investors, and 30% by British investors? Regardless, you have to figure out what you want for a result before you can determine the appropriate way to use -merge-, and that will probably depend on what analysis you are going to perform. If your only goal is to combine the data in the two files into one file without any loss of information, then what you've done above may be adequate. It's clearly not an efficient way to store the data (because of the duplication), but if the dataset isn't too large, that might not matter.


-- Phil

P.S. It's no longer necessary to sort your data before merging -- the - merge- command will now take care of that for you.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index