Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

AW: st: Merge Panel Datasets


From   "Diana Beketova" <diana.beketova@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: Merge Panel Datasets
Date   Sun, 19 Jun 2011 08:28:32 +0200

Thank you Phil!

You describe everything right. Indeed, I have yearly total assets and
revenues in the master data and multiple observations per year about firm
ownership structure in the using-data. I have to analyze the impact of
foreign ownership on firm value/profitability afterwards. The dataset is
very big and also causes some problems because I am limited to 1GB memory on
my computer. The goal is also to find out which companies survived best
during the crisis, what type of foreign owner had the most positive effect
on firm performance etc. I made the merge again using 1:m-option and you can
see my results below. I just can't believe that out of almost 1.9 million
observations only 451 match. 

merge 1:m ID_NUMBER YEAR using file2
ID_NUMBER was str15 now str16

Result                           # of obs.
-----------------------------------------
not matched                     1,894,248
from master                   387,108  (_merge==1)
from using                  1,507,140  (_merge==2)

matched                               451  (_merge==3)
-----------------------------------------

At the end I want to have a panel that contains all observations over years,
so I can run all the needed regressions etc. Do you think it is somehow
possible? 

Diana 


-----Ursprüngliche Nachricht-----
Von: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] Im Auftrag von Phil Schumm
Gesendet: Sunday, June 19, 2011 2:47 AM
An: statalist@hsphsun2.harvard.edu
Betreff: Re: st: Merge Panel Datasets

On Jun 18, 2011, at 5:13 PM, Diana Beketova wrote:
> I have a problem merging two panel datasets. In one file I have data  
> about companies' balance sheet and P&L sorted by company's ID- 
> number. It looks like this:
>
> Obs  ID-number   year    Total Assets Operat. Revenue
> 1        123     2002        500         100
> 2        123     2005        505         110
> 3        123     2006        600         120
> 4        789     2001        550         340
> 5        789     2005        670         560
>
> So there are missing values within the years, and ID-number repeats  
> itself for a group of observations. So, I think in this case it  
> can't work as an unique identifier.
>
> The next file contains ownership information, also a panel dataset.
>
> Obs ID-number year    Ownership % Country of origin
> 1     123     2002      20%         DE
> 2     123     2002      50%         FR
> 3     123     2002      30%         UK
> 4     123     2005      30%         DE
> 5     123     2005      40%         FR
> 6     123     2005      30%         UK
> 7     789     2001      50%         CN
> 8     789     2001      50%         US
> 9     789     2003      70%         CN
> 10    789     2003      30%         US
>
> Here, ownership information changes over years, but it can also  
> happen that panel data contains missing values in years.
>
> Is there any possibility to merge these two files together?
>
> That it looks like this:
>
> Obs ID-number year Total Assets Operat. Revenue Ownership% Country  
> of origin
> 1     123     2002     500             100          20%           DE
> 2     123     2002     500             100          50%           FR
> 3     123     2002     500             100          30%           UK
> 4     123     2005     505             110          30%           DE
> 5     123     2005     505             110          40%           FR
> 6     123     2005     505             110          30%           UK
> 7     123     2006     600             120
> 8     789     2001     550             340          50%           CN
> 9     789     2001     550             340          50%           US
> 10    789     2003                                  70%           CN
> 11    789     2003                                  30%           US
> 5     789     2005     670             560
>
> Is it enough to sort the datasets by ID-number and year and then  
> merge them using these two as unique identifier? In my case I get so  
> little observations that match after the merge that I think that I  
> am doing something wrong.


There isn't any problem with -merge- here; you did

     merge 1:m id_number year using file2

to generate the result you show above.  The question is: What do you  
want?  In the first dataset, you have what appear to be yearly assets  
and operating revenues for each firm (i.e., a maximum of one  
observation per firm per year; note that the fact that some firms  
might not have data available for a given year is irrelevant for this  
discussion).  In the second dataset, however, you have multiple  
observations per year for each firm.  I assume that this means, for  
example, that in 2002, 20% of firm 123 was owned by German investors,  
50% by French investors, and 30% by British investors?  Regardless,  
you have to figure out what you want for a result before you can  
determine the appropriate way to use -merge-, and that will probably  
depend on what analysis you are going to perform.  If your only goal  
is to combine the data in the two files into one file without any loss  
of information, then what you've done above may be adequate.  It's  
clearly not an efficient way to store the data (because of the  
duplication), but if the dataset isn't too large, that might not matter.


-- Phil

P.S. It's no longer necessary to sort your data before merging -- the - 
merge- command will now take care of that for you.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
-----
eMail ist virenfrei.
Von AVG überprüft - www.avg.de
Version: 10.0.1382 / Virendatenbank: 1513/3711 - Ausgabedatum: 18.06.2011 


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index