


AW: AW: st: Merge Panel Datasets

From   "Diana Beketova" <>
To   <>
Subject   AW: AW: st: Merge Panel Datasets
Date   Mon, 20 Jun 2011 03:41:58 +0200

Hello Phil, 

Thank you very much for your time and your detailed answer to my question. I
apologize if my problems seem 'beginner-level' to you, but I have only been
using Stata seriously for a short time and am trying to catch up quickly to
reach your level :-) 

You are right that I first have to create 'total foreign ownership' and
'total domestic ownership' variables in order to turn many observation lines
into one. But I first just wanted to try merging both data files, to see
whether the merge could succeed at all and where my weak points are that I
need to work on.

I had the idea of building year clusters, because my data cover the years
2002-2010. I could build three 3-year clusters: 2002-2004, 2005-2007, and
2008-2010. Within each cluster I would generate new variables for Total
Assets and Operating Revenue equal to their averages over that cluster.
Because ownership is so unevenly distributed, there is a high probability
that there will be only one observation per year cluster. Finally, I would
use the Heckman correction to correct for selection bias, or alternatively a
Tobit model for censored variables. Do you think this methodology is
reasonable? Otherwise, I don't know how to match these two files. I should
add that the data come from an emerging market and are very biased and
incomplete. Do you know other ways to deal with the bias problem?
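
The cluster averaging described above can be sketched as follows. This is a minimal Python illustration, not Stata code. It assumes hypothetical field names (`firm_id`, `year`, `total_assets`, `oper_revenue`) and skips missing values, so each mean uses only the years actually observed. In Stata the same idea would usually be a cluster variable followed by -collapse (mean) ..., by(...)-.

```python
from collections import defaultdict

# The three 3-year clusters proposed above.
CLUSTERS = [(2002, 2004), (2005, 2007), (2008, 2010)]

def year_cluster(year):
    """Return the (start, end) cluster containing `year`, or None if outside 2002-2010."""
    for start, end in CLUSTERS:
        if start <= year <= end:
            return (start, end)
    return None

def cluster_means(rows):
    """Average total assets and operating revenue per firm and year cluster.

    rows: dicts with hypothetical keys 'firm_id', 'year', 'total_assets', 'oper_revenue'.
    Missing values (None) are skipped; a mean with no observed values is None.
    """
    # For each (firm, cluster): variable -> [running sum, count of non-missing values]
    sums = defaultdict(lambda: {"total_assets": [0.0, 0], "oper_revenue": [0.0, 0]})
    for row in rows:
        cluster = year_cluster(row["year"])
        if cluster is None:
            continue  # year outside the study period
        acc = sums[(row["firm_id"], cluster)]
        for var in ("total_assets", "oper_revenue"):
            value = row[var]
            if value is not None:
                acc[var][0] += value
                acc[var][1] += 1
    return {
        key: {var: (total / n if n else None) for var, (total, n) in acc.items()}
        for key, acc in sums.items()
    }

# Usage with made-up numbers for one hypothetical firm "A":
rows = [
    {"firm_id": "A", "year": 2002, "total_assets": 100.0, "oper_revenue": 50.0},
    {"firm_id": "A", "year": 2003, "total_assets": 200.0, "oper_revenue": None},
    {"firm_id": "A", "year": 2006, "total_assets": 300.0, "oper_revenue": 90.0},
]
means = cluster_means(rows)
print(means[("A", (2002, 2004))])  # {'total_assets': 150.0, 'oper_revenue': 50.0}
```

Skipping missing values rather than treating them as zero matters here, because the data are described as incomplete.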

Best wishes, 


-----Original Message-----
[] On behalf of Phil Schumm
Sent: Sunday, June 19, 2011 5:56 PM
Subject: Re: AW: st: Merge Panel Datasets

On Jun 19, 2011, at 1:28 AM, Diana Beketova wrote:
> You have described everything correctly. Indeed, I have yearly total
> assets and revenues in the master data and multiple observations per
> year on firm ownership structure in the using data. Afterwards, I have
> to analyze the impact of foreign ownership on firm value/profitability.
> The dataset is very big and also causes some problems, because I am
> limited to 1 GB of memory on my computer. The goal is also to find out
> which companies survived the crisis best, what type of foreign owner
> had the most positive effect on firm performance, etc. I ran the merge
> again using the 1:m option, and you can see my results below. I just
> can't believe that out of almost 1.9 million observations only 451
> match.
> merge 1:m ID_NUMBER YEAR using file2
> ID_NUMBER was str15 now str16
> Result                           # of obs.
> -----------------------------------------
> not matched                     1,894,248
> from master                   387,108  (_merge==1)
> from using                  1,507,140  (_merge==2)
> matched                               451  (_merge==3)
> -----------------------------------------
> In the end I want to have a panel that contains all observations
> over the years, so I can run all the needed regressions, etc. Do you
> think this is possible?

I agree that this seems odd -- the only way to find out what is going  
on is to inspect some of the "master only" and "using only"  
observations (I would suggest doing this for a few, specific firms).   
Assuming you don't see any problems, then it simply means that for  
most firms and/or most years, you either have data on assets and  
revenue *or* data on ownership, but not both.  If that's true, then  
you need to think about how you're going to handle that in your  
analysis (e.g., by estimating values for the interim years, or by  
using a model that does not require concurrent observations on your  
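
One way to carry out the inspection suggested above is to compare the firm-year keys of the two files and then list, for a few specific firms, the years that appear in only one of them. The Python sketch below mirrors the three `_merge` categories in the output above. The key lists are hypothetical. Because it works on distinct (firm, year) keys, it counts firm-years rather than observations, whereas the ownership file has several rows per firm-year.

```python
def merge_status(master_keys, using_keys):
    """Classify (firm_id, year) keys like Stata's _merge codes.

    1 = master only, 2 = using only, 3 = in both. Duplicated keys collapse,
    so the counts are firm-years, not observations.
    """
    master, using = set(master_keys), set(using_keys)
    return {1: master - using, 2: using - master, 3: master & using}

def inspect_firm(firm_id, status):
    """Sorted years in which one firm is master-only, using-only, or matched."""
    return {
        code: sorted(year for fid, year in keys if fid == firm_id)
        for code, keys in status.items()
    }

# Hypothetical keys: assets/revenue file (master) vs ownership file (using).
master = [("F1", 2005), ("F1", 2006), ("F2", 2005)]
using = [("F1", 2006), ("F1", 2006), ("F2", 2009)]
status = merge_status(master, using)
print({code: len(keys) for code, keys in status.items()})  # {1: 2, 2: 1, 3: 1}
print(inspect_firm("F2", status))  # {1: [2005], 2: [2009], 3: []}
```

Firm F2 illustrates the pattern described above: it has assets and revenue for one year and ownership data for another, so it never matches.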

I'm guessing that you're going to need to do some work with your  
ownership data before you can use them in a model.  For example, you  
may need to compute measures like "total percent foreign ownership,"  
"percent owned by specific countries," etc.  If so, then you probably  
want to do these calculations before merging.
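
The pre-merge aggregation described above could look like the following minimal Python sketch. Diana refers to the resulting measures as 'total foreign ownership' and 'total domestic ownership'. Several details are assumptions for illustration only: the field names, the two-letter country codes, and the rule that an owner is domestic when its country equals a given `home_country`. In Stata this step would typically be a -collapse (sum)- by ID_NUMBER and YEAR. The result has one record per firm-year, which could then be merged 1:1 onto the assets and revenue file.

```python
from collections import defaultdict

def ownership_by_firm_year(rows, home_country):
    """Collapse many ownership rows into one record per (firm_id, year).

    rows: dicts with hypothetical keys 'firm_id', 'year', 'owner_country', 'share_pct'.
    Assumption: an owner is domestic if owner_country == home_country, else foreign.
    Rows with a missing share are skipped.
    """
    totals = defaultdict(lambda: {"foreign_pct": 0.0, "domestic_pct": 0.0, "n_owners": 0})
    for row in rows:
        if row["share_pct"] is None:
            continue
        rec = totals[(row["firm_id"], row["year"])]
        side = "domestic_pct" if row["owner_country"] == home_country else "foreign_pct"
        rec[side] += row["share_pct"]
        rec["n_owners"] += 1
    return dict(totals)

# Hypothetical ownership rows; "AA" plays the role of the home country.
rows = [
    {"firm_id": "F1", "year": 2006, "owner_country": "BB", "share_pct": 30.0},
    {"firm_id": "F1", "year": 2006, "owner_country": "AA", "share_pct": 45.0},
    {"firm_id": "F1", "year": 2006, "owner_country": "CC", "share_pct": 10.0},
    {"firm_id": "F1", "year": 2007, "owner_country": "BB", "share_pct": None},
]
collapsed = ownership_by_firm_year(rows, home_country="AA")
print(collapsed)
```

Note that the 2007 row produces no record at all, because its only share is missing. That firm-year would then show up as master-only after the merge.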

Finally, if you're running up against memory constraints (and can't  
buy/borrow more memory or move to another machine), then make sure you  
are storing and using your data in the most memory-efficient way  
possible.  For example, eliminate any unnecessary variables (e.g., the  
observation number variable in your example), and make sure you are  
storing everything in the most efficient way possible (e.g., store  
variables such as country code as numeric with value labels, and use
-compress-).  Also, as I noted in my previous email, your current merge  
result (with multiple observations per firm for each year) is very  
inefficient, because information on assets and revenue for a given  
year is duplicated.  If your analysis will ultimately be based on a  
file with only one observation per firm per year, then you should  
modify your second file accordingly before merging it on.
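
Storing a string variable such as country code as integers plus a value label can be sketched in Python as follows. The function assigns codes 1..k in sorted order of the distinct strings, which is similar in spirit to Stata's -encode-. The sample codes are hypothetical.

```python
def encode(values):
    """Map string codes to integers 1..k plus a label table (like a value label).

    Codes follow the sorted order of the distinct strings; None stays None (missing).
    Repeated small integers take far less space than repeated strings.
    """
    labels = sorted({v for v in values if v is not None})
    code_of = {label: i for i, label in enumerate(labels, start=1)}
    codes = [code_of[v] if v is not None else None for v in values]
    value_label = {i: label for label, i in code_of.items()}
    return codes, value_label

countries = ["DE", "US", "DE", None, "AT"]
codes, value_label = encode(countries)
print(codes)        # [2, 3, 2, None, 1]
print(value_label)  # {1: 'AT', 2: 'DE', 3: 'US'}
```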

-- Phil
