



Re: AW: st: Merge Panel Datasets


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: AW: st: Merge Panel Datasets
Date   Sun, 19 Jun 2011 10:55:39 -0500

On Jun 19, 2011, at 1:28 AM, Diana Beketova wrote:
> You described everything correctly. Indeed, I have yearly total assets and revenues in the master data and multiple observations per year on firm ownership structure in the using data. Afterwards I have to analyze the impact of foreign ownership on firm value/profitability. The dataset is very big and causes some problems, because I am limited to 1 GB of memory on my computer. The goal is also to find out which companies survived best during the crisis, what type of foreign owner had the most positive effect on firm performance, etc. I ran the merge again using the 1:m option, and you can see my results below. I just can't believe that out of almost 1.9 million observations only 451 match.
>
> merge 1:m ID_NUMBER YEAR using file2
> ID_NUMBER was str15 now str16
>
> Result                           # of obs.
> -----------------------------------------
> not matched                     1,894,248
>     from master                   387,108  (_merge==1)
>     from using                  1,507,140  (_merge==2)
>
> matched                               451  (_merge==3)
> -----------------------------------------
>
> At the end I want to have a panel that contains all observations over the years, so that I can run all the needed regressions. Do you think this is possible?


I agree that this seems odd. The only way to find out what is going on is to inspect some of the "master only" and "using only" observations (I suggest doing this for a few specific firms). If you don't see any problems there, it simply means that for most firms and/or most years you have data on assets and revenue *or* data on ownership, but not both. If that's true, you need to think about how you're going to handle it in your analysis (e.g., by estimating values for the interim years, or by using a model that does not require concurrent observations on your measures).
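A minimal sketch of that inspection. It assumes the master file is called file1 (a hypothetical name; only file2 is named in the thread) and that the merge is rerun without -keep()- so all three _merge values are retained. The check for stray blanks and letter case is an added assumption, prompted by the note that ID_NUMBER was str15 in one file and str16 in the other; a width difference does not by itself prevent matching, but it can hint that IDs are written differently in the two files.

```stata
* Sketch only: "file1" is a hypothetical name for the master file
* (one obs per ID_NUMBER YEAR); "file2" is the ownership file from the thread.
use file1, clear
merge 1:m ID_NUMBER YEAR using file2

* Do the non-matches cluster in particular years?
tabulate YEAR _merge

* IDs that would change if blanks were trimmed and letters upper-cased
count if ID_NUMBER != upper(trim(ID_NUMBER))

* Per firm, count master-only, using-only and matched observations
bysort ID_NUMBER (YEAR _merge): egen n_master = total(_merge == 1)
bysort ID_NUMBER (YEAR _merge): egen n_using  = total(_merge == 2)
bysort ID_NUMBER (YEAR _merge): egen n_match  = total(_merge == 3)

* Firms found in both files: for them the mismatch is about YEAR, not ID
bysort ID_NUMBER (YEAR _merge): gen byte firmtag = (_n == 1)
gen byte in_both = (n_master > 0 & n_using > 0)
count if firmtag & in_both

* List every observation of the first five such firms (data sorted by ID)
gen long firmno = sum(firmtag & in_both)
list ID_NUMBER YEAR _merge if in_both & firmno <= 5, sepby(ID_NUMBER) noobs
```

If many firms are counted as in_both and their listings show assets in some years and ownership in others, that supports the "one or the other, but not both" explanation rather than an ID problem.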

I'm guessing that you're going to need to do some work with your ownership data before you can use them in a model. For example, you may need to compute measures like "total percent foreign ownership," "percent owned by specific countries," etc. If so, then you probably want to do these calculations before merging.
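A sketch of computing one such measure and collapsing the ownership file to one observation per firm-year before merging. The variable names own_pct (percent of shares held by one owner) and owner_country (owner's country code, stored as a string) and the home-country code "XX" are hypothetical placeholders for whatever the ownership file actually contains; the definition of "foreign" should be adapted to the data.

```stata
* Sketch under assumptions: own_pct, owner_country and the code "XX"
* (home country) are hypothetical names/values, not from the thread.
use file2, clear

* Foreign share contributed by each ownership record (0 for domestic owners;
* records with a missing country code are treated as not foreign)
gen double foreign_pct = own_pct * (owner_country != "XX" & owner_country != "")

* One observation per firm-year: foreign %, total % recorded, number of owners
collapse (sum) foreign_pct total_pct=own_pct (count) n_owners=own_pct, by(ID_NUMBER YEAR)

isid ID_NUMBER YEAR                  // errors out if firm-years are still duplicated
save file2_firmyear, replace         // hypothetical file name
```

The result could then be merged onto the master with -merge 1:1 ID_NUMBER YEAR using file2_firmyear-, which stops with an error if either file still has duplicate firm-years.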

Finally, if you're running up against memory constraints (and can't buy or borrow more memory or move to another machine), make sure you store your data as efficiently as possible. For example, drop any unnecessary variables (e.g., the observation-number variable in your example), store variables such as country code as numeric with value labels rather than as strings, and use -compress-. Also, as I noted in my previous email, your current merge result (with multiple observations per firm for each year) is very inefficient, because the assets and revenue for a given year are repeated on every ownership record for that year. If your analysis will ultimately be based on a file with only one observation per firm per year, then you should reduce your second file to that form before merging it on.
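A sketch of those memory-saving steps applied to the ownership file. obsno and owner_country are hypothetical names for the observation-number and country-code variables. -encode- turns the string codes into a numeric variable with value labels, and -compress- demotes each variable to the smallest storage type that holds its values without changing them. -duplicates report- then shows how many firm-years carry more than one ownership record, which is the duplication that inflates the 1:m result.

```stata
* Sketch only: obsno and owner_country are hypothetical variable names.
use file2, clear

drop obsno                                  // not needed for the analysis
encode owner_country, generate(owner_cty)   // numeric codes with value labels
drop owner_country
compress                                    // smallest storage type per variable
describe, short                             // reports observations, variables, size

* How many firm-years have more than one ownership record?
duplicates report ID_NUMBER YEAR

save file2_compact, replace                 // hypothetical file name
```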


-- Phil


