Re: AW: st: Merge Panel Datasets

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: AW: st: Merge Panel Datasets
Date	Sun, 19 Jun 2011 10:55:39 -0500

On Jun 19, 2011, at 1:28 AM, Diana Beketova wrote:

You describe everything right. Indeed, I have yearly total assets and revenues in the master data and multiple observations per year about firm ownership structure in the using-data. I have to analyze the impact of foreign ownership on firm value/profitability afterwards. The dataset is very big and also causes some problems because I am limited to 1GB memory on my computer. The goal is also to find out which companies survived best during the crisis, what type of foreign owner had the most positive effect on firm performance etc. I made the merge again using 1:m-option and you can see my results below. I just can't believe that out of almost 1.9 million observations only 451 match.
merge 1:m ID_NUMBER YEAR using file2
ID_NUMBER was str15 now str16

    Result                           # of obs.
    -----------------------------------------
    not matched                     1,894,248
        from master                   387,108  (_merge==1)
        from using                  1,507,140  (_merge==2)

    matched                               451  (_merge==3)
    -----------------------------------------
At the end I want to have a panel that contains all observations over years, so I can run all the needed regressions etc. Do you think it is somehow possible?

I agree that this seems odd -- the only way to find out what is going on is to inspect some of the "master only" and "using only" observations (I would suggest doing this for a few specific firms). Assuming you don't see any problems, then it simply means that for most firms and/or most years, you either have data on assets and revenue *or* data on ownership, but not both. If that's true, then you need to think about how you're going to handle that in your analysis (e.g., by estimating values for the interim years, or by using a model that does not require concurrent observations on your measures).
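As a rough sketch of that kind of inspection, the following uses two tiny made-up files in place of the real ones. Only ID_NUMBER and YEAR come from the output above; the other variable names (assets, owner_country, pct) and all values are hypothetical. It tabulates, per firm, whether the firm shows up in the master file, the using file, or both, and then lists every firm-year for one firm so you can see which years fail to line up. Run it as a do-file in one go, since it relies on tempfiles.

```stata
* Toy master file: one row per firm-year (made-up values)
clear
input str4 ID_NUMBER int YEAR double assets
"A001" 2005 100
"A001" 2006 110
"B002" 2005 50
end
tempfile master
save `master'

* Toy using file: possibly several owners per firm-year (made-up values)
clear
input str4 ID_NUMBER int YEAR str2 owner_country double pct
"A001" 2006 "DE" 40
"A001" 2006 "US" 20
"A001" 2007 "DE" 45
"C003" 2005 "FR" 10
end
tempfile owners
save `owners'

use `master', clear
merge 1:m ID_NUMBER YEAR using `owners'

* Flag, for each firm, whether it appears in each file in ANY year
bysort ID_NUMBER: egen byte in_master = max(_merge != 2)
bysort ID_NUMBER: egen byte in_using  = max(_merge != 1)

* Count firms (not rows) by where they appear
egen byte firm_tag = tag(ID_NUMBER)
tabulate in_master in_using if firm_tag

* Look at one specific firm, year by year
sort ID_NUMBER YEAR
list ID_NUMBER YEAR _merge if ID_NUMBER == "A001", sepby(YEAR)
```

If most firms turn out to be in only one of the two files, the problem is more likely in the ID variable itself than in the years covered.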

I'm guessing that you're going to need to do some work with your ownership data before you can use them in a model. For example, you may need to compute measures like "total percent foreign ownership," "percent owned by specific countries," etc. If so, then you probably want to do these calculations before merging.
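Here is a minimal sketch of what such a pre-merge calculation might look like, again on made-up data. The variable names owner_country and pct are hypothetical, and the code assumes that the code "XX" marks a domestic owner; your own data will need its own rule for what counts as foreign.

```stata
* Toy ownership file: several owners per firm-year (made-up values)
clear
input str4 ID_NUMBER int YEAR str2 owner_country double pct
"A001" 2006 "DE" 40
"A001" 2006 "US" 20
"A001" 2006 "XX" 40
"A001" 2007 "DE" 45
"A001" 2007 "XX" 55
"B002" 2006 "XX" 100
end

* Assumption: "XX" is the home country, so any other code is foreign
gen byte   foreign     = owner_country != "XX"
gen double pct_foreign = pct * foreign
gen double pct_DE      = pct * (owner_country == "DE")

* Reduce to one row per firm-year by summing the shares
collapse (sum) pct_foreign pct_DE n_foreign = foreign, by(ID_NUMBER YEAR)

* Confirm that firm and year now uniquely identify the rows
isid ID_NUMBER YEAR
list, sepby(ID_NUMBER)
```

A file reduced like this has one observation per firm per year, so it could be merged onto the assets and revenue file with -merge 1:1 ID_NUMBER YEAR- rather than 1:m.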

Finally, if you're running up against memory constraints (and can't buy/borrow more memory or move to another machine), then make sure you are storing and using your data in the most memory-efficient way possible. For example, eliminate any unnecessary variables (e.g., the observation number variable in your example), and make sure you are storing everything in the most efficient way possible (e.g., store variables such as country code as numeric with value labels, and use -compress-). Also, as I noted in my previous email, your current merge result (with multiple observations per firm for each year) is very inefficient, because information on assets and revenue for a given year is duplicated. If your analysis will ultimately be based on a file with only one observation per firm per year, then you should modify your second file accordingly before merging it on.
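The memory-saving steps might look roughly like this on a toy file. The names obs_no, owner_country and pct are hypothetical stand-ins for whatever your variables are actually called.

```stata
* Toy ownership file with an unneeded row counter (made-up values)
clear
input long obs_no str4 ID_NUMBER int YEAR str2 owner_country double pct
1 "A001" 2006 "DE" 40
2 "A001" 2006 "US" 20
3 "A001" 2007 "DE" 45
end

* Drop variables the analysis does not need
drop obs_no

* Replace the string country code with a numeric variable carrying value labels
encode owner_country, generate(country)
drop owner_country

* Let Stata demote each variable to the smallest type that holds its values
compress
describe
```

On real data the saving from -encode- grows with the length of the strings and the number of observations, and -compress- never changes the stored values, only the storage types.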



-- Phil
