Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Merge Panel Datasets

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: Merge Panel Datasets
Date	Mon, 20 Jun 2011 10:39:11 -0500

On Jun 19, 2011, at 8:41 PM, Diana Beketova wrote:

It is true that I first had to create 'total foreign ownership' and 'total domestic ownership' in order to make one observation line out of many. But I first wanted just to try to merge both data files, so I can see if this merge can be successful at all and where my weak points to work on are.



Seems reasonable, though note that you could also do this with

    merge 1:m ID_NUMBER YEAR using file2, keepusing(ID_NUMBER YEAR)

(i.e., ignore for now the rest of the variables in the second file), which would cut down on your memory usage.
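A minimal, self-contained sketch of that trial merge follows. It builds two small made-up datasets in temporary files; the tempfile names file1/file2 and the variables assets and foreign_share are hypothetical stand-ins for the real files. Tabulating _merge afterwards shows how many firm-years matched and which ones came from only one file.

```stata
* Hypothetical master data: one row per ID_NUMBER YEAR
clear
input ID_NUMBER YEAR assets
1 2002 100
1 2003 110
2 2002 50
end
tempfile file1
save `file1'

* Hypothetical ownership data: possibly several owner rows per firm-year
clear
input ID_NUMBER YEAR foreign_share
1 2002 0.2
1 2002 0.3
2 2003 0.5
end
tempfile file2
save `file2'

* Trial merge: keep only the key variables from the using file to save memory
use `file1', clear
merge 1:m ID_NUMBER YEAR using `file2', keepusing(ID_NUMBER YEAR)

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge
```

Once the match rates look sensible, the same command can be rerun without keepusing() (or after collapsing the ownership file to one row per firm-year) to bring in the remaining variables.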

I had an idea about building year clusters because I have a range of years 2002-2010. So I can build three 3-year clusters: 2002-2004, 2005-2007, 2008-2010. Within each of these clusters I can generate new variables for Total Assets and Oper. Revenue that will be averages of Total Assets and Oper. Revenue within this cluster. Because ownership is so oddly distributed, there is a high probability that there will be only one observation per year cluster. At the end I would use the Heckman correction method in order to correct for selection bias, or also a Tobit model for censored variables. Do you think this methodology could be reasonable to use? Otherwise, I don't know how to match these two files. I have to say that the data come from an emerging market and are very biased and incomplete. Maybe you know further ways to deal with the bias problem?
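The year-cluster averages described above could be sketched in Stata roughly as follows. This assumes hypothetical variable names tassets (Total Assets) and oprev (Oper. Revenue) and uses a tiny made-up panel. It also counts how many non-missing years each average rests on, since a "mean" built from one year is not the same quantity as one built from three.

```stata
* Hypothetical toy panel: firm, year, total assets, operating revenue
clear
input ID_NUMBER YEAR tassets oprev
1 2002 100 10
1 2003 .   12
1 2004 120 14
1 2005 130 .
2 2008 50  5
end

* Map each year to a 3-year period: 1 = 2002-2004, 2 = 2005-2007, 3 = 2008-2010
generate period = floor((YEAR - 2002) / 3) + 1

* Per-firm period means; egen mean() skips missing values
bysort ID_NUMBER period: egen mean_tassets = mean(tassets)
bysort ID_NUMBER period: egen mean_oprev   = mean(oprev)

* Number of non-missing years behind each mean
bysort ID_NUMBER period: egen n_tassets = count(tassets)
bysort ID_NUMBER period: egen n_oprev   = count(oprev)
```

If one row per firm and period is wanted, collapse (mean) could be used instead of egen, at the cost of discarding the yearly rows.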

I don't see how your "cluster" strategy is related to the use of a selection model (e.g., Heckman) or censored regression model (e.g., Tobit). Moreover, I know absolutely nothing about this substantive area, so I cannot comment intelligently on your strategy. Grouping three years together may affect your results (e.g., it will smooth out year-to-year changes), so at a minimum, you would need to do a sensitivity analysis to see how your choice of endpoints (including size of "cluster") affects things. Of perhaps less importance, you might also want to take account of the fact that a mean of three years has different properties than a mean of only one year (if the data for the other two years are missing).
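One way to set up the sensitivity analysis mentioned above is to loop over several window widths and starting years and build a separate set of period means for each. This is only a sketch on a made-up series; the variable tassets and the generated names p_W_S and m_W_S (W = width, S = start year) are hypothetical.

```stata
* Hypothetical toy panel: one firm, 2002-2006, total assets
clear
input ID_NUMBER YEAR tassets
1 2002 100
1 2003 110
1 2004 120
1 2005 130
1 2006 140
end

foreach w in 2 3 {
    forvalues s = 2002/2004 {
        * Period index for windows of this width beginning at this start year
        generate p_`w'_`s' = floor((YEAR - `s') / `w')
        * Per-firm mean of total assets within each such window
        bysort ID_NUMBER p_`w'_`s': egen m_`w'_`s' = mean(tassets)
    }
}
```

Refitting the model of interest on each m_* variable and comparing the estimates would show how much the results depend on where the cluster endpoints fall and how wide the clusters are.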

Of critical importance before proceeding with any strategy is to have a good understanding of why the missing data are missing, and to think about what effects this might have on your results (even if you don't explicitly take account of this in your analysis).
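As a first descriptive look at why values are missing, Stata's misstable command summarizes missingness per variable and across variables, and a missing-value indicator can be cross-tabulated against year or other observed characteristics. The toy data and the names tassets, oprev and miss_oprev below are hypothetical.

```stata
* Hypothetical toy panel with gaps
clear
input ID_NUMBER YEAR tassets oprev
1 2002 100 10
1 2003 .   12
2 2002 .   .
2 2003 80  .
end

* How often each variable is missing
misstable summarize tassets oprev

* Which combinations of variables tend to be missing together
misstable patterns tassets oprev

* Indicator for missing revenue, tabulated against year
generate byte miss_oprev = missing(oprev)
tabulate YEAR miss_oprev
```

Patterns like these do not establish why data are missing, but they can suggest whether missingness is related to observed firm characteristics or years.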



-- Phil



