[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Julian Fennema" <j.a.fennema@hw.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: Merge problem |

Date |
Tue, 29 Apr 2003 14:32:00 +0100 |

Dear all, Apologies for the long-winded explanation that follows, but I am trying to be as precise as possible about the problem that I am having. I am constructing a panel from 44 files; 22 of production statistics and 22 of investment. The time dimension runs from 1978 to 1999. As of 1993 there was a change in the unique identifier for the firm, from a two-part system (r1 and r2) to a single identifier variable (b1). I have a 45th file which contains information on the change in identifier for all firms operating in 1993 - "link.dta" conatins three variables; r1 r2 b1. For all time periods, in both types of dataset, the relevant identifier is removed from a long string and destrung (or converted to a number if you want to avoid excessive butchery of the language). The code for doing this is exactly the same for both the investment and the production statistics, there are no duplicates, the format is the same for both types; r1 is a 9-digit code, r2 has 3 digits and b1 8. For periods prior to 1993 I issue a merge command -merge r1 r2 using "link.dta" The expected result is that for firms still operating in 1993 an extra variable is created in the dataset, containing the value of b1. As you would expect attrition means that not all firm operating in 1978 or so are still operating in 1993, so don't get linked to a value for b1 (_merge==1). Equally the link datafile contains a large amount of unnecessary information so a large proportion come up as _merge==2. What I don't understand is some of the _merge==3 observations. For the investment data, everything is fine. The majority of r1/r2 combinations are linked to a value of b1, which is added to the dataset. However, for the production statistics something weird is going on. Again the majority of combinations of r1/r2 are linked to b1 (_merge==3), and a new variable b1 is added. Yet this variable is always coded as missing (.). So what seems to be occurring is that the r1/r2 combination is correctly linked to a value of b1, but this value is not added correctly. I have checked the issue of format; the _merge==2 observations are added neatly at the end of the file, with r1 in the correct column, r2 and b1 also. So there seems to be no issue of formatting (just checking). The values of r1 and r2 are present in link.dta, and have an associated value of b1 (as suggested by _merge==3). It is possible to make the r1/r2 connection for link-investment and for production-investment, but not link-production. Could anyone give me a pointer as to what I am doing wrong? Thanks for reading this far, and also in advance for any help. Julian P.S. An illustration of what I mean (with fewer digits). Please note that the number of observations is not the same in the investment and production datasets. Link.dta Obs. R1 r2 b1 1 100 001 123 2 101 001 124 3 102 001 125 4 102 002 126 5 103 001 127 6 104 001 128 7 105 001 129 8 106 001 130 Investment.dta Obs. R1 r2 other vars 1 100 001 2 101 001 3 102 001 4 102 002 5 104 001 6 107 001 Production.dta Obs. R1 r2 other vars 1 100 001 2 101 001 3 102 001 4 102 002 5 103 001 6 104 001 7 107 001 8 108 001 -------*After a merge with link*------------ Investment.dta Obs. R1 r2 b1 _merge other vars 1 100 001 123 3 2 101 001 124 3 3 102 001 125 3 4 102 002 126 3 5 104 001 128 3 6 105 001 129 2 7 106 001 130 2 8 107 001 . 1 Production.dta Obs. R1 r2 b1 _merge other vars 1 100 001 . 3 2 101 001 . 3 3 102 001 . 3 4 102 002 . 3 5 104 001 . 3 6 105 001 129 2 7 106 001 130 2 8 107 001 . 1 9 108 001 . 1 * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**st: Re: Speed comparisons (Was: Suggestions?)** - Next by Date:
**Re: st: Merge problem** - Previous by thread:
**st: Hausman test for Independence from irrelevant alternatives** - Next by thread:
**Re: st: Merge problem** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |