
Re: st: Preventing 'spreading' on merging files

From	Ernest Berkhout <[email protected]>
To	[email protected]
Subject	Re: st: Preventing 'spreading' on merging files
Date	Wed, 31 Mar 2004 12:39:08 +0200

At 12:28 31/03/2004, you wrote:

> I have a data set with around 90,000 observations and I am merging it with
> another dataset with around 3 million observations. However, when I merge the
> two, the number of '3s' in the _merge variable exceeds the 90,000 in the master
> file by a few thousand, indicating that there is more than one observation in
> the using file for some of the observations in the working file.
>
> Does anyone know if it is possible to prevent Stata from picking up the
> additional observations from the using file (i.e., constraining the observations
> merged to the 90,000 in the working file)?

It sounds like your key variable is not unique in the using dataset, so some records in your master set match more than one record in the using set and therefore get duplicated.
You might want to take a look at mmerge.ado, which addresses these issues more directly than the built-in merge command.
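To make the duplication mechanism concrete, here is a minimal sketch in plain Python. It is not Stata code and not how merge or mmerge are implemented; the function names are hypothetical. Each master record is paired with every using record that shares its key, so a key that occurs twice in the using data turns one master record into two `_merge == 3` records. Keeping only the first using record per key is shown as one arbitrary way to restore a unique key. Which duplicate is the right one depends on your data.

```python
from collections import Counter


def match_merge(master, using, key):
    """Sketch of a match merge: each master row is paired with EVERY using row
    that shares its key, so a non-unique key in `using` duplicates master rows.
    Using-only records (_merge == 2) are ignored for brevity."""
    # Group using rows by their key value.
    index = {}
    for row in using:
        index.setdefault(row[key], []).append(row)
    merged = []
    for m in master:
        matches = index.get(m[key], [])
        if not matches:
            merged.append({**m, "_merge": 1})  # master-only record
        for u in matches:
            # Master values win on shared variable names (assumption of this sketch).
            merged.append({**u, **m, "_merge": 3})
    return merged


def duplicated_keys(using, key):
    """Return the key values that occur more than once in `using`."""
    counts = Counter(row[key] for row in using)
    return {k for k, n in counts.items() if n > 1}


def keep_first_per_key(using, key):
    """Keep only the first row for each key value (one arbitrary fix)."""
    seen = set()
    kept = []
    for row in using:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


if __name__ == "__main__":
    master = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
    using = [{"id": 1, "y": 10}, {"id": 1, "y": 11}, {"id": 2, "y": 20}]
    print(len(match_merge(master, using, "id")))  # 3 rows from 2 master rows
    print(duplicated_keys(using, "id"))           # {1}
    fixed = match_merge(master, keep_first_per_key(using, "id"), "id")
    print(len(fixed))                             # 2
```

Checking which keys are duplicated before merging is the useful step in any tool; silently dropping records only makes sense once you know why the duplicates are there.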

Ernest Berkhout
SEO Amsterdam Economics
University of Amsterdam

Room 3.08
Roetersstraat 29
1018 WB Amsterdam
The Netherlands

tel.:+ 31 20 525 1657
fax:+ 31 20 525 1686
http://www.seo.nl
===========================
A statistician: someone who insists
on being certain about uncertainty
===========================


References:
- st: Preventing 'spreading' on merging files
  - From: "Mark Clatworthy" <[email protected]>
