st: merge question

From	Kit Baum <[email protected]>
To	[email protected]
Subject	st: merge question
Date	Wed, 30 Aug 2006 21:55:45 -0400

Amadou has a randomness problem:

I am trying to merge 2 datasets.
But every time, I get different results
(_m==3 has 83 observations the
first time, 97 the second, 100 the
third and 96 the fourth, and so on).
I tried to set seed and made my sort stable,
with no success. I also tried to recast my merging
identifier as double. No success. I tried to
tostring it. No success either.
Any hints as to why I obtain these different results?
I verified in both Stata and Excel.
I do not understand why Stata assigned _m==3 to some
observations that belonged to both datasets in the
first trial but not in the second.
Best regards.
Amadou.

PS: When I work interactively, I do not have that problem.
I have 96 observations that match. So what am I doing
wrong in my Stata do-file?

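This is almost surely the result of a many-to-many merge, which produces exactly what he describes: a do-file that yields a different result (in terms of the number of observations) each time it is rerun. When the merge key is not unique in either file, which observations get paired depends on their order within each key value, and that order need not be the same from one run to the next.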
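To see why the order within a key matters, here is a toy Python model, an illustrative assumption rather than Stata's own code, of a many-to-many match-merge: within each key value, the first master observation is paired with the first using observation, the second with the second, and the last observation of the shorter group is reused. The names `merge_m_m` and `sort_on_key` are hypothetical; `sort_on_key` shuffles before a stable sort to mimic a sort that leaves ties in arbitrary order.

```python
import random


def merge_m_m(master, using, key):
    """Toy model of a many-to-many match-merge (an illustrative assumption).

    Within each key value found in both lists, the i-th master row is paired
    with the i-th using row; if one group is shorter, its last row is reused.
    Rows are dicts with an "obs" label. Returns (master_obs, using_obs) pairs.
    """
    pairs = []
    shared = sorted({r[key] for r in master} & {r[key] for r in using})
    for k in shared:
        m_rows = [r for r in master if r[key] == k]  # keeps the current order
        u_rows = [r for r in using if r[key] == k]
        for i in range(max(len(m_rows), len(u_rows))):
            m = m_rows[min(i, len(m_rows) - 1)]
            u = u_rows[min(i, len(u_rows) - 1)]
            pairs.append((m["obs"], u["obs"]))
    return pairs


def sort_on_key(rows, key, seed):
    """Sort rows by `key`, leaving ties in a seed-dependent order.

    Shuffling first and then applying Python's stable sort mimics a sort
    that does not fix the order of observations sharing the same key.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return sorted(shuffled, key=lambda r: r[key])


# Key 1 repeats in both files, so this is a many-to-many merge.
master = [{"id": 1, "obs": "m1"}, {"id": 1, "obs": "m2"}, {"id": 2, "obs": "m3"}]
using = [{"id": 1, "obs": "u1"}, {"id": 1, "obs": "u2"}, {"id": 1, "obs": "u3"}]

for run in range(3):
    m_sorted = sort_on_key(master, "id", seed=run)
    u_sorted = sort_on_key(using, "id", seed=100 + run)
    print(f"run {run}:", merge_m_m(m_sorted, u_sorted, "id"))
```

In this simplified model the number of pairs stays fixed, but which observations end up paired changes with the tie order, so anything computed afterwards from the paired values can differ from run to run.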

Use the unique, uniqmaster, or uniqusing option on merge, whichever is appropriate, so that merge verifies that the key really is unique where you say it is. The merge key should be unique in one file or the other, if not both.
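A minimal Python sketch of the check that uniqmaster asks for, assuming rows are plain dicts: `merge_uniqmaster` and `duplicated_keys` are hypothetical helpers, not part of Stata. The merge refuses to run when the key repeats in the master data, then pairs each using row with its single master match and records a `_merge`-style code (1 = master only, 2 = using only, 3 = both).

```python
from collections import Counter


def duplicated_keys(rows, key):
    """Return the key values that occur more than once in `rows`."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def merge_uniqmaster(master, using, key):
    """Match-merge that insists `key` is unique in master (hypothetical helper).

    Loosely mirrors the idea behind merge's uniqmaster option. Each output row
    gets a `_merge` code: 1 = master only, 2 = using only, 3 = in both.
    """
    dups = duplicated_keys(master, key)
    if dups:
        raise ValueError(f"{key} does not uniquely identify master rows: {dups}")
    by_key = {r[key]: r for r in master}
    matched = set()
    out = []
    for u in using:
        m = by_key.get(u[key])
        if m is None:
            out.append({**u, "_merge": 2})
        else:
            matched.add(u[key])
            # Master values win when both rows carry the same variable.
            out.append({**u, **m, "_merge": 3})
    for k, m in by_key.items():
        if k not in matched:
            out.append({**m, "_merge": 1})
    return out


master = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 4, "x": "d"}]
using = [{"id": 1, "y": 10}, {"id": 1, "y": 11}, {"id": 3, "y": 30}]

tally = Counter(r["_merge"] for r in merge_uniqmaster(master, using, "id"))
print(dict(sorted(tally.items())))  # {1: 2, 2: 1, 3: 2}
```

Because each using row can match at most one master row, the tallies do not depend on the order of the input rows; applying the same uniqueness check to the using data instead corresponds to the uniqusing case.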

"The dangers of many-to-many merges", p. 58 of the book cited below.

Kit Baum, Boston College Economics
http://ideas.repec.org/e/pba1.html
An Introduction to Modern Econometrics Using Stata:
http://www.stata-press.com/books/imeus.html

