Re: st: -collapse- and -merge-

From   Philipp Rehm <>
Subject   Re: st: -collapse- and -merge-
Date   Mon, 05 Nov 2007 22:20:54 -0500

Thanks to Austin for his input.

Despite his very reasonable explanation, I am still not convinced. I would prefer to make merging on missing values an option (an option, I would think, that almost nobody would ever use), not the default. But we are in the realm of taste here. De gustibus non est disputandum (there is no disputing taste)...

Thanks again!


Austin Nichols wrote:

Philipp Rehm <>:

This seems to be the desirable outcome.  If you specify a merge
matched on a variable with missing values, you expect the missing
values to be matched.  If you specify uniqusing in your example, it
should not change the behavior since there is only one missing value
in the using file.  If you want missing values not to be merged, and
you have only one type of missing in both files, you can redefine one
or both of them so they no longer match, e.g.
 replace id=.a if id==.
or drop the obs with missing ids, but this is a choice you should
make, not -merge-.
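A minimal sketch of the recoding workaround described above, applied to the data from the original post. It assumes foreign contains only system missing (.) in both files. The preserve/restore step, the choice of .a, and the final checks are illustrative additions, not part of the thread.

```stata
* Sketch: keep missing keys from matching by recoding them in the master only.
* Assumes foreign holds only system missing (.), never .a-.z, before recoding.
sysuse auto, clear
sort price
keep in 1/15
replace foreign = . in 1/5

* Build the using file of group means on a copy of the data
preserve
collapse (mean) PRICE=price, by(foreign)
sort foreign
tempfile m
save `m'
restore

* Recode missing keys in the master so they no longer equal the using file's .
replace foreign = .a if foreign == .
sort foreign
merge foreign using `m'

* .a rows come only from the master (_merge==1, PRICE missing);
* the using file's . row is left unmatched (_merge==2)
list foreign PRICE _merge
```

Dropping the missing-key row from the collapsed data before saving (drop if missing(foreign)) would also prevent the match. In that case no _merge==2 observation appears, and the master's missing-key observations simply get _merge==1.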

On 11/3/07, Philipp Rehm <> wrote:
I am regularly puzzled by one particular feature of -merge-, namely that
it matches missing values with each other. Here is an example:

sysuse auto, clear
sort price
keep in 1/15
replace foreign=. in 1/5

preserve
        collapse (mean) PRICE=price, by(foreign)
        sort foreign
        tempfile m
        save `m'
restore

sort foreign
merge foreign using `m'

list foreign PRICE

I can avoid this problem in various ways (a "drop if foreign==." after
the -collapse- would be one option). I also understand that Stata treats
missing values as very large numbers (i.e., all nonmissing numbers < . <
.a < .b < ... < .z). I do not understand, however, why it matches
missing values with each other. Moreover, the same behavior persists
when I specify the -merge- option "uniqusing".

Let me add that this behavior does not seem so strange in the example
above. However, I usually -merge- data from completely different
data sources, where there is no logical pattern to the missing values
and no reason to match them.

Am I missing something? Clarifications are appreciated.


*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC