The Stata listserver

st: Slight problem with merge documentation / merge suggestions


From   Fred Wolfe <[email protected]>
To   [email protected]
Subject   st: Slight problem with merge documentation / merge suggestions
Date   Mon, 20 Jan 2003 11:04:32 -0600

I have a file that contains duplicate keys because the id variables are missing in 6 observations.

I merge a master file with this data set, the one with the missing ids:

. merge patkey newenc using ${sql}demo1, nokeep keep(ulcermd) unique

There is no problem.

I then run the same command without the -nokeep- option:

. merge patkey newenc using ${sql}demo1, keep(ulcermd) unique
variables patkey newenc do not uniquely identify observations in the using data
r(459);

Apparently Stata ignores duplicates in the using data set when the -nokeep- option prevents them from becoming part of the merged data set.

This is very reasonable behavior, but one could not predict it from the documentation, which reads:

"unique specifies that the match variable(s) uniquely identify the observations in the master data and in the using data."

Because this could cause problems for users who do not understand this fine technicality, perhaps another line could be added to the help file explaining that duplicates excluded by the -nokeep- option do not trigger the error message generated by the -unique- options.

Merge suggestions

Stata's new merge program is a major improvement and fixes the most important problem addressed by Jeroen Weesie's -mmerge- series. Jeroen's program also dropped _merge automatically. I understand that Stata doesn't want to do that (though -xi- drops its _I variables), but would it be possible to add a simple option to -merge- that drops _merge and ignores it if it exists in the using data set?

Jeroen's program provided a friendly warning when you merged non-unique observations. Stata's merge gives that warning, and generates an error, only if the -unique- options are set. But suppose the -unique- options are not set. It would still be very helpful to have the message "variables keyvar(s) do not uniquely identify observations in the using data" displayed even when merge does not generate an error, either automatically or through a -message- option. It seems like a simple addition.
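Until -merge- can print such a warning on its own, a user can check the key in the using file before merging. This is only a sketch under stated assumptions, not part of -merge-: it builds a hypothetical toy data set that reuses the key names above (patkey, newenc) with two observations whose keys are missing, and it relies on -duplicates report- and -isid-, which are available in current Stata releases. The -message- option suggested above is hypothetical and is not used here.

```stata
* Hypothetical toy data standing in for the using file: two observations
* have missing patkey and newenc, so these variables are not a unique key
clear
input patkey newenc ulcermd
1 1 0
2 1 1
. . 0
. . 1
end

* Display a table of how many observations share each key combination;
* this only reports and never stops a do-file
duplicates report patkey newenc

* -isid- exits with an error when the key is not unique (or, without the
* -missok- option, when it contains missing values); -capture- turns that
* error into a warning instead of halting the do-file
capture isid patkey newenc
if _rc {
    display as error "warning: patkey newenc do not uniquely identify observations"
}
```

In practice one would -use- the real using file (here ${sql}demo1) in place of the -input- toy data and run the check just before the -merge- command.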

Fred Wolfe




-----------------------------------------------------------------------------------------------------
Fred Wolfe Tel (316) 263-2125
National Data Bank for Rheumatic Diseases Fax (316) 263-0761
Wichita, Kansas [email protected]
-----------------------------------------------------------------------------------------------------




