Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Obtaining details about -merge, update-


From   Phil Clayton <[email protected]>
To   [email protected]
Subject   Re: st: Obtaining details about -merge, update-
Date   Sat, 19 Oct 2013 11:50:10 +1100

Oh sorry, I see now.

Another option, which takes more time but less memory than Joe's brute-force approach, is to merge in the variables one at a time (with the keepusing() option) and, after each merge, rename _merge to varname_merge. This needs only one byte per merged variable, but for large datasets it might be too slow to be practical.

Phil

------------------------------------
* create a fake auto dataset with some new info for foreign & mpg
sysuse auto, clear
set seed 12345    // arbitrary seed (an addition) so the fake data are reproducible
replace foreign = rbinomial(1, 0.5)
replace mpg = rnormal(21, 6)
keep make foreign mpg
tempfile using
save "`using'"

* load the original auto dataset and merge in the new foreign & mpg variables
sysuse auto, clear
foreach var of varlist foreign mpg {
	merge 1:1 make using "`using'", keepusing(`var') replace update
	rename _merge `var'_merge
}
tab foreign_merge
tab mpg_merge
------------------------------------
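
A minimal sketch of how the renamed _merge copies could then be tallied into a per-variable table like the one Sergiy describes below. It assumes the loop above has just been run, so that foreign_merge and mpg_merge exist, and it relies on the codes that -merge, update- assigns (3 = matched, 4 = missing updated, 5 = nonmissing conflict). The column labels and local macro names are illustrative choices only.

------------------------------------
* tally variable-level results from the per-variable _merge copies
display as text %-12s "varname" %10s "updated" %10s "conflict" %10s "unchanged" %10s "total"
foreach var in foreign mpg {
	* 4 = missing value in master updated from using
	quietly count if `var'_merge == 4
	local updated = r(N)
	* 5 = nonmissing conflict (master overwritten only if -replace- was specified)
	quietly count if `var'_merge == 5
	local conflict = r(N)
	* 1 = master only, 3 = matched with nothing to update
	quietly count if inlist(`var'_merge, 1, 3)
	local unchanged = r(N)
	* total also includes any using-only observations (code 2)
	quietly count if !missing(`var'_merge)
	local total = r(N)
	display as text %-12s "`var'" as result %10.0f `updated' %10.0f `conflict' %10.0f `unchanged' %10.0f `total'
}
------------------------------------

One caveat with this approach: observations found only in the using data are appended by the first merge, so in later passes they count as matched or updated rather than as new. The per-variable totals are therefore easiest to interpret when every observation matches, as in the auto example.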

On 19/10/2013, at 9:57 AM, Sergiy Radyakin <[email protected]> wrote:

> Phil, not exactly. The _merge variable tells me whether the whole
> observation was matched or came only from the 'original' or 'using' data.
> The question is about variable-level updates. Something like:
> 
> varname     updated   replaced   original   total
> -------------------------------------------------
> age              12         20       5028    5060
> lastname         18          2       5040    5060
> ....
> 100 more vars or so depending on the data
> .....
> -------------------------------------------------
> 
> Best, Sergiy Radyakin
> 
> 
> On Fri, Oct 18, 2013 at 6:29 PM, Phil Clayton
> <[email protected]> wrote:
>> Perhaps I don't understand but isn't this what the _merge variable tells you?
>> 
>> Phil
>> 
>> On 19/10/2013, at 7:09, Joe Canner <[email protected]> wrote:
>> 
>>> Dear Colleagues,
>>> 
>>> Is there a relatively simple way to find out exactly what happened in the course of a -merge, update- command?  In other words: I have two datasets with a number of overlapping variables and I want to find out how often, for each variable, a missing value in the master was updated with a non-missing value from the using dataset.  Likewise, how often were values in the master not updated because of a non-missing conflict?  Basically, this would be similar to the current merge results table, but on a variable-by-variable basis rather than for the dataset as a whole.
>>> 
>>> Of course, this same functionality would be useful for -merge, replace-, although that is not my present concern.
>>> 
>>> If the answer is "no", is this something that people would be interested in?
>>> 
>>> Regards,
>>> Joe Canner
>>> Johns Hopkins University School of Medicine
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>> 
