
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Obtaining details about -merge, update-


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Obtaining details about -merge, update-
Date   Sat, 19 Oct 2013 13:03:02 +0000

Robert,

Thanks for the suggestion. I did think of -cf-, but when the using dataset has records that aren't in the master (_merge==2), the merged dataset has more observations than the master, and -cf- chokes on that before it gets a chance to report differences in individual variables. Otherwise, -cf- is a pretty good solution when the merged and master datasets have the same number of records.
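A minimal sketch of one way around this, assuming only the master's observations are of interest: after the merge, drop the using-only observations (_merge==2) and _merge itself, re-sort on the key, and then run -cf- with its -verbose- option so it lists each mismatch. The auto data are altered purely for illustration; dropping the first observation from the master is a hypothetical stand-in for records that exist only in the using file.

```stata
* -------------- begin example ------------------
sysuse auto, clear
isid make, sort
tempfile new old
save "`new'"

* master: drop one make so the using file has a record the master lacks,
* and blank out some prices so -merge, update- has something to fill in
drop in 1
replace price = . if price > 5000
save "`old'"

merge 1:1 make using "`new'", update

* keep only the master's observations so -cf- sees the same number of
* observations in both datasets, then drop the merge indicator
drop if _merge == 2
drop _merge
sort make

* -verbose- lists each observation where a variable differs;
* -cap noi- shows the report without stopping execution
cap noi cf _all using "`old'", verbose
* -------------- end example --------------------
```

With -update- (and no -replace-), a value can only differ from the master if it was missing there and filled in from the using file, so the per-variable mismatch counts from -cf- are the update counts. Nonmissing conflicts keep the master value and therefore do not show up in this comparison.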

Regards,
Joe
________________________________________
From: [email protected] [[email protected]] on behalf of Robert Picard [[email protected]]
Sent: Saturday, October 19, 2013 5:30 AM
To: [email protected]
Subject: Re: st: Obtaining details about -merge, update-

I think that -cf- can be used to find out how many times each variable
has been updated.

* -------------- begin example ------------------
sysuse auto, clear
isid make, sort
tempfile new
save "`new'"

replace price = . if price > 5000
replace rep78 = . if rep78 < 3
tempfile old
save "`old'"

merge 1:1 make using "`new'", update

* if you do not want -cf- to stop execution...
cap noi cf _all using "`old'"

* if you don't mind the error...
cf _all using "`old'"
* -------------- end example --------------------
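To get the variable-by-variable table asked for in this thread without relying on -cf-, one possibility is to skip -update- altogether: bring the using variables in under new names and count each outcome directly. A minimal sketch, assuming the overlapping variables are known in advance; the _u suffix and the tempfile names are hypothetical choices, and the auto data are altered only to create examples of each case.

```stata
* -------------- begin example ------------------
* using data: complete values, but mpg changed for five cars (conflicts)
sysuse auto, clear
keep make price mpg rep78
isid make, sort
replace mpg = mpg + 1 in 1/5
tempfile usingfile masterfile
save "`usingfile'"

* master data: some prices missing, to be filled from the using data
sysuse auto, clear
keep make price mpg rep78
replace price = . if price > 10000
save "`masterfile'"

* rename the overlapping variables in the using data (hypothetical _u suffix)
use "`usingfile'", clear
foreach v of varlist price mpg rep78 {
    rename `v' `v'_u
}
save "`usingfile'", replace

* merge without -update-, so both versions of each variable are kept
use "`masterfile'", clear
merge 1:1 make using "`usingfile'", keep(master match) nogenerate

display %-10s "varname" %10s "updated" %10s "conflict" %10s "total"
foreach v in price mpg rep78 {
    * master missing, using nonmissing: -update- would fill these in
    quietly count if missing(`v') & !missing(`v'_u)
    local upd = r(N)
    * both nonmissing but different: -update- keeps the master value
    quietly count if !missing(`v') & !missing(`v'_u) & `v' != `v'_u
    local conf = r(N)
    display %-10s "`v'" %10.0f `upd' %10.0f `conf' %10.0f _N
}
* -------------- end example --------------------
```

Because nothing is overwritten, the same counts also describe -merge, update replace-: there, the conflict column counts the master values that would be replaced by the using values.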

On Sat, Oct 19, 2013 at 2:50 AM, Phil Clayton
<[email protected]> wrote:
> Oh sorry, I see now.
>
> Another option, which would require more time but less memory than Joe's brute-force approach, would be to merge in the variables one by one (with the keepusing() option) and, after each merge, rename _merge to varname_merge. This would require only one byte per variable merged, but for large datasets it might be too slow to be practical.
>
> Phil
>
> ------------------------------------
> * create a fake auto dataset with some new info for foreign & mpg
> sysuse auto, clear
> set seed 12345
> replace foreign = rbinomial(1, 0.5)
> replace mpg = rnormal(21, 6)
> keep make foreign mpg
> tempfile using
> save "`using'"
>
> * load the original auto dataset and merge in the new foreign & mpg variables
> sysuse auto, clear
> foreach var of varlist foreign mpg {
>         merge 1:1 make using "`using'", keepusing(`var') replace update
>         rename _merge `var'_merge
> }
> tab foreign_merge
> tab mpg_merge
> ------------------------------------
>
> On 19/10/2013, at 9:57 AM, Sergiy Radyakin <[email protected]> wrote:
>
>> Phil, not exactly. Variable _merge tells me whether the whole
>> observation was matched, i.e., whether it came from the 'original' data,
>> the 'using' data, or both. I'm asking about variable-level updates. Something like:
>>
>> varname      updated   replaced   original   total
>> ------------------------------------------------------
>> age               12         20       5028    5060
>> lastname          18          2       5040    5060
>> ....
>> 100 more vars or so depending on the data
>> .....
>> ------------------------------------------------------
>>
>> Best, Sergiy Radyakin
>>
>>
>> On Fri, Oct 18, 2013 at 6:29 PM, Phil Clayton
>> <[email protected]> wrote:
>>> Perhaps I don't understand, but isn't this what the _merge variable tells you?
>>>
>>> Phil
>>>
>>> On 19/10/2013, at 7:09, Joe Canner <[email protected]> wrote:
>>>
>>>> Dear Colleagues,
>>>>
>>>> Is there a relatively simple way to find out exactly what happened in the course of a -merge, update- command? In other words, I have two datasets with a number of overlapping variables, and I want to find out how often, for each variable, a missing value in the master was updated with a non-missing value from the using dataset. Likewise, how often were values in the master not updated because of a non-missing conflict? Basically, this would be similar to the current merge results table, but on a variable-by-variable basis rather than for the dataset as a whole.
>>>>
>>>> Of course, this same functionality would be useful for -merge, replace-, although that is not my present concern.
>>>>
>>>> If the answer is "no", is this something that people would be interested in?
>>>>
>>>> Regards,
>>>> Joe Canner
>>>> Johns Hopkins University School of Medicine
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC