Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: merge

From	Nick Sanders <[email protected]>
To	[email protected]
Subject	Re: st: RE: merge
Date	Fri, 1 Jun 2012 09:42:54 -0700

Hello Fabian,

Depending on what you want to accomplish, you might consider (1) naming the variables differently across data sets, or (2) merging on date as well (if that's an important distinction for your data). But having such variables can make merge outcomes look weird if you don't know what to look for.

In general, for an idea of what Stata does in those situations, I'd take a look at the section in the merge help on "Treatment of overlapping variables", which I've copied below.

Best,
Nick


Treatment of overlapping variables

    When performing merges of any type, the master and using datasets may have variables in
    common other than the key variables.  We will call such variables overlapping variables.
    For instance, if the variables in the master and using datasets are

        master:  id, region, sex, age, race

         using:  id, sex, bp, race

    and id is the key variable, then the overlapping variables are sex and race.

    By default, merge treats values from the master as inviolable.  When observations match, it
    is the master's values of the overlapping variables that are recorded in the merged result.

    If you specify the update option, however, then all missing values of overlapping variables
    in matched observations are replaced with values from the using data. Because of this new
    behavior, the merge codes change somewhat.  Codes 1 and 2 keep their old meaning.  Code 3
    splits into codes 3, 4, and 5.  Codes 3, 4, and 5 are filtered according to the following
    rules; the first applicable rule is used.

        5 corresponds to matched observations where at least one overlapping variable had
            conflicting nonmissing values.
        4 corresponds to matched observations where at least one missing value was updated, but
            there were no conflicting nonmissing values.
        3 means observations matched, and there were neither updated missing values nor
            conflicting nonmissing values.

    If you specify both the update and replace options, then the _merge==5 cases are updated
    with values from the using data.


On Jun 1, 2012, at 9:07 AM, Fabian Schönenberger wrote:

> Yes I do! There is a variable called datadate with dates like ddmmyyyy, as well as one variable with dates like datemonthly with dates like yyyymXX.
> The date-variable I use to uniquely identify the observations is called year with yyyy. 
> 
> Should I drop the other ones?
> 
> 
> 
> 
> -------- Original-Nachricht --------
>> Datum: Fri, 1 Jun 2012 08:19:49 -0700
>> Von: Nick Sanders <[email protected]>
>> An: "[email protected]" <[email protected]>
>> Betreff: Re: st: RE: merge
> 
>> Fabian,
>> 
>> As a shot in the dark, do you have any other variables that are common
>> among the two data sets? For example, if data set 1 and 2 both contain a
>> variable called "date" (in addition to the common variables on which you are
>> merging), merging those two data sets will sometimes make your results look
>> odd due to how Stata handles what to do with that variable.
>> 
>> Best,
>> Nick
>> 
>> 
>> 
>> On Jun 1, 2012, at 8:05 AM, Simon Falck <[email protected]> wrote:
>> 
>>> Dear Fabian, 
>>> 
>>> It is difficult to give a specific reply when you do not tell us more
>> about your datasets and key variables. However, here are some general inputs
>> on merging files in Stata that perhaps are useful for you.
>>> 
>>> The -merge- command enables merging files with common id´s. One-to-one 
>> -merge 1:1- implies that the identifiers (key variables) are exactly the
>> same in both files. If this is not your case then you should consider
>> Many-to-one -merge m:1- or One-to-many -merge 1:m-. It depends on how your
>> datasets are structured and their content. The two latter options are used when
>> you have a common id in both files and one file, either the master or user,
>> differ in for example time period. If you have two datasets which includes
>> id´s that are non-common in both directions you use the Many-to-many
>> -merge m:m- option.
>>> 
>>> As understood the key-variables are important to inspect according to
>> the options described above. Don’t forget to inspect the format, for
>> example are both key variables in string format? Consider how a key-variables
>> should be constructed and what a common attribute implies and the directions.
>>> 
>>> You get a great instruction by typing: -help merge-
>>> 
>>> Good luck,
>>> Simon
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of "Fabian Schönenberger"
>>> Sent: den 1 juni 2012 14:28
>>> To: [email protected]
>>> Subject: st: merge
>>> 
>>> Dear Statalist
>>> I try to merge two datasets. In both files each observation is uniquely
>> identified by cusip and year. I sort both files with xtset cusip year.
>> Afterwards, I command: 
>>> 
>>> merge 1:1 cusip year using "C:\Users\User\Documents\Uni
>> SG\Doktorat\Data\Price Data\pricedatev5.dta", keepusing(capm_marketpremium), keep(3)
>>> 
>>> I am only interested in those observations of cusip-year-combinations
>> which are in my masterfile - therefore keep(3). 
>>> 
>>> However, either the matched observations nor the unmatched observations
>> are correct, meaning that I get for each cusip-year observations the wrong
>> capm_marketpremium and I do not get observations in my masterfile for
>> cusip-year observations although the using file has an observation. 
>>> 
>>> I also tried merge m:m but it did not work. What am I doing the wrong
>> way?
>>> 
>>> Many thanks for suggestions.
>>> 
>>> Fabian 
>>> -- 
>>> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone!                  
>> 
>>> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> -- 
> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone!                                  
> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: RE: merge
  - From: "Fabian Schönenberger" <[email protected]>

References:
- st: merge
  - From: "Fabian Schönenberger" <[email protected]>
- st: RE: merge
  - From: Simon Falck <[email protected]>
- Re: st: RE: merge
  - From: Nick Sanders <[email protected]>
- Re: st: RE: merge
  - From: "Fabian Schönenberger" <[email protected]>

Prev by Date: Re: st: RE: merge
Next by Date: Re: st: Comparing multiple means with survey data--revisited
Previous by thread: Re: st: RE: merge
Next by thread: Re: st: RE: merge
Index(es):
- Date
- Thread