Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: merge

From	Nick Sanders <[email protected]>
To	[email protected]
Subject	Re: st: RE: merge
Date	Fri, 1 Jun 2012 10:26:14 -0700

Hi Fabian,

Ah, my apologies for misunderstanding you (and for being vague in my phrasing of my earlier suggestion). 

I'm afraid that without a closer look at the data itself, I'm just about out of tricks. I'd suggest trying the –joinby– command, which is sometimes more flexible than –merge– (and potentially more dangerous for just that reason) and see what kind of results you get. If things look the same, then you know the problem isn't the –merge– command use per se, but rather a function of your data. If not, perhaps how the data result will give you insight into what's going on.

I'd also double-check for potential duplicates in either of your data sets.

-Nick

On Jun 1, 2012, at 10:03 AM, Fabian Schönenberger wrote:

> Dear Nick
> 
> I should be more precise.
> 
> The using-dataset consists of three variables: cusip, year and capm_marketpremium. Cusip and year I need to identify each observation. The masterfile consists of cusip, year, the before-mentioned additional date-variables and many variables more. So, there are no other overlapping variables beyond those, who identify each observation. I am not sure if overlapping is an issue for my case. All I want is to "copy" capm_marketpremium from using-dataset to the masterfile for each observation identified by cusip and year. 
> 
> Are there other possible problems with merge I do not consider?
> 
> Many thanks for your support!
> 
> Fabian
> 
> -------- Original-Nachricht --------
>> Datum: Fri, 1 Jun 2012 09:42:54 -0700
>> Von: Nick Sanders <[email protected]>
>> An: [email protected]
>> Betreff: Re: st: RE: merge
> 
>> Hello Fabian,
>> 
>> Depending on what you want to accomplish, you might consider (1) naming
>> the variables differently across data sets, or (2) merging on date as well
>> (if that's an important distinction for your data). But having such variables
>> can make merge outcomes look weird if you don't know what to look for.
>> 
>> In general, for an idea of what Stata does in those situations, I'd take a
>> look at the section in the merge help on "Treatment of overlapping
>> variables", which I've copied below.
>> 
>> Best,
>> Nick
>> 
>> 
>> Treatment of overlapping variables
>> 
>>    When performing merges of any type, the master and using datasets may
>> have variables in
>>    common other than the key variables.  We will call such variables
>> overlapping variables.
>>    For instance, if the variables in the master and using datasets are
>> 
>>        master:  id, region, sex, age, race
>> 
>>         using:  id, sex, bp, race
>> 
>>    and id is the key variable, then the overlapping variables are sex and
>> race.
>> 
>>    By default, merge treats values from the master as inviolable.  When
>> observations match, it
>>    is the master's values of the overlapping variables that are recorded
>> in the merged result.
>> 
>>    If you specify the update option, however, then all missing values of
>> overlapping variables
>>    in matched observations are replaced with values from the using data.
>> Because of this new
>>    behavior, the merge codes change somewhat.  Codes 1 and 2 keep their
>> old meaning.  Code 3
>>    splits into codes 3, 4, and 5.  Codes 3, 4, and 5 are filtered
>> according to the following
>>    rules; the first applicable rule is used.
>> 
>>        5 corresponds to matched observations where at least one
>> overlapping variable had
>>            conflicting nonmissing values.
>>        4 corresponds to matched observations where at least one missing
>> value was updated, but
>>            there were no conflicting nonmissing values.
>>        3 means observations matched, and there were neither updated
>> missing values nor
>>            conflicting nonmissing values.
>> 
>>    If you specify both the update and replace options, then the _merge==5
>> cases are updated
>>    with values from the using data.
>> 
>> 
>> On Jun 1, 2012, at 9:07 AM, Fabian Schönenberger wrote:
>> 
>>> Yes I do! There is a variable called datadate with dates like ddmmyyyy,
>> as well as one variable with dates like datemonthly with dates like
>> yyyymXX.
>>> The date-variable I use to uniquely identify the observations is called
>> year with yyyy. 
>>> 
>>> Should I drop the other ones?
>>> 
>>> 
>>> 
>>> 
>>> -------- Original-Nachricht --------
>>>> Datum: Fri, 1 Jun 2012 08:19:49 -0700
>>>> Von: Nick Sanders <[email protected]>
>>>> An: "[email protected]" <[email protected]>
>>>> Betreff: Re: st: RE: merge
>>> 
>>>> Fabian,
>>>> 
>>>> As a shot in the dark, do you have any other variables that are common
>>>> among the two data sets? For example, if data set 1 and 2 both contain
>> a
>>>> variable called "date" (in addition to the common variables on which
>> you are
>>>> merging), merging those two data sets will sometimes make your results
>> look
>>>> odd due to how Stata handles what to do with that variable.
>>>> 
>>>> Best,
>>>> Nick
>>>> 
>>>> 
>>>> 
>>>> On Jun 1, 2012, at 8:05 AM, Simon Falck <[email protected]> wrote:
>>>> 
>>>>> Dear Fabian, 
>>>>> 
>>>>> It is difficult to give a specific reply when you do not tell us more
>>>> about your datasets and key variables. However, here are some general
>> inputs
>>>> on merging files in Stata that perhaps are useful for you.
>>>>> 
>>>>> The -merge- command enables merging files with common id´s.
>> One-to-one 
>>>> -merge 1:1- implies that the identifiers (key variables) are exactly
>> the
>>>> same in both files. If this is not your case then you should consider
>>>> Many-to-one -merge m:1- or One-to-many -merge 1:m-. It depends on how
>> your
>>>> datasets are structured and their content. The two latter options are
>> used when
>>>> you have a common id in both files and one file, either the master or
>> user,
>>>> differ in for example time period. If you have two datasets which
>> includes
>>>> id´s that are non-common in both directions you use the Many-to-many
>>>> -merge m:m- option.
>>>>> 
>>>>> As understood the key-variables are important to inspect according to
>>>> the options described above. Don’t forget to inspect the format, for
>>>> example are both key variables in string format? Consider how a
>> key-variables
>>>> should be constructed and what a common attribute implies and the
>> directions.
>>>>> 
>>>>> You get a great instruction by typing: -help merge-
>>>>> 
>>>>> Good luck,
>>>>> Simon
>>>>> 
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of "Fabian
>> Schönenberger"
>>>>> Sent: den 1 juni 2012 14:28
>>>>> To: [email protected]
>>>>> Subject: st: merge
>>>>> 
>>>>> Dear Statalist
>>>>> I try to merge two datasets. In both files each observation is
>> uniquely
>>>> identified by cusip and year. I sort both files with xtset cusip year.
>>>> Afterwards, I command: 
>>>>> 
>>>>> merge 1:1 cusip year using "C:\Users\User\Documents\Uni
>>>> SG\Doktorat\Data\Price Data\pricedatev5.dta",
>> keepusing(capm_marketpremium), keep(3)
>>>>> 
>>>>> I am only interested in those observations of cusip-year-combinations
>>>> which are in my masterfile - therefore keep(3). 
>>>>> 
>>>>> However, either the matched observations nor the unmatched
>> observations
>>>> are correct, meaning that I get for each cusip-year observations the
>> wrong
>>>> capm_marketpremium and I do not get observations in my masterfile for
>>>> cusip-year observations although the using file has an observation. 
>>>>> 
>>>>> I also tried merge m:m but it did not work. What am I doing the wrong
>>>> way?
>>>>> 
>>>>> Many thanks for suggestions.
>>>>> 
>>>>> Fabian 
>>>>> -- 
>>>>> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone!                
>> 
>>>> 
>>>>> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/statalist/faq
>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>> 
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/statalist/faq
>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>> 
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>> 
>>> -- 
>>> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone!                  
>> 
>>> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> -- 
> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone!                                  
> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: merge
  - From: "Fabian Schönenberger" <[email protected]>
- st: RE: merge
  - From: Simon Falck <[email protected]>
- Re: st: RE: merge
  - From: Nick Sanders <[email protected]>
- Re: st: RE: merge
  - From: "Fabian Schönenberger" <[email protected]>
- Re: st: RE: merge
  - From: Nick Sanders <[email protected]>
- Re: st: RE: merge
  - From: "Fabian Schönenberger" <[email protected]>

Prev by Date: st: RE: plotting time series
Next by Date: st: How to combine multiple imputation with reg3 for simultaneous equations
Previous by thread: Re: st: RE: merge
Next by thread: Re: st: RE: merge
Index(es):
- Date
- Thread