Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: merge

From	"Fabian Schönenberger" <[email protected]>
To	[email protected]
Subject	Re: st: RE: merge
Date	Fri, 01 Jun 2012 19:03:19 +0200

Dear Nick

I should be more precise.

The using dataset consists of three variables: cusip, year and capm_marketpremium. I need cusip and year to identify each observation. The master file consists of cusip, year, the additional date variables mentioned before, and many more variables. So there are no overlapping variables beyond those that identify each observation. I am not sure whether overlapping is an issue in my case. All I want is to "copy" capm_marketpremium from the using dataset to the master file for each observation identified by cusip and year.
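
The task described above can be sketched on toy data. This is only an illustration under stated assumptions: all values are made up, the variable sales and the tempfile name using_toy are hypothetical stand-ins for the real files, and keep(master match) is used here so that every master observation is retained (keep(3) would instead drop master observations without a match).

```stata
* Toy using file: cusip, year, capm_marketpremium (made-up values).
clear
input str9 cusip int year double capm_marketpremium
"000001AA1" 2010 .051
"000001AA1" 2011 .048
"000002BB2" 2010 .051
end
tempfile using_toy
save "`using_toy'"

* Toy master file: same keys plus a hypothetical variable, sales.
clear
input str9 cusip int year double sales
"000001AA1" 2010 100
"000001AA1" 2011 110
"000003CC3" 2010 200
end

* Copy capm_marketpremium into the master, matching on cusip and year.
* keep(master match) retains all master observations, matched or not.
merge 1:1 cusip year using "`using_toy'", keepusing(capm_marketpremium) keep(master match)

* _merge==3 marks matched observations, _merge==1 master-only ones.
tabulate _merge
```

In this toy example the master observation for cusip 000003CC3 has no partner in the using file, so it stays in the data with a missing capm_marketpremium.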

Are there other possible problems with merge that I have not considered?

Many thanks for your support!

Fabian

-------- Original Message --------
> Date: Fri, 1 Jun 2012 09:42:54 -0700
> From: Nick Sanders <[email protected]>
> To: [email protected]
> Subject: Re: st: RE: merge

> Hello Fabian,
> 
> Depending on what you want to accomplish, you might consider (1) naming
> the variables differently across data sets, or (2) merging on date as well
> (if that's an important distinction for your data). But having such variables
> can make merge outcomes look weird if you don't know what to look for.
> 
> In general, for an idea of what Stata does in those situations, I'd take a
> look at the section in the merge help on "Treatment of overlapping
> variables", which I've copied below.
> 
> Best,
> Nick
> 
> 
> Treatment of overlapping variables
> 
>     When performing merges of any type, the master and using datasets may have variables in
>     common other than the key variables.  We will call such variables overlapping variables.
>     For instance, if the variables in the master and using datasets are
> 
>         master:  id, region, sex, age, race
> 
>          using:  id, sex, bp, race
> 
>     and id is the key variable, then the overlapping variables are sex and race.
> 
>     By default, merge treats values from the master as inviolable.  When observations match, it
>     is the master's values of the overlapping variables that are recorded in the merged result.
> 
>     If you specify the update option, however, then all missing values of overlapping variables
>     in matched observations are replaced with values from the using data.  Because of this new
>     behavior, the merge codes change somewhat.  Codes 1 and 2 keep their old meaning.  Code 3
>     splits into codes 3, 4, and 5.  Codes 3, 4, and 5 are filtered according to the following
>     rules; the first applicable rule is used.
> 
>         5 corresponds to matched observations where at least one overlapping variable had
>             conflicting nonmissing values.
>         4 corresponds to matched observations where at least one missing value was updated, but
>             there were no conflicting nonmissing values.
>         3 means observations matched, and there were neither updated missing values nor
>             conflicting nonmissing values.
> 
>     If you specify both the update and replace options, then the _merge==5 cases are updated
>     with values from the using data.
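
The rules quoted above can be tried out on a small example. The following is a minimal sketch with made-up data, loosely following the id/sex/bp example from the help text; the tempfile names master_toy and using_toy are hypothetical.

```stata
* Toy using file: id, sex, bp (made-up values).
clear
input int id byte sex int bp
1 0 120
2 1 130
3 1 125
end
tempfile using_toy
save "`using_toy'"

* Toy master file: sex overlaps with the using file.
clear
input int id byte sex
1 1
2 .
3 1
end
tempfile master_toy
save "`master_toy'"

* Default merge: the master's values of sex are kept for all matches,
* so id 2 keeps its missing value and every _merge is 3.
merge 1:1 id using "`using_toy'"
list id sex bp _merge

* With update: only missing master values are filled in from using.
* id 1 conflicts (1 vs 0)   -> _merge 5, master value kept
* id 2 was missing          -> _merge 4, filled in from using
* id 3 agrees (1 vs 1)      -> _merge 3
use "`master_toy'", clear
merge 1:1 id using "`using_toy'", update
list id sex bp _merge
```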
> 
> 
> On Jun 1, 2012, at 9:07 AM, Fabian Schönenberger wrote:
> 
> > Yes I do! There is a variable called datadate with dates like ddmmyyyy, as well as a variable called datemonthly with dates like yyyymXX.
> > The date variable I use to uniquely identify the observations is called year, with values like yyyy.
> > 
> > Should I drop the other ones?
> > 
> > 
> > 
> > 
> > -------- Original Message --------
> >> Date: Fri, 1 Jun 2012 08:19:49 -0700
> >> From: Nick Sanders <[email protected]>
> >> To: "[email protected]" <[email protected]>
> >> Subject: Re: st: RE: merge
> > 
> >> Fabian,
> >> 
> >> As a shot in the dark, do you have any other variables that are common
> >> to the two data sets? For example, if data sets 1 and 2 both contain a
> >> variable called "date" (in addition to the common variables on which you are
> >> merging), merging those two data sets will sometimes make your results look
> >> odd because of how Stata handles that variable.
> >> 
> >> Best,
> >> Nick
> >> 
> >> 
> >> 
> >> On Jun 1, 2012, at 8:05 AM, Simon Falck <[email protected]> wrote:
> >> 
> >>> Dear Fabian, 
> >>> 
> >>> It is difficult to give a specific reply when you do not tell us more about your datasets and key variables. However, here are some general pointers on merging files in Stata that may be useful for you.
> >>> 
> >>> The -merge- command merges files that share common IDs. One-to-one, -merge 1:1-, requires that the key variables uniquely identify the observations in both files. If that is not your case, then you should consider many-to-one, -merge m:1-, or one-to-many, -merge 1:m-. Which one applies depends on how your datasets are structured and what they contain. The latter two are used when both files share a common ID but one file, either the master or the using, has several observations per ID, for example one per time period. If the IDs are not unique in either file, there is the many-to-many option, -merge m:m-.
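
Which merge type fits depends on whether the key is unique in each file, and that can be checked before merging. Below is a minimal sketch on made-up data in which one cusip-year pair appears twice; the variable ret is hypothetical. Note that the merge documentation itself advises against m:m in most situations, so a non-unique key is usually a reason to inspect the data rather than to switch to m:m.

```stata
* Toy file in which the key cusip-year is NOT unique (made-up values).
clear
input str9 cusip int year double ret
"000001AA1" 2010 .10
"000001AA1" 2010 .12
"000002BB2" 2010 .07
end

* duplicates report shows how many observations share each key.
duplicates report cusip year

* isid exits with an error when the key is not unique; capture keeps
* the do-file running so that the return code can be inspected.
capture isid cusip year
display "isid return code: " _rc
```

A return code of 0 would mean the key is unique and the file can sit on the "1" side of a merge.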
> >>> 
> >>> As you can see, it is important to inspect the key variables in light of the options described above. Don't forget to check their format: for example, are both key variables stored as strings? Consider how a key variable should be constructed, what a common attribute implies, and in which direction the merge runs.
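
Checking the key formats, as suggested above, might look like the following sketch. The data are made up, and the stray leading blank and the string variable year_str are hypothetical problems chosen for illustration. strtrim() is the current name of the function; older Stata versions call it trim().

```stata
* Toy file with two hypothetical key problems: a stray leading blank
* in cusip and a year stored as a string (made-up values).
clear
input str10 cusip str4 year_str
" 000001AA1" "2010"
"000002BB2"  "2011"
end

* Inspect the storage type of each key before merging.
describe cusip year_str

* Remove leading and trailing blanks from the string key.
replace cusip = strtrim(cusip)

* Convert the year stored as text into a numeric variable.
destring year_str, generate(year)
confirm numeric variable year
```

Both files need keys of the same type (string with string, numeric with numeric), and string keys must agree character for character, for observations to match.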
> >>> 
> >>> You will find good instructions by typing: -help merge-
> >>> 
> >>> Good luck,
> >>> Simon
> >>> 
> >>> 
> >>> 
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:[email protected]] On Behalf Of "Fabian Schönenberger"
> >>> Sent: 1 June 2012 14:28
> >>> To: [email protected]
> >>> Subject: st: merge
> >>> 
> >>> Dear Statalist
> >>> I am trying to merge two datasets. In both files each observation is uniquely identified by cusip and year. I sort both files with xtset cusip year. Afterwards, I run:
> >>> 
> >>> merge 1:1 cusip year using "C:\Users\User\Documents\Uni SG\Doktorat\Data\Price Data\pricedatev5.dta", keepusing(capm_marketpremium), keep(3)
> >>> 
> >>> I am only interested in those cusip-year combinations which are in my master file, therefore keep(3).
> >>> 
> >>> However, neither the matched nor the unmatched observations are correct: for each cusip-year observation I get the wrong capm_marketpremium, and some cusip-year observations in my master file are not matched although the using file has an observation for them.
> >>> 
> >>> I also tried merge m:m but it did not work. What am I doing wrong?
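
One way to investigate a merge like this is to run it without keep() and look at which observations fail to match. The sketch below uses made-up data and one purely hypothetical cause, a lowercase letter in a master key; the actual cause in the real files is not known from this thread.

```stata
* Toy using file (made-up values).
clear
input str9 cusip int year double capm_marketpremium
"000001AA1" 2010 .051
"000002BB2" 2010 .051
end
tempfile using_toy
save "`using_toy'"

* Toy master file; the second cusip differs only in letter case.
clear
input str9 cusip int year
"000001AA1" 2010
"000002bb2" 2010
end

* Merge without keep() so that unmatched observations stay visible.
merge 1:1 cusip year using "`using_toy'", keepusing(capm_marketpremium)
tabulate _merge

* List everything that did not match, from either file.
sort cusip year
list cusip year capm_marketpremium _merge if _merge != 3
```

Here the listing shows the same cusip once as master-only and once as using-only, which points to a difference in how the key is written rather than to missing data.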
> >>> 
> >>> Many thanks for suggestions.
> >>> 
> >>> Fabian 
> >>> *
> >>> *   For searches and help try:
> >>> *   http://www.stata.com/help.cgi?search
> >>> *   http://www.stata.com/support/statalist/faq
> >>> *   http://www.ats.ucla.edu/stat/stata/
