Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Nick Sanders <sandersn@stanford.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: RE: merge |

Date |
Fri, 1 Jun 2012 09:42:54 -0700 |

Hello Fabian, Depending on what you want to accomplish, you might consider (1) naming the variables differently across data sets, or (2) merging on date as well (if that's an important distinction for your data). But having such variables can make merge outcomes look weird if you don't know what to look for. In general, for an idea of what Stata does in those situations, I'd take a look at the section in the merge help on "Treatment of overlapping variables", which I've copied below. Best, Nick Treatment of overlapping variables When performing merges of any type, the master and using datasets may have variables in common other than the key variables. We will call such variables overlapping variables. For instance, if the variables in the master and using datasets are master: id, region, sex, age, race using: id, sex, bp, race and id is the key variable, then the overlapping variables are sex and race. By default, merge treats values from the master as inviolable. When observations match, it is the master's values of the overlapping variables that are recorded in the merged result. If you specify the update option, however, then all missing values of overlapping variables in matched observations are replaced with values from the using data. Because of this new behavior, the merge codes change somewhat. Codes 1 and 2 keep their old meaning. Code 3 splits into codes 3, 4, and 5. Codes 3, 4, and 5 are filtered according to the following rules; the first applicable rule is used. 5 corresponds to matched observations where at least one overlapping variable had conflicting nonmissing values. 4 corresponds to matched observations where at least one missing value was updated, but there were no conflicting nonmissing values. 3 means observations matched, and there were neither updated missing values nor conflicting nonmissing values. If you specify both the update and replace options, then the _merge==5 cases are updated with values from the using data. On Jun 1, 2012, at 9:07 AM, Fabian Schönenberger wrote: > Yes I do! There is a variable called datadate with dates like ddmmyyyy, as well as one variable with dates like datemonthly with dates like yyyymXX. > The date-variable I use to uniquely identify the observations is called year with yyyy. > > Should I drop the other ones? > > > > > -------- Original-Nachricht -------- >> Datum: Fri, 1 Jun 2012 08:19:49 -0700 >> Von: Nick Sanders <sandersn@stanford.edu> >> An: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> >> Betreff: Re: st: RE: merge > >> Fabian, >> >> As a shot in the dark, do you have any other variables that are common >> among the two data sets? For example, if data set 1 and 2 both contain a >> variable called "date" (in addition to the common variables on which you are >> merging), merging those two data sets will sometimes make your results look >> odd due to how Stata handles what to do with that variable. >> >> Best, >> Nick >> >> >> >> On Jun 1, 2012, at 8:05 AM, Simon Falck <simon.falck@abe.kth.se> wrote: >> >>> Dear Fabian, >>> >>> It is difficult to give a specific reply when you do not tell us more >> about your datasets and key variables. However, here are some general inputs >> on merging files in Stata that perhaps are useful for you. >>> >>> The -merge- command enables merging files with common id´s. One-to-one >> -merge 1:1- implies that the identifiers (key variables) are exactly the >> same in both files. If this is not your case then you should consider >> Many-to-one -merge m:1- or One-to-many -merge 1:m-. It depends on how your >> datasets are structured and their content. The two latter options are used when >> you have a common id in both files and one file, either the master or user, >> differ in for example time period. If you have two datasets which includes >> id´s that are non-common in both directions you use the Many-to-many >> -merge m:m- option. >>> >>> As understood the key-variables are important to inspect according to >> the options described above. Don’t forget to inspect the format, for >> example are both key variables in string format? Consider how a key-variables >> should be constructed and what a common attribute implies and the directions. >>> >>> You get a great instruction by typing: -help merge- >>> >>> Good luck, >>> Simon >>> >>> >>> >>> -----Original Message----- >>> From: owner-statalist@hsphsun2.harvard.edu >> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of "Fabian Schönenberger" >>> Sent: den 1 juni 2012 14:28 >>> To: statalist@hsphsun2.harvard.edu >>> Subject: st: merge >>> >>> Dear Statalist >>> I try to merge two datasets. In both files each observation is uniquely >> identified by cusip and year. I sort both files with xtset cusip year. >> Afterwards, I command: >>> >>> merge 1:1 cusip year using "C:\Users\User\Documents\Uni >> SG\Doktorat\Data\Price Data\pricedatev5.dta", keepusing(capm_marketpremium), keep(3) >>> >>> I am only interested in those observations of cusip-year-combinations >> which are in my masterfile - therefore keep(3). >>> >>> However, either the matched observations nor the unmatched observations >> are correct, meaning that I get for each cusip-year observations the wrong >> capm_marketpremium and I do not get observations in my masterfile for >> cusip-year observations although the using file has an observation. >>> >>> I also tried merge m:m but it did not work. What am I doing the wrong >> way? >>> >>> Many thanks for suggestions. >>> >>> Fabian >>> -- >>> NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone! >> >>> Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/statalist/faq >>> * http://www.ats.ucla.edu/stat/stata/ >>> >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/statalist/faq >>> * http://www.ats.ucla.edu/stat/stata/ >> >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/statalist/faq >> * http://www.ats.ucla.edu/stat/stata/ > > -- > NEU: FreePhone 3-fach-Flat mit kostenlosem Smartphone! > Jetzt informieren: http://mobile.1und1.de/?ac=OM.PW.PW003K20328T7073a > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**Re: st: RE: merge***From:*"Fabian Schönenberger" <sch.f@gmx.ch>

**References**:**st: merge***From:*"Fabian Schönenberger" <sch.f@gmx.ch>

**st: RE: merge***From:*Simon Falck <simon.falck@abe.kth.se>

**Re: st: RE: merge***From:*Nick Sanders <sandersn@stanford.edu>

**Re: st: RE: merge***From:*"Fabian Schönenberger" <sch.f@gmx.ch>

- Prev by Date:
**Re: st: RE: merge** - Next by Date:
**Re: st: Comparing multiple means with survey data--revisited** - Previous by thread:
**Re: st: RE: merge** - Next by thread:
**Re: st: RE: merge** - Index(es):