Stata 15 help for merge

[D] merge -- Merge datasets


One-to-one merge on specified key variables

merge 1:1 varlist using filename [, options]

Many-to-one merge on specified key variables

merge m:1 varlist using filename [, options]

One-to-many merge on specified key variables

merge 1:m varlist using filename [, options]

Many-to-many merge on specified key variables

merge m:m varlist using filename [, options]

One-to-one merge by observation

merge 1:1 _n using filename [, options]

options Description ------------------------------------------------------------------------- Options keepusing(varlist) variables to keep from using data; default is all generate(newvar) name of new variable to mark merge results; default is _merge nogenerate do not create _merge variable nolabel do not copy value-label definitions from using nonotes do not copy notes from using update update missing values of same-named variables in master with values from using replace replace all values of same-named variables in master with nonmissing values from using (requires update) noreport do not display match result summary table force allow string/numeric variable type mismatch without error

Results assert(results) specify required match results keep(results) specify which match results to keep

sorted do not sort; datasets already sorted ------------------------------------------------------------------------- sorted does not appear in the dialog box.


Data > Combine datasets > Merge two datasets


merge joins corresponding observations from the dataset currently in memory (called the master dataset) with those from filename.dta (called the using dataset), matching on one or more key variables. merge can perform match merges (one-to-one, one-to-many, many-to-one, and many-to-many), which are often called 'joins' by database people. merge can also perform sequential merges, which have no equivalent in the relational database world.

merge is for adding new variables from a second dataset to existing observations. You use merge, for instance, when combining hospital patient and discharge datasets. If you wish to add new observations to existing variables, then see [D] append. You use append, for instance, when adding current discharges to past discharges.

By default, merge creates a new variable, _merge, containing numeric codes concerning the source and the contents of each observation in the merged dataset. These codes are explained below in the match results table.

Key variables cannot be strLs.

If filename is specified without an extension, then .dta is assumed.


+---------+ ----+ Options +----------------------------------------------------------

keepusing(varlist) specifies the variables from the using dataset that are kept in the merged dataset. By default, all variables are kept. For example, if your using dataset contains 2,000 demographic characteristics but you want only sex and age, then type merge ..., keepusing(sex age) ....

generate(newvar) specifies that the variable containing match results information should be named newvar rather than _merge.

nogenerate specifies that _merge not be created. This would be useful if you also specified keep(match), because keep(match) ensures that all values of _merge would be 3.

nolabel specifies that value-label definitions from the using file be ignored. This option should be rare, because definitions from the master are already used.

nonotes specifies that notes in the using dataset not be added to the merged dataset; see [D] notes.

update and replace both perform an update merge rather than a standard merge. In a standard merge, the data in the master are the authority and inviolable. For example, if the master and using datasets both contain a variable age, then matched observations will contain values from the master dataset, while unmatched observations will contain values from their respective datasets.

If update is specified, then matched observations will update missing values from the master dataset with values from the using dataset. Nonmissing values in the master dataset will be unchanged.

If replace is specified, then matched observations will contain values from the using dataset, unless the value in the using dataset is missing.

Specifying either update or replace affects the meanings of the match codes. See Treatment of overlapping variables in [D] merge for details.

noreport specifies that merge not present its summary table of match results.

force allows string/numeric variable type mismatches, resulting in missing values from the using dataset. If omitted, merge issues an error; if specified, merge issues a warning.

+---------+ ----+ Results +----------------------------------------------------------

assert(results) specifies the required match results. The possible results are

Numeric Equivalent code word (results) Description ------------------------------------------------------------------- 1 master observation appeared in master only 2 using observation appeared in using only 3 match observation appeared in both

4 match_update observation appeared in both, missing values updated 5 match_conflict observation appeared in both, conflicting nonmissing values ------------------------------------------------------------------- Codes 4 and 5 can arise only if the update option is specified. If codes of both 4 and 5 could pertain to an observation, then 5 is used.

Numeric codes and words are equivalent when used in the assert() or keep() options.

The following synonyms are allowed: masters for master, usings for using, matches and matched for match, match_updates for match_update, and match_conflicts for match_conflict.

Using assert(match master) specifies that the merged file is required to include only matched master or using observations and unmatched master observations, and may not include unmatched using observations. Specifying assert() results in merge issuing an error if there are match results among those observations you allowed.

The order of the words or codes is not important, so all the following assert() specifications would be the same:

assert(match master)

assert(master matches)

assert(1 3)

When the match results contain codes other than those allowed, return code 9 is returned, and the merged dataset with the unanticipated results is left in memory to allow you to investigate.

keep(results) specifies which observations are to be kept from the merged dataset. Using keep(match master) specifies keeping only matched observations and unmatched master observations after merging.

keep() differs from assert() because it selects observations from the merged dataset rather than enforcing requirements. keep() is used to pare the merged dataset to a given set of observations when you do not care if there are other observations in the merged dataset. assert() is used to verify that only a given set of observations is in the merged dataset.

You can specify both assert() and keep(). If you require matched observations and unmatched master observations but you want only the matched observations, then you could specify assert(match master) keep(match).

assert() and keep() are convenience options whose functionality can be duplicated using _merge directly.

. merge ..., assert(match master) keep(match)

is identical to

. merge ... . assert _merge==1 | _merge==3 . keep if _merge==3

The following option is available with merge but is not shown in the dialog box:

sorted specifies that the master and using datasets are already sorted by varlist. If the datasets are already sorted, then merge runs a little more quickly; the difference is hardly detectable, so this option is of interest only where speed is of the utmost importance.

Prior syntax

Prior to Stata 11, merge had a more primitive syntax. Code using the old syntax will run unmodified. To assist those attempting to understand or debug out-of-date code, the original help file for merge can be found here.


--------------------------------------------------------------------------- Setup . webuse autosize . list . webuse autoexpense . list

Perform 1:1 match merge . webuse autosize . merge 1:1 make using . list

--------------------------------------------------------------------------- Perform 1:1 match merge, requiring there to be only matches (The merge command intentionally causes an error message.) . webuse autosize, clear . merge 1:1 make using, assert(match) . tab _merge . list

--------------------------------------------------------------------------- Perform 1:1 match merge, keeping only matches and squelching the _merge variable . webuse autosize, clear . merge 1:1 make using, keep(match) nogen . list

--------------------------------------------------------------------------- Setup . webuse dollars, clear . list . webuse sforce . list

Perform m:1 match merge with sforce in memory . merge m:1 region using . list

--------------------------------------------------------------------------- Setup . webuse overlap1, clear . list, sepby(id) . webuse overlap2 . list

Perform m:1 match merge, illustrating update option . webuse overlap1 . merge m:1 id using, update . list

--------------------------------------------------------------------------- Perform m:1 match merge, illustrating update replace option . webuse overlap1, clear . merge m:1 id using, update replace . list

--------------------------------------------------------------------------- Perform 1:m match merge, illustrating update replace option . webuse overlap2, clear . merge 1:m id using, update replace . list

--------------------------------------------------------------------------- Perform sequential merge . webuse sforce, clear . merge 1:1 _n using . list


Video example

How to merge files into a single dataset

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index