
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: AW: combining datasets


From   Anders Alexandersson <[email protected]>
To   [email protected]
Subject   Re: st: AW: combining datasets
Date   Thu, 19 Aug 2010 11:55:32 -0400

Martine,

Also see [U] 22 Combining datasets. Maarten provided an excellent
append solution; this is its key line:
. append using `a'

Here is the equivalent merge solution:
. merge 1:1 source id using `a', nogen
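To see why the two lines are equivalent, here is a minimal sketch with made-up data. Because -source- differs between the two files, no (source, id) pair occurs in both. -merge- therefore keeps every observation as unmatched, and for the using-only observations it takes all variables, including x, from the using file. The tempfile name `stacked' is just an illustrative choice. The -cf- command stops with an error if the two results differ.

```stata
* Sketch with made-up data: -merge- on (source id) reproduces -append-
clear
tempfile a b stacked
input id x
1 3
2 4
end
gen source = "a"
save `a'

clear
input id x
1 5
2 6
end
gen source = "b"
save `b'

* Route 1: stack the files with -append-
use `b', clear
append using `a'
sort source id
save `stacked'

* Route 2: -merge- on the combined key; nothing matches,
* so all four observations are kept as master-only or using-only
use `b', clear
merge 1:1 source id using `a', nogen
sort source id

* -cf- exits with an error if any variable differs between the two results
cf _all using `stacked'
```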

The choice between append and merge matters more with large datasets,
because each command needs its own variable naming scheme.
Michael Mitchell gives a good tip in his data management book, described
at http://www.stata.com/bookstore/dmus.html :
If you will append datasets, you want the variable names to be the same,
but if you will merge datasets, you want the variable names to be different.
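A small sketch of why the names should differ for -merge-. It assumes a hypothetical case, unlike Martine's, where the same id refers to the same unit in both files. When a non-key variable such as x appears in both files, -merge- by default keeps the master's values for matched observations and ignores the using file's values. Renaming first (x_a and x_b are illustrative names) keeps both measurements.

```stata
* Hypothetical data: here id 1 is the same unit in both files
clear
tempfile a b
input id x
1 3
2 4
end
save `a'

clear
input id x
1 5
2 6
end
save `b'

* Same name: for matched observations -merge- keeps the master's x
use `a', clear
merge 1:1 id using `b', nogen
list        // x holds 3 and 4, the values from `a'

* Different names: rename before merging and both values survive
use `b', clear
rename x x_b
save `b', replace
use `a', clear
rename x x_a
merge 1:1 id using `b', nogen
list        // x_a and x_b side by side
```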

Anders Alexandersson
[email protected]

On Thu, Aug 19, 2010 at 4:34 AM, Maarten buis <[email protected]> wrote:
> --- On Wed, 18/8/10, martine etienne wrote:
>> firstly, person 1 in dataset A is NOT the same person as person
>> 1 in dataset B, and the measurements are taken at different times;
>> secondly, I would like the final dataset to look like Final 1
>
> Here is an example of how to do that:
>
> *------------ begin example ------------
> // create the two datasets
> tempfile a b
>
> drop _all
> input id x
> 1  3
> 2  4
> end
> save `a'
>
> drop _all
> input id x
> 1  5
> 2  6
> end
> save `b'
>
> // create a new variable in each dataset
> // that identifies the source of those
> // observations
> use `a'
> gen source = "a"
>
> save `a', replace
>
> use `b'
> gen source = "b"
> save `b', replace
>
> // use -append- to stack the datasets
> append using `a'
>
> // create an extra id variable, which contains
> // a unique integer for each source-id combination
> // and attaches the values of the source and id
> // variables to its value label
> egen long new_id = group(source id), label
>
> // for display purposes I put the three id variables
> // on the left of the dataset
> order id source new_id
>
> // display the result
> list
> *--------------- end example ----------------
> (For more on the examples I send to Statalist, see:
> http://www.maartenbuis.nl/example_faq )
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
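A possible shortcut for Maarten's approach, as a sketch that assumes Stata 11 or later: -append- has a generate() option that creates the source indicator itself. Observations from the master file get 0 and those from the first using file get 1. The value label name source_lbl is only an illustrative choice.

```stata
* Sketch (Stata 11+): let -append- create the source indicator
clear
tempfile a b
input id x
1 3
2 4
end
save `a'

clear
input id x
1 5
2 6
end
save `b'

use `a', clear
* generate() marks observations: 0 = master (`a'), 1 = first using file (`b')
append using `b', generate(source)
label define source_lbl 0 "a" 1 "b"
label values source source_lbl

* one integer per source-id combination, with a value label built from them
egen long new_id = group(source id), label
order id source new_id
list
```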


