Re: st: Random merging

From	"Michael I. Lichter" <[email protected]>
To	[email protected]
Subject	Re: st: Random merging
Date	Fri, 31 Jul 2009 03:11:28 -0400

Anna,

If this doesn't do what you want, you need to be more specific about your needs:


------
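* Put file2 in random order within each LINKIDX
use file2, clear
set seed 20090730
gen myorder = runiform()
sort LINKIDX myorder
tempfile file2tmp
save `file2tmp'
* Match file1 to the shuffled file2 on LINKIDX
use file1, clear
merge LINKIDX using `file2tmp', sort
drop myorder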
------
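
If the goal is exactly one randomly chosen dataset 1 event for each dataset 2 record, another way to sketch it is with -joinby-: form every pairing within LINKIDX, then keep one pairing at random per RXRECIDX. This is only an illustration under that assumption, not necessarily what the question intends. The toy rows are copied from the question; the tempfile name `events' and the variable u are made up for the example, and RXRECIDX is assumed to identify dataset 2 records uniquely.

```stata
* Dataset 1 (events), toy rows from the question
clear
input double(EVNTIDX LINKIDX) int EVENTYR byte(EVENTMM EVENTDD)
300020190021 300020190083 2006 8 6
300020190021 300020190052 2006 8 6
300110100795 300110101161 2006 4 10
300110100822 300110101161 2006 7 19
300110100808 300110101161 2006 5 8
end
tempfile events
save `events'

* Dataset 2 (one row per RXRECIDX), toy rows from the question
clear
input double(LINKIDX DUPERSID RXRECIDX)
300020190083 30002019 300020190083001
300020190083 30002019 300020198849002
300110101161 30011010 300110101161001
300110101161 30011010 300110101161003
end

* Form all pairings within LINKIDX (unmatched rows are dropped by default)
joinby LINKIDX using `events'

* Keep one pairing at random for each dataset 2 record
set seed 20090730
gen double u = runiform()
bysort RXRECIDX (u): keep if _n == 1
drop u
```

Because -joinby- creates every combination before the random draw, the intermediate dataset can get large when many observations share a LINKIDX, so this may need to be run in pieces on big files.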

Michael

[email protected] wrote:

Hi all,

I'm a relatively new STATA user, and I'm trying to merge a couple of large datasets where neither the master nor the using dataset has a unique key.

The data comes in this format:

Dataset 1: (note that LINKIDX is not unique)

    EVNTIDX        LINKIDX        EVENTYR  EVENTMM  EVENTDD ...
1.  300020190021   300020190083   2006     8        6
2.  300020190021   300020190052   2006     8        6
3.  300110100795   300110101161   2006     4        10
4.  300110100822   300110101161   2006     7        19
5.  300110100808   300110101161   2006     5        8

Dataset 2: (note that LINKIDX is not unique)

    LINKIDX        DUPERSID   RXRECIDX ...
1.  300020190083   30002019   300020190083001
2.  300020190083   30002019   300020198849002
3.  300110101161   30011010   300110101161001
4.  300110101161   30011010   300110101161003

I have already performed a merge where I have limited dataset 1 to only the unique observations of LINKIDX, and linked them to the multiple observations in dataset 2 (using a one-to-many merge). In the case of the above datasets, it would involve linking observation 1 in dataset 1 to observations 2 and 3 in dataset 2.

However, I would like to perform a random link for the remaining observations. That is, for observations 3-5 in dataset 1, which match the LINKIDX for observations 3 and 4 in dataset 2, I would like for STATA to randomly pick a LINKIDX in dataset 1 to merge with each matching LINKIDX in dataset 2.

I am not sure whether I should simply use the merge function, because it may result in systematic selection of one observation in dataset 1.

Any ideas as to how I might be able to accomplish this task?

Thank you in advance!

Regards,
Anna Dijkstra



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


--
Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536


Follow-Ups:
- Re: st: Random merging
  - From: Austin Nichols <[email protected]>

References:
- st: Random merging
  - From: <[email protected]>
