Stata The Stata listserver

RE: st: Relative efficiency of merge


From   "Nick Winter" <[email protected]>
To   <[email protected]>
Subject   RE: st: Relative efficiency of merge
Date   Fri, 1 Nov 2002 12:54:58 -0500

> -----Original Message-----
> From: Erik Ø. Sørensen [mailto:[email protected]]
> Sent: Friday, November 01, 2002 12:44 PM
> To: [email protected]
> Subject: Re: st: Relative efficiency of merge
> 
> 
> On Friday, Nov 1, 2002, at 12:22 America/Montreal, Hoetker, Glenn wrote:
> > One option I see is merging A with B using the 'nokeep' option and
> > saving the resultant dataset as B_reduced.  Since dataset B is fairly
> > large, however, I want this to be as efficient as possible.  Is merge
> > at least close to the most efficient way to do this?  If not, what
> > might be more efficient?
>
> Have you tried and timed it? I regularly merge files with 3-4 million
> observations, and the cost is not so terrible. An example: it takes
> about 25 seconds to merge two datasets of 3 million observations on a
> unique identifier (one dataset had 2 variables, and I merged in a set
> with 27 variables).
> 

If you are timing different options, be sure to -set rmsg on- beforehand; Stata then reports how long each command takes.
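
A minimal sketch of what that looks like in practice. The dataset name B.dta and the key variable id are hypothetical, chosen only for illustration:

```stata
* Sketch only: B.dta and id are hypothetical names.
* With rmsg on, Stata prints a return message with the elapsed
* time after each command finishes.
set rmsg on
use B, clear
sort id
* Turn the messages off again when you are done timing.
set rmsg off
```

The timings are per command, so the steps of each approach need to be added up by hand when comparing them.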

My quick testing suggests that using the small dataset and then merging in the big dataset (with the -nokeep- option) is faster than using the large dataset, merging the small one in, and then dropping non-matching observations.  But it would be easy enough for you to try it both ways yourself.
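
The two orderings could be compared along these lines. This is only a sketch under assumptions: A.dta (small), B.dta (large), the key variable id, and the output name B_reduced2 are hypothetical, and both files are assumed to be already sorted by id, which this form of -merge- requires:

```stata
* Sketch only: A.dta, B.dta, id, and B_reduced2 are hypothetical names.
* Assumes A.dta and B.dta are each already sorted by id.
set rmsg on

* Ordering 1: small dataset in memory, large one merged in.
* -nokeep- ignores observations in B that have no match in A.
use A, clear
merge id using B, nokeep
* Drop any observations of A that found no match in B.
keep if _merge == 3
save B_reduced, replace

* Ordering 2: large dataset in memory, small one merged in,
* then drop the non-matching observations afterwards.
use B, clear
merge id using A
keep if _merge == 3
save B_reduced2, replace

set rmsg off
```

Note that in later Stata releases (11 and up) the -merge- syntax changed; there the rough equivalent of the first ordering would be -merge 1:1 id using B, keep(match)-, assuming id is unique in both files.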

I'm not aware of a non-merge solution.

Nick Winter