RE: st: Relative efficiency of merge
> -----Original Message-----
> From: Erik Ø. Sørensen [mailto:email@example.com]
> Sent: Friday, November 01, 2002 12:44 PM
> To: firstname.lastname@example.org
> Subject: Re: st: Relative efficiency of merge
> On Friday, Nov 1, 2002, at 12:22 America/Montreal, Hoetker, Glenn wrote:
> > One option I see is merging A with B using the 'nokeep' option and saving the resultant dataset as B_reduced. Since dataset B is fairly large, however, I want this to be as efficient as possible. Is merge at least close to the most efficient way to do this? If not, what might be more efficient?
> Have you tried and timed it? I merge files with 3-4 million observations regularly, and the cost of this is not so terrible. An example: it takes about 25 seconds to merge two datasets of 3 on a unique identifier (one dataset had 2 variables, I merged in a set with 27 variables).
If you are timing different options, be sure to -set rmsg on- beforehand; Stata then reports how long each command takes.
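A minimal sketch of what that looks like in practice. The dataset name A and the key variable id are hypothetical placeholders, not names from your data:

```stata
* Assumed names: A.dta is a dataset on disk, id is its key variable.
set rmsg on        // after each command Stata prints its return code and elapsed time

use A, clear       // elapsed time for loading is reported after this line
sort id            // ... and elapsed time for sorting after this one

set rmsg off       // switch the timing messages back off when finished
```

The timing appears on a line of the form "r; t=..." after each command, so no extra bookkeeping is needed.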
My quick testing suggests that using the small dataset and then merging in the big dataset (with the -nokeep- option) is faster than using the large dataset, merging the small one in, and then dropping non-matching observations. But it would be easy enough for you to try it both ways yourself.
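The two orderings compared above can be sketched roughly as follows. This is only an illustration under assumptions: A.dta (small), B.dta (large), and the key variable id are hypothetical names, and id is assumed to identify observations uniquely in both files. The old-style -merge- syntax used here needs both files sorted on the key.

```stata
* Assumed names: A.dta (small), B.dta (large), key variable id (unique in both).

* Preparation: sorted copies of both files
use B, clear
sort id
save B_sorted, replace

use A, clear
sort id
save A_sorted, replace

* Ordering 1: small dataset in memory, merge the big one in with -nokeep-
use A_sorted, clear
merge id using B_sorted, nokeep   // nokeep: ignore B observations with no match in A
keep if _merge == 3               // keep only observations found in both files
save B_reduced, replace

* Ordering 2: big dataset in memory, merge the small one in, then drop
use B_sorted, clear
merge id using A_sorted
keep if _merge == 3               // drop the non-matching observations afterwards
save B_reduced2, replace
```

With -set rmsg on- in effect, the times reported for the two -merge- lines (plus the -keep- in the second ordering) are the numbers to compare. In later Stata releases (11 and up) the first ordering would be written with the newer syntax, e.g. -merge 1:1 id using B, keep(match)-, which also sorts for you.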
I'm not aware of a non-merge solution.