The Stata listserver

st: Relative efficiency of merge


From   "Hoetker, Glenn" <ghoetker@uiuc.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Relative efficiency of merge
Date   Fri, 1 Nov 2002 11:22:54 -0600

Hi all.  I have a question about the efficiency of the 'merge' command.

I have two datasets, A and B.  A consists of about 500 distinct
observations of a single variable, PATENT.  B consists of about 16 million
observations of two variables, one of which is CITED_PATENT.

I would like to keep only the observations of B in which CITED_PATENT
corresponds to one of the values of PATENT contained in A.  

As I work with this data, the contents of A will change from time to
time, so I want this to be easily repeatable.

One option I see is merging A with B using the 'nokeep' option and
saving the resulting dataset as B_reduced.  Since dataset B is fairly
large, however, I want this to be as efficient as possible.  Is merge at
least close to the most efficient way to do this?  If not, what might be
more efficient?
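
For concreteness, the merge-based approach could be sketched roughly as
below.  This is only an illustration under several assumptions: the file
names A.dta, B.dta and B_reduced.dta are placeholders, the variable names
are spelled exactly as above, and it uses the newer merge syntax (m:1 with
keep(match)) found in later Stata releases rather than the 'nokeep'
option.  It makes no claim about being the fastest route.

```stata
* Sketch only: file names A.dta, B.dta and B_reduced.dta are assumed.

* Load the small key list and rename its variable so that it
* matches the key variable in B.
use A, clear
rename PATENT CITED_PATENT
tempfile keys
save `keys'

* Load the large dataset and keep only the observations whose
* CITED_PATENT appears in the key list.  m:1 requires CITED_PATENT
* to be unique in the key file, which holds if A's values are distinct.
use B, clear
merge m:1 CITED_PATENT using `keys', keep(match) nogenerate

save B_reduced, replace
```

Since the key list is rebuilt from A on every run, the same lines can be
rerun unchanged whenever the contents of A change.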

Many thanks!

Glenn Hoetker
Assistant Professor of Strategy
College of Commerce & Business Administration
University of Illinois at Urbana-Champaign
(217) 265-4081
ghoetker@uiuc.edu

