The Stata listserver

st: Re: Selecting part of a LARGE file


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Selecting part of a LARGE file
Date   Fri, 6 Jun 2003 14:03:04 -0400

I'm not sure what the problem is -- I deal with this issue all the time.
Why not:

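use filea
keep patent
sort patent
* keep one observation per patent so no File B record is duplicated
by patent: keep if _n==1
* fileb must already be sorted by patent; -nokeep- discards using-only obs
merge patent using fileb, nokeep
tab _merge

*optionally check the merging and drop non-matching obs from filea
keep if _merge==3
drop _merge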


Maybe the problem is that you aren't using the -nokeep- option for the
merge.  With this approach, I find Stata to be quite fast at just selecting
the observations of interest and there aren't any substantial memory
problems because you never have to hold the full combined dataset in memory.
I'm sure this would be much faster than any ridiculously long -use ... if-
construct (and your approach using -index()- will certainly run out of room
after just a few dozen patent strings are appended together).
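For readers on newer Stata, the same selection can likely be written with the -merge 1:m- syntax introduced in Stata 11, which postdates this thread. The following is a minimal, self-contained sketch under that assumption: the patent numbers and the -claims- variable are hypothetical stand-ins for File A and File B, and it should be run as a single do-file so the tempfiles persist.

```stata
* Sketch assuming Stata 11 or later; all data below are hypothetical
clear
tempfile filea fileb

* Toy "File A": patents of interest (note the duplicate 1000001)
input str7 patent
"1000001"
"1000001"
"1000003"
end
save `filea'

* Toy "File B": many records per patent, most not of interest
clear
input str7 patent claims
"1000001" 3
"1000001" 5
"1000002" 1
"1000003" 2
end
save `fileb'

* Select only the File B observations whose patent appears in File A
use `filea', clear
keep patent
duplicates drop
merge 1:m patent using `fileb', keep(match) nogenerate
list
```

Here -keep(match)- does the work of -nokeep- plus -keep if _merge==3-, and -nogenerate- skips creating _merge. Unlike the older syntax, neither file has to be sorted beforehand, though -1:m- does require patent to be unique in the master data, which is why -duplicates drop- comes first.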

Michael Blasnik
michael.blasnik@verizon.net

----- Original Message ----- 
From: "Hoetker, Glenn" <ghoetker@uiuc.edu>
To: <statalist@hsphsun2.harvard.edu>
Sent: Friday, June 06, 2003 11:47 AM
Subject: st: Selecting part of a LARGE file


> Hi all.
>
> I have two files.  File A has about 5000 unique values of the variable
> PATENT, which is 7 characters long.  File B has 16 million observations
> and several million unique values for PATENT.  I want to do some
> manipulation involving File B, but only for the observations that
> correspond to the patent values found in File A.   I am currently using
> merge on the two files to do this (actually mmerge as a wrapper for
> ease), but wonder if there is an easier/faster way.
>
> I attempted using vallist.ado in File A to generate a long local macro
> (say, _useme) and then doing
>
> use FileB if index(patent, "'useme'")
>
> I get 0 observations in this case (even though I know there are some
> matches).  From the manual, it appears that index is limited to strings
> of 80 characters, anyway.
>
> Any better suggestions?  Thank you in advance.
>
> Glenn
>
> Glenn Hoetker
> Assistant Professor of Strategy
> College of Business Administration
> University of Illinois at Urbana-Champaign
> 217-265-4081
> ghoetker@uiuc.edu
> "Success is going from failure to failure without a loss of enthusiasm."
> Sir Winston Churchill
>


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


