The Stata listserver

st: RE: Selecting part of a LARGE file

From   "Nick Cox" <>
To   <>
Subject   st: RE: Selecting part of a LARGE file
Date   Fri, 6 Jun 2003 18:03:31 +0100

Hoetker, Glenn

> I have two files.  File A has about 5000 unique values of the variable
> PATENT, which is 7 characters long.  File B has 16 million observations
> and several million unique values for PATENT.  I want to do some
> manipulation involving File B, but only for the observations that
> correspond to the patent values found in File A.   I am currently using
> merge on the two files to do this (actually mmerge as a wrapper for
> ease), but wonder if there is an easier/faster way.
> I attempted using vallist.ado in File A to generate a long local macro
> (say, _useme) and then doing
> 	use FileB if index(patent, "'useme'")
> I get 0 observations in this case (even though I know there are some
> matches).  From the manual, it appears that index is limited to strings
> of 80 characters, anyway.

-vallist- is Patrick Joly's program.

Quite apart from the 80-character limit, what -vallist-
does is no help with your problem.

Stripping down to a miniature analogue, suppose you have
a string variable -myvar- which takes on distinct values
"a" "b" "c".

-vallist myvar- will return that set of values as a
space-separated list, i.e.

"a b c"

If you then say

... if index(myvar,"a b c")

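then this is true for _none_ of the observations:
index(s1,s2) returns the position of s2 within s1, or 0
if s2 does not occur there, and the whole list "a b c"
is not a substring of "a", "b" or "c". Naturally, you
report the same for your dataset.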
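
The miniature analogue can be tried directly. The following is
only a sketch: the three-observation dataset is made up for
illustration, and the second -count- swaps the arguments merely
to show which way round index() works. It is not a recipe for
the real problem, where a list of some 5000 patent values would
run into the string length limit already mentioned.

```stata
* Made-up miniature dataset: one string variable with three distinct values
clear
input str1 myvar
"a"
"b"
"c"
end

* index(s1, s2) is the position of s2 within s1, or 0 if s2 is not found.
* Here s2 is the whole list, which is never contained in a single value.
count if index(myvar, "a b c")

* With the arguments swapped, each value is looked for within the list.
count if index("a b c", myvar)
```

The first -count- reports 0 and the second reports 3.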

Closer to your problem are approaches detailed in


which may well be the equivalent of what you are doing
with -mmerge-.
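
For concreteness, one way of restricting File B by a merge
might look like the sketch below. It is an assumption-laden
illustration, not necessarily what you are already doing: it
uses the -merge- syntax of Stata 11 and later, takes the
datasets to be FileA.dta and FileB.dta with the key variable
named patent, and introduces a tempfile called wanted purely
for the example.

```stata
* Sketch only: assumes Stata 11+ merge syntax and a key variable named patent

* Reduce File A to its list of patent values
use FileA, clear
keep patent
duplicates drop      // values are said to be unique; this guards the m:1 merge
tempfile wanted      // hypothetical name for the temporary lookup file
save `wanted'

* Keep only the File B observations whose patent appears in File A
use FileB, clear
merge m:1 patent using `wanted', keep(match) nogenerate
```

What is left in memory is File B restricted to the patents
found in File A, ready for whatever manipulation follows.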

