
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Selecting part of a LARGE file
Date: Sat, 7 Jun 2003 17:37:29 +0100

Glenn Hoetker wrote

>> I have two files. File A has about 5000 unique values of the variable
>> PATENT, which is 7 characters long. File B has 16 million observations
>> and several million unique values for PATENT. I want to do some
>> manipulation involving File B, but only for the observations that
>> correspond to the patent values found in File A. I am currently using
>> merge on the two files to do this (actually mmerge as a wrapper for
>> ease), but wonder if there is an easier/faster way.
>>
>> I attempted using vallist.ado in File A to generate a long local macro
>> (say, _useme) and then doing
>>
>> use FileB if index(patent, "`useme'")
>>
>> I get 0 observations in this case (even though I know there are some
>> matches). From the manual, it appears that index is limited to strings
>> of 80 characters, anyway.

and I replied

>-vallist- is Patrick Joly's program.
>
>Quite apart from the 80 characters limit, what it does
>does nothing to help with your problem.
>
>Stripping down to a miniature analogue, suppose you have
>a string variable -myvar- which takes on distinct values
>"a" "b" "c".
>
>-vallist myvar- will return that set of values as a
>space-separated list, i.e.
>
>"a b c"
>
>If you then say
>
>... if index(myvar,"a b c")
>
>then this is true for _none_ of the observations;
>naturally, you report the same for your dataset.

< snip>

David Kantor commented

> Putting that aside, and putting aside the 80-character limitation, the
> reason that
> use FileB if index(patent, "`useme'")
> gets no matches at all (when you do expect some) is that it should be...
> use FileB if index("`useme'", patent)
>
> -- the arguments are reversed.
>
> Nick Cox replied that you should expect no matches; he didn't say why.

David adds an important detail to explaining what went wrong. Let me fill in the gap I apparently left.
Let's recap on what -index()- does, for string expressions s1 and s2:

-index(s1,s2)- returns the position in s1 at which s2 is first found,
or 0 if s1 does not contain s2.

David's example helps underline that this is a function in which the order of arguments does, typically, matter.

In my toy example with values "a" "b" "c", -vallist- would give the composite list "a b c", and

index("a b c", "a")
index("a b c", "b")
index("a b c", "c")

as particular instances of -index("a b c", myvar)- are all non-zero (or treated as true).

Conversely,

index("a", "a b c")
index("b", "a b c")
index("c", "a b c")

as particular instances of -index(myvar, "a b c")- are all zero (or treated as false), as in no case is the composite string ever contained in any of the elements.

As Glenn noted, the 80-character limit stops -index("<value list>", varname)- being a practical method for all but restricted problems.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
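
The toy example above can be tried directly. What follows is only a minimal sketch, assuming a fresh Stata session: the local macro `useme' is typed in by hand to stand in for the list that -vallist- is described as returning, and the variable names found and reversed are made up for illustration.

```stata
* Toy data from the post: a string variable -myvar- with values "a", "b", "c"
clear
input str1 myvar
"a"
"b"
"c"
end

* Stand-in for the space-separated list described for -vallist myvar-
* (assumption: typed by hand here, not produced by -vallist-)
local useme "a b c"

* List first, variable second: the positions are 1, 3 and 5,
* so the comparison is true in every observation
generate byte found = index("`useme'", myvar) > 0

* Variable first, list second: "a b c" is never contained in
* "a", "b" or "c", so the comparison is false in every observation
generate byte reversed = index(myvar, "`useme'") > 0

list myvar found reversed
```

In this sketch found is 1 and reversed is 0 in all three observations, which is the asymmetry described above. In more recent versions of Stata the same function is documented under the name -strpos()-.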

References: Re: st: Selecting part of a LARGE file (From: David Kantor <dkantor@jhu.edu>)
