
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Test position of a whole word within a macro


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: RE: Test position of a whole word within a macro
Date   Tue, 3 Dec 2013 19:57:29 -0500

I might be wrong, but I guess the original poster means:

token("var12 var1 var2 var3", "var2") ==> 3

that is, a (pseudo) function that returns the word number of "var2" in
the list given as the first argument.
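For reference, Stata already has something close to this pseudo-function for lists held in local macros: the -list posof- extended macro function (documented in [P] macro lists), which returns the position of an exact element, or 0 when the element is absent. A minimal sketch; the macro names -vars-, -pos- and -pos1- are illustrative only:

```stata
* word position of an exact element in a list stored in a local macro
local vars "var12 var1 var2 var3"
local pos : list posof "var2" in vars
* -posof- compares whole elements, so "var1" is not matched inside "var12"
local pos1 : list posof "var1" in vars
display `pos' " " `pos1'
```

Note that the second argument of -posof- is the name of the macro (vars), not its expanded contents.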

This is similar to what Jack Newsham was just asking in a neighboring
thread (too new to be quoted from the Statalist archive yet). But in his
case we knew that all US states were abbreviated to 2 letters plus a
space separator, so in the code I posted I was dividing by 3. In your
case you seem to be asking a more generic question than you need to
(you are asking about arbitrary words, but your words are variable
names, and this is important: a Stata variable name has at most 32
characters). Hence you can construct your original list in a (32+1)
format, padding each name with spaces to 32 characters and adding one
separator space:
"var12<spaces up to 32 characters> var1<spaces up to 32 characters> ..."
and then use the same approach, dividing by 33 instead of 3: a match
that strpos() finds at character p is word number (p - 1)/33 + 1.
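A minimal sketch of the (32+1) layout, assuming a recent Stata with string scalars and the string-duplication operator (" " * n); the names -padded-, -width- and -target- are illustrative. The list is kept in a string scalar rather than a local to avoid any doubt about how the trailing padding is stored, and the data in memory are cleared only so that the scalar name cannot clash with a variable name:

```stata
* every name occupies a fixed 33-character slot (32 for the name + 1 separator)
clear
local vars "var12 var1 var2 var3"
local width 33
scalar padded = ""
foreach v of local vars {
    scalar padded = padded + "`v'" + " " * (`width' - strlen("`v'"))
}
* pad the target the same way; since names have no embedded spaces,
* it can then only match at the start of a slot
local target "var2"
local p = strpos(padded, "`target'" + " " * (`width' - strlen("`target'")))
* convert the character position back to a word number (0 if not found)
local pos = cond(`p' == 0, 0, (`p' - 1) / `width' + 1)
display `pos'
```

Here "var2" starts the third slot, at character 67, and (67 - 1)/33 + 1 = 3.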

Constructing the (32+1)-formatted list is obviously easy to automate,
so you don't have to count spaces in an editor. String functions in
Stata 13 should work with really long strings, so this approach should
now work very well.

In general, the idea of this approach in both cases is to delegate the
search to the strpos() function, which saves you a loop and the explicit
comparisons. The cost is that you must be able to recover the intended
content from the position (the result of strpos()), and matches must be
unique: in Jack's case all states were unique and of the same length; in
yours you should decorate both the search list and the target with
spaces so that only whole words match.
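The space decoration can be sketched as follows. Note that this variant swaps the division for a different way of recovering the word number: it counts the words that precede the match with wordcount(). The list and target are illustrative, and the list is assumed to contain no embedded quotes:

```stata
local vars "var12 var1 var2 var3"
local target "var2"
* without decoration, a search for "var1" stops inside "var12" at character 1
display strpos("`vars'", "var1")
* with a space on each side, only whole words can match
local p = strpos(" `vars' ", " `target' ")
* the decorated list is shifted by one character, so the p - 1 characters
* before the match in the original list hold exactly the preceding words
if `p' == 0 local pos 0
else local pos = wordcount(substr("`vars'", 1, `p' - 1)) + 1
display `pos'
```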

IMHO Stata is missing the very helpful table() function present, e.g.,
in GPSS (yes, here I really mean GPSS, not SPSS), which allowed one to
build indirect references and even to interpolate (as far as I
remember). It would be trivial to implement, but it is not possible,
since Stata does not allow user-defined functions to be used in
Stata-language expressions. A valid substitute is usually a -recode-
statement or a series of -replace- statements, but that almost always
results in multiple statements where one should be enough.

Best, Sergiy Radyakin


On Tue, Dec 3, 2013 at 5:53 PM, Sarah Edgington <sedging@ucla.edu> wrote:
> Brent,
> Forgive me if I am misunderstanding what you are trying to do, but from
> your initial example it looks like you are trying to count the number of
> words in a string.
> If that is in fact what you're trying to do, you might try the
> wordcount() function. So you'd have something like:
> local test=wordcount("var12 var3 var1")
>
> -Sarah
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Brent McSharry
> (ADHB)
> Sent: Tuesday, December 03, 2013 2:36 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Test position of a whole word within a macro
>
> Dear statalisters
>
> I need to determine the position (in words, not characters) of a word
> within a macro. For example:
>
> local test = wordposition("var12 var3 var1")
>
> local test should have a value of 3.
>
> It will be used within a loop that will be called hundreds of times, so it
> must be performant, and for this reason I will only use a foreach loop if
> that is the only way.
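For comparison, the plain -foreach- approach that Brent would rather avoid might look like this sketch (the macro names are illustrative). It stops at the first exact match, so "var1" is not confused with "var12":

```stata
local vars "var12 var3 var1"
local target "var1"
local pos 0
local i 0
foreach w of local vars {
    local ++i
    * exact string comparison, so only whole words match
    if "`w'" == "`target'" {
        local pos `i'
        continue, break
    }
}
display `pos'
```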
>
> The programming (i.e., non-Stata) solution would be to create a
> 'dictionary'/hash table of the string values.
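Stata does offer such a dictionary, although only in Mata: the associative arrays created by asarray_create(). A minimal sketch, assuming the name-to-position table is built once and then queried many times; the Mata variable names -names- and -pos- are illustrative:

```stata
mata:
// split the list into words and map each word to its position
names = tokens("var12 var3 var1")
pos = asarray_create()
for (i = 1; i <= cols(names); i++) {
    asarray(pos, names[i], i)
}
// look up a position by key, and test whether a key is present
asarray(pos, "var1")
asarray_contains(pos, "var2")
end
```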
>
> The pseudo-code for what I am trying to do is:
>
> . matrix `outmat' = J(`indepcount', 1, 0)
> . matrix rownames `outmat' = `indepvars'
> . forvalues i=1(1)`iterations' {
> .       // reassign division of development and testing data sets
> .       // build model & exclude unwanted variables
> .       local included : colfullnames e(b)
> .       // remove _cons from `included'
> .       foreach v in `included' {
> .               // if vectors could be referred to by string, one would use
> .               // matrix `outmat'[`v', 1] = `outmat'[`v', 1] + 1
> .               // I don't think the above is possible in Stata, so instead:
> .               local j = // which word
> .               matrix `outmat'[`j', 1] = `outmat'[`j', 1] + 1
> .       }
> . }
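A runnable version of this loop might look like the following sketch. It replaces the word search with rownumb(), which returns the row of a matrix that carries a given row name, so the matrix itself acts as the lookup table. The auto data, the four regressors, the seed and the random 80% development sample are hypothetical stand-ins for Brent's data split, and the variable-selection step is left out, so every regressor is counted in every iteration:

```stata
sysuse auto, clear
set seed 12345
local indepvars "mpg weight length turn"
local indepcount : word count `indepvars'
local iterations 5
tempname outmat
matrix `outmat' = J(`indepcount', 1, 0)
matrix rownames `outmat' = `indepvars'
forvalues it = 1/`iterations' {
    * stand-in for re-splitting the data and refitting the model
    quietly regress price `indepvars' if runiform() < 0.8
    local included : colnames e(b)
    * drop _cons from the list of included terms
    local cons _cons
    local included : list included - cons
    foreach v of local included {
        * look the row up by its name instead of searching for the word
        local j = rownumb(`outmat', "`v'")
        matrix `outmat'[`j', 1] = el(`outmat', `j', 1) + 1
    }
}
matrix list `outmat'
```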
>
> Does anyone have any ideas on a performant way to test for an exact word
> match? Thank you
>
> Brent McSharry
> Intensivist
> Starship Hospital Auckland
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/