Thanks. But I noted in my original post that I saw a solution using -split-, followed by running through the generated variables, more or less along these lines. I was looking for something more elegant. -regexm()- eventually yielded a simple solution.

cheers,
Jeph

Martin Weiss wrote:

I would have recommended http://www.stata-journal.com/article.html?article=dm0039, until I noticed that you are one of the authors...

*************
clear*
input str20 stringanswer
"1:2:3:5:6:7:8:9"
"1:2:3:6"
"1:2:3:4:5:7:8:9"
"1:2:3:5:7:9"
"1:2:3:5:7:8:9"
"2:3:4:6:9"
"1:2:3:5:6:7:8:9"
"1:2:7:8:9"
"7:9"
"1:11:12"
end

split stringanswer, generate(comp) parse(:)
destring, replace
egen rowmaxim=rowmax(comp*)
su rowmaxim, mean
forv i=1/`r(max)'{
    egen byte my`i' = anymatch(comp*), values(`i')
}
drop comp* rowmaxim
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Jeph Herrin
Sent: Wednesday, 23 September 2009 17:11
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: -word()- with non space separator

Thanks. As I note in the paragraph after my data snippet, -strpos()- works as long as there are <=9 values, but doesn't work when I get to multiple digits - strpos("11:12","1") = 1, even though "1" is not really in the list.

cheers,
J

Eric A. Booth wrote:

I would use -strpos()-.

******
clear
input str20 var1
"1:2:3:5:6:7:8:9"
"1:2:3:6"
"1:2:3:4:5:7:8:9"
"1:2:3:5:7:9"
"1:2:3:5:7:8:9"
"2:3:4:6:9"
"1:2:3:5:6:7:8:9"
"1:2:7:8:9"
"7:9"
end

forval n = 1/9 {
    gen myvar_`n'=.
    gen ind`n' = strpos(var1, "`n'")
    replace myvar_`n'=1 if ind`n'>0
    drop ind`n'
}
li var1 myvar_*
******

Best,
Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754

On Sep 23, 2009, at 9:29 AM, Jeph Herrin wrote:

I have a dataset in which many variables are in the most useless format imaginable. If a question has multiple checkboxes as possible answers, the response is stored as a string, with a number indicating each box checked and these numbers separated by colons.
Thus:

myvar
1:2:3:5:6:7:8:9
1:2:3:6
1:2:3:4:5:7:8:9
1:2:3:5:7:9
1:2:3:5:7:8:9
2:3:4:6:9
1:2:3:5:6:7:8:9
1:2:7:8:9
7:9

This variable takes 9 values, so I want to split it into 9 different indicator variables, myvar_1-myvar_9, each indicating whether that number was selected. -split- does not work, because of the differing number of values per string. That is, it produces myvar_1 which equals "7" for the last obs. So I am looking for a way to check whether a given string contains a given integer, which would allow me to

forv i=1/9 {
    gen byte myvar_`i'= [`i' is in myvar list]
}

As long as there are just 9 values, I can use -strpos()- to check for the presence of the digit, but some of my variables run into the tens and twenties, in which case, e.g., searching for "1" returns true even if there is only "11".

The only solutions I see are to first -split- and then check all the new indicators, or to run through a series of checks such as (matches "1:" but not ":1"). I don't like either: is there a direct way to check whether a given integer is in the list? I think there may be a regex solution, but my Perl programming days are so far behind me that I've not been able to come up with one.

thanks,
Jeph
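
Editorial note: the -regexm()- solution mentioned at the top of the thread was not posted, so what follows is only a minimal sketch of one way such a check could look, not necessarily what Jeph used. It assumes the string variable is named myvar and that the largest possible value is 12 (both chosen here purely for illustration). The idea is to pad the string with the separator on both sides, so that every value is bracketed by colons and ":1:" can no longer match inside ":11:".

```stata
* Sketch only; the variable name myvar and the upper bound 12 are assumptions.
forvalues i = 1/12 {
    * ":" + myvar + ":" turns "1:11:12" into ":1:11:12:",
    * so the pattern ":`i':" only matches a complete list element.
    gen byte myvar_`i' = regexm(":" + myvar + ":", ":`i':")
}
```

Because the pattern here is a fixed string, the same padding trick should also work with -strpos()-, e.g. strpos(":" + myvar + ":", ":`i':") > 0.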

