To appreciate the full meaning of the -noccur- code, it makes a lot of sense to rummage through -help whatsnew-, which shows that the -index()- function was renamed to -strpos()- from -version- 8 to 9...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, 17 September 2009 14:09
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: AW: Count special characters

Martin's guess at what Andrea means resembles mine.

Let's spell out the low-level logic implemented more generally in -egen, noccur()- (by Nick Winter) and -egen, nss()- within -egenmore- from SSC. By the way, I can't improve on the explanation in -egenmore-'s help: "The inclusion of noccur() and nss(), two almost identical functions, was an act of sheer inadvertence by the maintainer."

In the case of a single desired character, the low-level logic is very simple:

    initialise: count <- 0
    loop from the start to the end of a string {
        look at each character
        if it's the desired character {
            count <- count + 1
        }
    }

What is nice is that this loop can be extended automatically to all the observations in a variable. Thus in Martin's example, a low-level way to get the counts is

    gen numberofocc = 0
    qui forval j = 1/30 {
        replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
    }

Here the 30 is large enough to get at the last character in the longest string. This is really _much less_ code than using -egen- because of the code that the call to -egen- implies.

Note that

    local maxposs = real(substr("`: type mystr'", 4, .))

is the maximum possible length of mystr, while

    gen nchars = length(mystr)
    su nchars, meanonly
    local maxact = r(max)

returns the maximum actual length.

See also the help for -extended_fcn- for how to get this done in a macro.
Thus

    local mystr "SMCL makes cooler logs"
    local mystr : subinstr local mystr "o" "o", all count(local howmanyo)
    di `howmanyo'

is a way to count "o"s. (Note that there is nothing to stop you changing the "o"s to something else, which is the more characteristic use of this construct. Nor is an explicit loop out of the question either.)

Nick
n.j.cox@durham.ac.uk

Martin Weiss

I take "cell" to denote an observation of a -string- variable... Install Nick's -egenmore- with -ssc inst egenmore- and:

    *************
    clear*
    input str30 mystr
    "first st|r|"
    "se|cond str|ing"
    "third string"
    "fou||rth| strin|g"
    end
    compress
    egen numberofocc= /*
    */ noccur(mystr) , string(|)
    list, noobs
    *************

Andrea Rispoli

Is there a command that I can use to count the number of "|" in a cell?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
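
Editorial sketch, not from the thread: the character-by-character loop Nick describes can be cross-checked against a loop-free variant that compares the string's length before and after deleting every "|". The data are Martin's example. The variable name `check` is introduced here only for illustration, and the loop bound uses the maximum actual length (`maxact`) rather than the hard-coded 30. This assumes a single-character, plain ASCII target such as "|".

```stata
* Martin's example data
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth| strin|g"
end

* Deleting every "|" shortens the string by one character per occurrence,
* so the difference in lengths is the number of occurrences.
gen numberofocc = length(mystr) - length(subinstr(mystr, "|", "", .))

* Cross-check with the explicit loop, bounded by the longest actual string
gen nchars = length(mystr)
su nchars, meanonly
local maxact = r(max)
gen check = 0
qui forval j = 1/`maxact' {
    replace check = check + (substr(mystr, `j', 1) == "|")
}

* Both approaches should agree in every observation
assert numberofocc == check
list mystr numberofocc, noobs
```

For these four observations both variables should hold 2, 2, 0 and 4. For a target longer than one character, the length difference would need to be divided by the length of the target.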

