# st: RE: AW: Count special characters

From: "Nick Cox" | Subject: st: RE: AW: Count special characters | Date: Thu, 17 Sep 2009 13:09:00 +0100

```Martin's guess at what Andrea means resembles mine. Let's spell out the
low-level logic implemented more generally in -egen, noccur()- (by Nick
Winter) and -egen, nss()- within -egenmore- from SSC.

By the way, I can't improve on the explanation in -egenmore-'s help:

"The inclusion of noccur() and nss(), two almost identical functions,
was an act of sheer inadvertence by the maintainer."

In the case of a single desired character, the low-level logic is very
simple:

initialise: count <- 0

loop from the start to the end of a string {
    look at each character
    if it's the desired character { count <- count + 1 }
}

What is nice is that this loop can be extended automatically to all the
observations in a variable.

Thus in Martin's example, a low-level way to get the counts is

gen numberofocc = 0

qui forval j = 1/30 {
    replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
}

Here the 30 is large enough to reach the last character in the longest
string. This is really _much less_ code than using -egen-, once you
count all the code that a call to -egen- runs behind the scenes.

Note that

local maxposs = real(substr("`: type mystr'", 4, .))

is the maximum possible length of mystr, while

gen nchars = length(mystr)
su nchars, meanonly
local maxact = r(max)

gives the maximum actual length. See also the help for -extended_fcn-
for how to do this kind of counting when the string is held in a macro.

Thus

local mystr "SMCL makes cooler logs"
local mystr : subinstr local mystr "o" "o", all count(local howmanyo)
di `howmanyo'

is a way to count "o"s. (Note that there is nothing to stop you changing
the "o"s to something else, which is the more characteristic use of this
construct. Nor is an explicit loop out of the question either.)

Nick
n.j.cox@durham.ac.uk

Martin Weiss

I take "cell" to denote an observation of a -string- variable... Install
Nick's -ssc inst egenmore- and:

*************
clear*

input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

compress

egen numberofocc= /*
*/  noccur(mystr) , string(|)

list, noobs
*************

Andrea Rispoli

is there a command that I can use to count the number of "|" in a cell?

```
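
The pieces of the reply above can be combined into one self-contained sketch. It rebuilds Martin's example data, takes the loop bound from the longest string actually present rather than a hard-coded 30, and then runs the character-by-character loop. The `check` variable and the final `assert` are additions for illustration and are not part of the original post: they rely on the fact that removing every "|" shortens a string by exactly the number of "|" it contained.

```stata
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

* longest string actually present in the data, used as the loop bound
gen nchars = length(mystr)
su nchars, meanonly
local maxact = r(max)

* character-by-character count, as in the loop from the reply
gen numberofocc = 0
quietly forval j = 1/`maxact' {
    replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
}

* cross-check (an assumption added here, not from the post):
* length lost when every "|" is removed equals the number of "|"
gen byte check = length(mystr) - length(subinstr(mystr, "|", "", .))
assert numberofocc == check

list mystr numberofocc, noobs
```

With these four observations the counts should come out as 2, 2, 0 and 4. The loop keeps the logic visible and adapts easily to other per-character tests, while the `subinstr()` difference is a compact alternative when only a single fixed character is being counted.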
