Statalist



st: AW: RE: AW: Count special characters


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: AW: RE: AW: Count special characters
Date   Thu, 17 Sep 2009 14:30:46 +0200



To follow the -noccur- code, it helps to look through -help whatsnew-,
which shows that the -index()- function was renamed -strpos()- between
-version- 8 and 9.


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, 17 September 2009 14:09
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: AW: Count special characters

Martin's guess at what Andrea means resembles mine. Let's spell out the
low-level logic implemented more generally in -egen, noccur()- (by Nick
Winter) and -egen, nss()- within -egenmore- from SSC. 

By the way, I can't improve on the explanation in -egenmore-'s help:

"The inclusion of noccur() and nss(), two almost identical functions,
was an act of sheer inadvertence by the maintainer."

In the case of a single desired character, the low-level logic is very
simple: 

initialise: count <- 0 

loop from the start to the end of a string { 
	look at each character 
	if it's the desired character { count <- count + 1 } 
}

What is nice is that this loop can be extended automatically to all the
observations in a variable. 

Thus in Martin's example, a low-level way to get the counts is 

gen numberofocc = 0 

qui forval j = 1/30 { 
	replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|") 
}

Here the 30 is large enough to get at the last character in the longest
string. Stata really executes _much less_ code this way than with -egen-,
because a call to -egen- runs a good deal of code behind the scenes.
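
A loop-free alternative, offered only as a sketch and not as what -egen,
noccur()- does internally: remove every "|" with -subinstr()- and compare
the lengths before and after. This works as a count because the search
string is a single character. The variable name numberofocc2 is an
assumption made for illustration; the data are those of Martin's example.

```stata
* Martin's example data
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

* Each "|" removed shortens the string by exactly one character,
* so the difference in lengths is the number of "|".
* numberofocc2 is a hypothetical name chosen for this illustration.
gen numberofocc2 = length(mystr) - length(subinstr(mystr, "|", "", .))

list, noobs
```

For a search string longer than one character, the difference in lengths
would need dividing by the length of the search string.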

Note that 

local maxposs = real(substr("`: type mystr'", 4, .))

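is the maximum possible length of mystr (the storage type reads str30,
say, so everything from the fourth character on is the declared length),
while 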

gen nchars = length(mystr)
su nchars, meanonly 
local maxact = r(max) 
	
returns the maximum actual length. See also the help for -extended_fcn-
for how to get this done in a macro. 

Thus 

local mystr "SMCL makes cooler logs" 
local mystr : subinstr local mystr "o" "o", all count(local howmanyo) 
di `howmanyo'

is a way to count "o"s. (Note that there is nothing to stop you changing
the "o"s to something else, which is the more characteristic use of this
construct. Nor is an explicit loop out of the question either.) 
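
As a minimal sketch of the explicit-loop route just mentioned, the same
count can be had by stepping through the character positions of the macro.
The macro names s and howmany are assumptions made for illustration.

```stata
* Count "o" in a local macro with an explicit loop over positions.
* s and howmany are hypothetical names chosen for this illustration.
local s "SMCL makes cooler logs"
local howmany = 0

* `= length("`s'")' is evaluated on the fly to give the upper limit
forval j = 1/`= length("`s'")' {
	if substr("`s'", `j', 1) == "o" {
		local ++howmany
	}
}

di `howmany'
```

This should display 3, the same answer as the -subinstr- construct gives.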

Nick 
n.j.cox@durham.ac.uk 

Martin Weiss

I take "cell" to denote an observation of a -string- variable... Install
Nick's -egenmore- (-ssc inst egenmore-) and:


*************
clear*

input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

compress

egen numberofocc = noccur(mystr), string(|)

list, noobs
*************

Andrea Rispoli

Is there a command that I can use to count the number of "|" in a cell?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

