Statalist



Re: st: AW: RE: AW: Count special characters


From   Andrea Rispoli <andrea.rspl@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: AW: RE: AW: Count special characters
Date   Fri, 18 Sep 2009 00:34:00 +0200

This worked:

gen numberofocc = 0

qui forval j = 1/30 {
       replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
}

Thank you very much.

On Thu, Sep 17, 2009 at 2:30 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
>
> To appreciate the full meaning of the -noccur- code, it helps to look
> through -help whatsnew-, which shows that the -index()- function was
> renamed -strpos()- between -version- 8 and 9.
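The repeated-search idea behind a -strpos()-based count can be sketched as follows: find the first "|", add one to the count, throw away everything up to and including it, and repeat until no observation has a "|" left. This is a hypothetical illustration of the approach, not the actual code of -noccur()-; the helper variable -work- and the local -anyleft- are made-up names, and -strpos()- assumes Stata 9 or later (in Stata 8, -index()- would take its place).

```stata
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

* hypothetical helper: a working copy that is shortened as each "|" is found
gen work = mystr
gen numberofocc = 0

local anyleft 1
while `anyleft' {
    * count one more "|" wherever one remains
    qui replace numberofocc = numberofocc + (strpos(work, "|") > 0)
    * drop everything up to and including the first "|"
    qui replace work = substr(work, strpos(work, "|") + 1, .) if strpos(work, "|") > 0
    * stop once no observation contains "|" any more
    qui count if strpos(work, "|") > 0
    local anyleft = r(N) > 0
}

drop work
list, noobs
```

The loop runs as many times as the largest count in any observation, rather than once per character position.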
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Thursday, 17 September 2009 14:09
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: AW: Count special characters
>
> Martin's guess at what Andrea means resembles mine. Let's spell out the
> low-level logic that is implemented more generally by -egen, noccur()- (by
> Nick Winter) and -egen, nss()-, both within -egenmore- from SSC.
>
> By the way, I can't improve on the explanation in -egenmore-'s help:
>
> "The inclusion of noccur() and nss(), two almost identical functions,
> was an act of sheer inadvertence by the maintainer."
>
> In the case of a single desired character, the low-level logic is very
> simple:
>
> initialise: count <- 0
>
> loop from the start to the end of a string {
>        look at each character
>        if it's the desired character { count <- count + 1 }
> }
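For a single string, the pseudocode above translates almost line for line into Mata. The function name -count_char()- is hypothetical, chosen for the example, and the sketch handles one target character only.

```stata
mata:
// hypothetical helper: count occurrences of the single character c in s,
// following the pseudocode step by step
real scalar count_char(string scalar s, string scalar c)
{
    real scalar n, j

    n = 0                              // initialise: count <- 0
    for (j = 1; j <= strlen(s); j++) { // from the start to the end of s
        if (substr(s, j, 1) == c) {    // look at each character
            n++                        // desired character: count <- count + 1
        }
    }
    return(n)
}
end

mata: count_char("first st|r|", "|")
```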
>
> What is nice is that this loop can be extended automatically to all the
> observations in a variable.
>
> Thus in Martin's example, a low-level way to get the counts is
>
> gen numberofocc = 0
>
> * add 1 wherever the jth character of mystr is "|"
> qui forval j = 1/30 {
>        replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
> }
>
> Here the 30 is large enough to reach the last character of the longest
> string. This is really _much less_ code than using -egen-, once you count
> all the code that a call to -egen- sets running behind the scenes.
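A loop-free alternative, offered here as a sketch rather than as anything proposed in the thread, compares each string's length before and after deleting every "|"; the difference is the number of "|"s. It relies on -subinstr()- with a missing value (.) as the fourth argument, meaning "replace all occurrences", and the variable name -numberofocc2- is made up for the example. For a target of more than one character, the difference would have to be divided by the target's length.

```stata
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

* characters removed by deleting every "|" = number of "|"s
gen numberofocc2 = length(mystr) - length(subinstr(mystr, "|", "", .))
list, noobs
```

This avoids having to know the maximum string length in advance.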
>
> Note that
>
> local maxposs = real(substr("`: type mystr'", 4, .))
>
> is the maximum possible length of mystr, while
>
> gen nchars = length(mystr)
> su nchars, meanonly
> local maxact = r(max)
>
> returns the maximum actual length. See also the help for -extended_fcn-
> for how to get this done in a macro.
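Putting these pieces together, the character loop can stop at the longest string actually present instead of a hard-coded 30. This sketch reuses Martin's example data without -compress-, so -mystr- stays str30; the -substr()- trick on the type name assumes a fixed-width str# storage type.

```stata
clear
input str30 mystr
"first st|r|"
"se|cond str|ing"
"third string"
"fou||rth|  strin|g"
end

* maximum possible length, from the storage type (str30 -> 30)
local maxposs = real(substr("`: type mystr'", 4, .))

* maximum actual length over all observations
gen nchars = length(mystr)
su nchars, meanonly
local maxact = r(max)

* loop only as far as the longest string actually present
gen numberofocc = 0
qui forval j = 1/`maxact' {
    replace numberofocc = numberofocc + (substr(mystr, `j', 1) == "|")
}

di "maximum possible: `maxposs', maximum actual: `maxact'"
list, noobs
```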
>
> Thus
>
> local mystr "SMCL makes cooler logs"
> local mystr : subinstr local mystr "o" "o", all count(local howmanyo)
> di `howmanyo'
>
> is a way to count "o"s. (Note that there is nothing to stop you changing
> the "o"s to something else, which is the more characteristic use of this
> construct. Nor is an explicit loop out of the question.)
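The explicit loop mentioned in the note might look like this: step through the macro one character at a time and bump a counter at each "o". It is a sketch that assumes the macro contains no double quotes (which would break the -substr()- call); the counter name -howmanyo- is reused from the -subinstr- example so the two results can be compared.

```stata
local mystr "SMCL makes cooler logs"

* counter, bumped once per "o" found
local howmanyo = 0
forval j = 1/`: length local mystr' {
    if substr("`mystr'", `j', 1) == "o" local ++howmanyo
}
di `howmanyo'
```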
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> I take "cell" to denote an observation of a -string- variable. Install
> Nick's -egenmore- (-ssc inst egenmore-) and run:
>
>
> *************
> clear*
>
> input str30 mystr
> "first st|r|"
> "se|cond str|ing"
> "third string"
> "fou||rth|  strin|g"
> end
>
> compress
>
> egen numberofocc= /*
> */  noccur(mystr) , string(|)
>
> list, noobs
> *************
>
> Andrea Rispoli
>
> Is there a command that I can use to count the number of "|" in a cell?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>




© Copyright 1996–2014 StataCorp LP