Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: Seeking help in stata


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Seeking help in stata
Date   Fri, 14 Dec 2007 16:04:42 -0000

But that is open to the same comment. Your result from -egen- 
counts how many observations in total satisfy the stated criteria. 
In data cleaning knowing which they are is the key issue, 
at least in my experience. 

Anders Alexandersson

Ah, thanks Nick. I forgot in the haste how to create regular sums. I
meant

egen found = total( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1

Having now read the FAQ for regular expressions at
http://www.stata.com/support/faqs/data/regex.html
it seems that regexm() uses the pipe character for logical or, so I
also suggest this solution:
gen found = regexm(codeks, ["A" | "X"])

Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Anders' solution makes use of -sum()-. That would cumulate
> from observation to observation. It sounds to me as if
> Thaddee wants to look at each observation separately.
>
> See also my solution suggested earlier.
>
> (Stata had an -index()- function, but from Stata 10 it is available
> only under version control. -strpos()- is now the equivalent.)
>

> Thaddee Badibanga <tbadibanga@yahoo.fr> wrote:
>
> > I'd like to create an index from a
> > variable which is a pseudo numeric or a string(numeric
> > as well character). This index will allow me to
> > eliminate some observations in the dataset. To give
> > you an idea, the variable I termed codeks is as
> > follows:
> > codeks:101 102 01A 01X 0AX ...103 ... 111 112 ...11111
> >
> > I'd like to create an index that assigns 1 if codeks
> > includes A or X or AX and 0 otherwise. I have done
> > this in other programs. In one program for instance,
> > this can be done as:
> > found=indexc(codeks,"A","X")
> >
> > I will really appreciate your help. I have spent more
> > than 3 hours without success.
>
> I am not aware of a similar function in Stata. But the regexm() string
> function combined with a Boolean expression should work. This FAQ
> explains Boolean expressions in Stata:
> http://www.stata.com/support/faqs/data/trueorfalse.html
> For example, regexm(codeks, "A") would evaluate to 1 if codeks has the
> string A, and to 0 otherwise.
>
> I have not tried the following, but I think it will work as you
> intended:
> gen found = sum( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2021 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index