Statalist



Re: st: Seeking help in stata


From   "Gabi Huiber" <ghuiber@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Seeking help in stata
Date   Fri, 14 Dec 2007 11:26:58 -0500

The solution below is not general: it assumes that every codeks
value that does not contain A or X is a numeric string. If that is
the case, Badi could simply do this:

gen x=real(codeks)

and x will show a missing value everywhere codeks shows A, X, or AX.

Then it won't be too hard to say

gen codeks_ax_dummy=(x==.)
drop x
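
If that assumption is doubtful, testing for the characters themselves avoids it. What follows is only a sketch, assuming codeks is a string variable; the toy data reuse the values spelled out in the original question, and the names found and found_re are made up for illustration.

```stata
* Sketch only: flag observations whose codeks contains A or X,
* without assuming the remaining values are numeric strings.
clear
input str5 codeks
"101"
"102"
"01A"
"01X"
"0AX"
"103"
"11111"
end

* strpos() returns the position of the substring, or 0 if it is absent
gen byte found = (strpos(codeks, "A") > 0) | (strpos(codeks, "X") > 0)

* the same test written as a regular-expression character class
gen byte found_re = regexm(codeks, "[AX]")

* both indicators should agree in every observation
assert found == found_re
list codeks found found_re
```

Either indicator is evaluated observation by observation, so it marks which rows to drop rather than counting them.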

Does this help?

Gabi

On Dec 14, 2007 11:04 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> But that is open to the same comment. Your result from -egen-
> counts how many observations in total satisfy the stated criteria.
> In data cleaning knowing which they are is the key issue,
> at least in my experience.
>
> Anders Alexandersson wrote:
>
> Ah, thanks Nick. I forgot in the haste how to create regular sums. I
> meant
>
> egen found = total( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1
>
> Having now read the FAQ for regular expressions at
> http://www.stata.com/support/faqs/data/regex.html
> it seems that regexm() uses the pipe character for logical or, so I
> also suggest this solution:
> gen found = regexm(codeks, "A|X")
>
>
> Nick Cox <n.j.cox@durham.ac.uk> wrote:
> > Anders' solution makes use of -sum()-. That would cumulate
> > from observation to observation. It sounds to me as if
> > Thaddee wants to look at each observation separately.
> >
> > See also my solution suggested earlier.
> >
> > (Stata had an -index()- function, but from Stata 10 it is available
> > only under version control. -strpos()- is now the equivalent.)
> >
>
> > Thaddee Badibanga <tbadibanga@yahoo.fr> wrote:
> >
> > > I'd like to create an index from a
> > > variable that is pseudo-numeric, that is, a string (numeric
> > > as well as character). This index will allow me to
> > > eliminate some observations in the dataset. To give
> > > you an idea, the variable I termed codeks is as
> > > follows:
> > > codeks:101 102 01A 01X 0AX ...103 ... 111 112 ...11111
> > >
> > > I'd like to create an index that assigns 1 if codeks
> > > includes A or X or AX and 0 otherwise. I have done
> > > this in other programs. In one program for instance,
> > > this can be done as:
> > > found=indexc(codeks,"A","X")
> > >
> > > I will really appreciate your help. I have spent more
> > > than 3 hours without success.
> >
> > I am not aware of a similar function in Stata. But the regexm() string
> > function combined with a Boolean expression should work. This FAQ
> > explains Boolean expressions in Stata:
> > http://www.stata.com/support/faqs/data/trueorfalse.html
> > For example, regexm(codeks, "A") would evaluate to 1 if codeks has the
> > string A, and to 0 otherwise.
> >
> > I have not tried the following, but I think it will work as you
> > intended:
> > gen found = sum( regexm(codeks, "A") + regexm(codeks, "X") ) >= 1
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


