



Re: st: Extracting substrings from variable and combining variables.


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Extracting substrings from variable and combining variables.
Date   Mon, 4 Jun 2012 10:20:38 +0100

Previously I wrote:

"I don't know exactly what you want, so that rules out further
suggestions from me for the time being. You would get better help by
giving examples of what the variables you want would look like."

You've not done this. All that I can pick up here is that you want to
combine variables. I don't know what that "combining" means. So, this
is another (but final) attempt from me to help.

Note that -regexm()- and -regexs()- are functions, not commands. This
is not just a piece of pedantry: (1) referring to functions as
commands may confuse at least some readers and clarifies nothing; (2)
thinking of these, always, as functions helps remind everyone that
they are defined and documented distinctly.

It seems that you have variables -mdiag1-mdiag8- and wish to extract
diagnoses "O1", "637", "642". You expect those diagnoses to be leading
substrings.  You can create a new composite variable this way.

gen anydiag = ""

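foreach diag in O1 637 642 {
        * length of the code being searched for (2 for O1, 3 for 637 and 642)
        local len = length("`diag'")
        forval j = 1/8 {
                * append the code whenever mdiag`j' starts with it
                replace anydiag = anydiag + "`diag'" if substr(mdiag`j', 1, `len') == "`diag'"
        }
}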
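
To make concrete what that composite variable looks like, here is a minimal sketch on invented data. The codes below are made up for illustration, and only two diagnosis variables are used instead of eight, so the inner loop runs over 1/2.

```stata
* Toy data: hypothetical codes, two diagnosis variables instead of eight
clear
input str5 mdiag1 str5 mdiag2
"O149" "6420"
"6370" ""
"Z370" "O100"
"O140" "O141"
end

gen anydiag = ""

foreach diag in O1 637 642 {
        local len = length("`diag'")
        forval j = 1/2 {
                replace anydiag = anydiag + "`diag'" if substr(mdiag`j', 1, `len') == "`diag'"
        }
}

list
```

On these four observations -anydiag- comes out as "O1642", "637", "O1" and "O1O1". Codes are appended in the order of the outer loop (O1, 637, 642), not in the order of the mdiag variables, and a code that leads more than one variable is appended once for each occurrence.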

But we've already gone over similar ideas in this thread. I don't
think you ever said why you can't work from that resulting composite
variable.

You can create new indicator variables this way

gen hasO1 = 0
gen has637 = 0
gen has642 = 0

forval j = 1/8 {
         replace hasO1 = 1 if hasO1 == 0 & substr(mdiag`j', 1, 2) == "O1"
         replace has637 = 1 if has637 == 0 & substr(mdiag`j', 1, 3) == "637"
         replace has642 = 1 if has642 == 0 & substr(mdiag`j', 1, 3) == "642"
}
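
The same indicators could also be produced by looping over the codes, which saves typing if the list of codes grows. This is a sketch only: it assumes string variables -mdiag1-mdiag8- exist as in the thread, and the names -flagO1-, -flag637-, -flag642- and -flagany- are invented here so that they do not clash with the variables created above.

```stata
* Sketch: assumes string variables mdiag1-mdiag8 exist
foreach diag in O1 637 642 {
        * one 0/1 indicator per code (hypothetical names flagO1, flag637, flag642)
        gen byte flag`diag' = 0
        local len = length("`diag'")
        forval j = 1/8 {
                replace flag`diag' = 1 if substr(mdiag`j', 1, `len') == "`diag'"
        }
}

* 1 if any of the three codes leads any of the eight variables
gen byte flagany = flagO1 | flag637 | flag642
```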

This can be done with regex machinery too as a matter of taste.
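
A minimal sketch of the regex route, assuming string variables -mdiag1-mdiag8- as above. The names -firstcode- and -anymatch- are invented for illustration. It reuses the pattern from the original question and loops over the variables rather than trying to name several variables in one function call.

```stata
* Sketch: assumes string variables mdiag1-mdiag8 exist
gen firstcode = ""

forval j = 1/8 {
        * keep the first matching leading code found across mdiag1-mdiag8
        replace firstcode = regexs(0) if regexm(mdiag`j', "^(637|642|O1)") & firstcode == ""
}

* 1 if any variable starts with one of the codes, 0 otherwise
gen byte anymatch = firstcode != ""
```

-regexm()- takes one string at a time, so covering several variables means looping over them. Here -firstcode- records only the first match found, going from -mdiag1- to -mdiag8-.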

Nick

On Mon, Jun 4, 2012 at 9:42 AM, Amal Khanolkar <Amal.Khanolkar@ki.se> wrote:

> Originally, I started using the 'regex' command to extract ICD codes from my variables of interest shown below (mdiag1, mdiag2, mdiag3, mdiag4 etc....). I'm extracting the same ICD codes from all the mdiag variables starting with the numbers/letters: 637, 642 and O1. Initially I extracted the ICD codes from each mdiag variable separately with the idea of combining them at the end. But that seems a bit more complicated now. Maybe, one solution could be to extract all ICD codes from all mdiag variables at the same time. There are 12 such mdiag variables.
>
> gen preght1 = regexs(0) if regexm(mdiag1, "^(637|642|O1)")
> tab preght1
>
> gen preght2 = regexs(0) if regexm(mdiag2, "^(637|642|O1)")
> tab preght2
>
> gen preght3 = regexs(0) if regexm(mdiag3, "^(637|642|O1)")
> tab preght3
>
> gen preght4 = regexs(0) if regexm(mdiag4, "^(637|642|O1)")
> tab preght4
>
> gen preght5 = regexs(0) if regexm(mdiag5, "^(637|642|O1)")
> tab preght5
>
> gen preght6 = regexs(0) if regexm(mdiag6, "^(637|642|O1)")
> tab preght6
>
> gen preght7 = regexs(0) if regexm(mdiag7, "^(637|642|O1)")
> tab preght7
>
> gen preght8 = regexs(0) if regexm(mdiag8, "^(637|642|O1)")
> tab preght8
>
> The above generates 8 preght variables and works great.
>
> Initially I tried to combine the (mdiagX, "^(637|642|O1) for each mdiag variable by enclosing them in separate brackets one after another. But it doesn't work. How do I modify the regexs/regexm commands to be able to tell Stata to pluck out the ICD codes for several variables in the same command line?


