
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Extracting substrings from variable and combining variables.


From   Amal Khanolkar <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Extracting substrings from variable and combining variables.
Date   Fri, 1 Jun 2012 08:11:44 +0000

Hi again,

I tried to combine 12 such variables (examples of three below) to form one variable with the same 3 categories.


. tab preght1

    preght1 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |      8,314       20.76       20.76
        642 |     21,268       53.11       73.88
         O1 |     10,461       26.12      100.00
------------+-----------------------------------
      Total |     40,043      100.00


. tab preght2

    preght2 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |     11,202       33.51       33.51
        642 |     15,191       45.44       78.95
         O1 |      7,036       21.05      100.00
------------+-----------------------------------
      Total |     33,429      100.00



. tab preght4

    preght4 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |        797       18.02       18.02
        642 |      1,747       39.51       57.53
         O1 |      1,878       42.47      100.00
------------+-----------------------------------
      Total |      4,422      100.00

When I add up the totals of the 12 preght variables, I get 90,930 observations that should have my diagnosis of interest. However, when using egen as below I get only 88,228!

This is what I get when I run -egen- with the concat() function:

egen preght=concat(preght1 preght2 preght3 preght4 preght5 preght6 preght7 preght8 preght9 preght10 preght11 preght12)
(2903228 missing values generated)


      preght |      Freq.     Percent        Cum.
-------------+-----------------------------------
         637 |     20,922       23.71       23.71
      637637 |        960        1.09       24.80
   637637637 |        104        0.12       24.92
637637637637 |          3        0.00       24.92
      637642 |          2        0.00       24.93
         642 |     42,108       47.73       72.65
      642637 |          1        0.00       72.65
      642642 |        748        0.85       73.50
   642642642 |          7        0.01       73.51
          O1 |     22,634       25.65       99.16
        O1O1 |        720        0.82       99.98
      O1O1O1 |         17        0.02      100.00
    O1O1O1O1 |          2        0.00      100.00
-------------+-----------------------------------
       Total |     88,228      100.00
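
A note on the two totals, using only the table above: the gap seems to come from patients who carry a code in more than one preght variable. Adding up the 12 tabulations counts every occurrence, while the concatenated variable has one row per patient. The table shows 2,431 rows with two codes (960 + 2 + 1 + 748 + 720), 128 rows with three (104 + 7 + 17) and 5 rows with four (3 + 2), so 88,228 + 2,431 + 2*128 + 3*5 = 90,930. What follows is a minimal sketch, not a definitive solution, of one way to get a single three-category variable. It assumes all twelve variables are strings (only preght1 is shown to be str3) and that keeping the first recorded code is acceptable; the names preght_first and ncodes are made up for illustration.

```stata
* Sketch only: keep the first non-empty code across preght1-preght12.
* Assumption: all twelve variables are strings, as preght1 (str3) is.
gen str3 preght_first = ""
forvalues i = 1/12 {
    replace preght_first = preght`i' if preght_first == "" & preght`i' != ""
}

* Count how many of the twelve variables hold a code for each patient.
* Values above 1 are the rows that concat() shows as 637637, O1O1, etc.
gen byte ncodes = 0
forvalues i = 1/12 {
    replace ncodes = ncodes + (preght`i' != "")
}

tab preght_first
tab ncodes
```

Rows such as 637642 and 642637 hold two different codes, so "first recorded code" is a choice that would need checking against the analysis plan.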


Thanks!

Best regards,

Amal Khanolkar, PhD candidate,
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: 31 May 2012 17:34
To: '[email protected]'
Subject: RE: st: Extracting substrings from variable and combining variables.

-egen, concat()- "didn't work": this cannot be discussed without reference to exactly (a) what you want to do, (b) what you tried and (c) what happened.

Nick
[email protected]

Amal Khanolkar

Hi Nick & Brendan,

Thanks so much for your help with the 'regex' commands in retrieving subjects with a common diagnosis from my dataset.

I now have 12 such 'diagnostic' variables (preght1-preght12), all for, say, hypertension (12 because a patient might have received this diagnosis as the 1st, 7th or 12th diagnosis when admitted to hospital).

I need to combine these 12 variables into one. I tried doing this using the 'egen' command with the concat function but it didn't work. Any tips on other commands I could try?

The variables look like this and most of the 12 variables have the same 3 categories, but some have just 2 or 1:

. tab preght1

    preght1 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |      8,314       20.76       20.76
        642 |     21,268       53.11       73.88
         O1 |     10,461       26.12      100.00
------------+-----------------------------------
      Total |     40,043      100.00


. tab preght2

    preght2 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |     11,202       33.51       33.51
        642 |     15,191       45.44       78.95
         O1 |      7,036       21.05      100.00
------------+-----------------------------------
      Total |     33,429      100.00



. tab preght4

    preght4 |      Freq.     Percent        Cum.
------------+-----------------------------------
        637 |        797       18.02       18.02
        642 |      1,747       39.51       57.53
         O1 |      1,878       42.47      100.00
------------+-----------------------------------
      Total |      4,422      100.00



. des  preght1

              storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
preght1         str3   %9s



Thanks,

/Amal.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: 25 May 2012 20:22
To: [email protected]
Subject: Re: st: Extracting substrings from variables.

As any leading spaces surely don't matter, consider using

regexm(ltrim(mdiag1x), "^(637|642|O1)")

Nick

On Fri, May 25, 2012 at 7:17 PM, Brendan Halpin <[email protected]> wrote:
> On Fri, May 25 2012, Nick Cox wrote:
>
>> . di regexm("Stata rules OK O1", "^637|642|O1")
>> 1
>
> OK, I was wrong that the grouping parentheses were unnecessary. However,
> the way I used them first was also wrong.
>
> Something like this is needed:
>
> . gen pright = regexs(0) if regexm(mdiag1x, "^(637|642|O1)")
>
> More evidence that Nick's reluctance about regexp is not unwise.
>
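
The anchoring point in this exchange can be checked at the command line. This is a small sketch using literal strings rather than the real mdiag1x data; the test string with leading spaces is made up for illustration.

```stata
* Without parentheses, ^ binds only to the first alternative (637),
* so "O1" anywhere in the string still matches; the thread reports 1.
di regexm("Stata rules OK O1", "^637|642|O1")

* With parentheses, ^ applies to every alternative. This string
* starts with "Stata", so no alternative can match at the start.
di regexm("Stata rules OK O1", "^(637|642|O1)")

* Leading spaces defeat the anchor unless they are trimmed first,
* which is why ltrim() is suggested above.
di regexm("  642", "^(637|642|O1)")
di regexm(ltrim("  642"), "^(637|642|O1)")
```

regexm() displays 1 for a match and 0 otherwise, so comparing the two forms on a few known strings is a quick way to confirm the pattern before using it in -generate-.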

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC