
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



RE: st: Working with complex strings


From   "Dudekula, Anwar" <dudekulaan@upmc.edu>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Working with complex strings
Date   Wed, 30 Nov 2011 08:53:37 +0000

Hi Nick and Steve,

Thank you very much for all your help.

Sincerely 
Anwar 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Wednesday, November 30, 2011 3:37 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Working with complex strings

-split- by default parses on spaces, which clearly is no good here,
given that medication names can be compound and that the dosages would
not be discarded. Steve was evidently pointing to the -parse()- option,
not suggesting that parsing on spaces was the answer.
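
To make the default behaviour concrete, here is a minimal sketch using one value from the original post. The stub name piece is a hypothetical choice, used only so the new variables do not collide with anything else in this thread.

```stata
* Sketch only: one observation taken from the original question
clear
input str40 medication
"metoprolol tatrate 150mg bid"
end

* Default -split- parses on spaces, so every word becomes its own variable
* (piece is a hypothetical stub name)
split medication, generate(piece)

* piece1-piece4 hold "metoprolol", "tatrate", "150mg" and "bid"
list
```

So the default gives the second of the two outcomes asked about below: the compound name is broken up and the dosage pieces are kept as separate variables.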

If we assume that (a) dose always starts with a number and (b) dose
when specified always follows name of medication and (c) names never
have numeric characters, then -split- can be used to parse on numeric
characters. Here I used 1-9 but 0 should be added if it's ever the
first numeric digit:

. split medication, parse(1 2 3 4 5 6 7 8 9) limit(1)
variable created as string:
medication1

. replace medication1 = trim(medication1)
(5 real changes made)

. l

     +---------------------------------------------------+
     |                   medication          medication1 |
     |---------------------------------------------------|
  1. |       metoprolol 100 mg qday           metoprolol |
  2. | metoprolol tatrate 150mg bid   metoprolol tatrate |
  3. |         atenelol 150 mg qday             atenelol |
  4. |              hctz 25 mg qday                 hctz |
  5. |               PEG interferon       PEG interferon |
     |---------------------------------------------------|
  6. |            cimzia 50 mg qday               cimzia |
     +---------------------------------------------------+
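
Under the same assumptions (a) to (c), a built-in alternative is to keep everything before the first digit with -regexm()- and -regexs()-. This is only a sketch, not part of the solution above: the variable name med_name is hypothetical, and the pattern uses 0-9 so that a dose starting with 0 is also caught.

```stata
* Sketch only: rebuild the example data from this thread
clear
input str40 medication
"metoprolol 100 mg qday"
"metoprolol tatrate 150mg bid"
"atenelol 150 mg qday"
"hctz 25 mg qday"
"PEG interferon"
"cimzia 50 mg qday"
end

* Keep the leading run of non-digit characters
* (med_name is a hypothetical variable name)
generate str40 med_name = regexs(1) if regexm(medication, "^([^0-9]+)")

* Remove the space left between the name and the dose
replace med_name = trim(med_name)

list medication med_name
```

A value that starts with a digit would not match and would be left empty, which at least makes violations of assumption (b) easy to spot.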

Another approach is to use -moss- (SSC):

. moss medication, match("(.+) [1-9]+") regex

. drop _count _pos1

. rename _match1 medication2

With this regular expression, -moss- misses names without dosages,
which can just be copied across.

. replace medication2 = medication if missing(medication2)
(1 real change made)

. l

     +------------------------------------------------------------------------+
     |                   medication          medication1          medication2 |
     |------------------------------------------------------------------------|
  1. |       metoprolol 100 mg qday           metoprolol           metoprolol |
  2. | metoprolol tatrate 150mg bid   metoprolol tatrate   metoprolol tatrate |
  3. |         atenelol 150 mg qday             atenelol             atenelol |
  4. |              hctz 25 mg qday                 hctz                 hctz |
  5. |               PEG interferon       PEG interferon       PEG interferon |
     |------------------------------------------------------------------------|
  6. |            cimzia 50 mg qday               cimzia               cimzia |
     +------------------------------------------------------------------------+
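
Both approaches above keep only the name. If the dosage part is wanted as well, as in the original question, one possible sketch with built-in functions follows. The names dose_part and name_part are hypothetical, and the same assumptions (a) to (c) apply.

```stata
* Sketch only: two values from the original question
clear
input str40 medication
"metoprolol tatrate 150mg bid"
"PEG interferon"
end

* Everything from the first digit to the end of the string;
* left empty when no digit is present (dose_part is hypothetical)
generate str40 dose_part = regexs(1) if regexm(medication, "([0-9].*)$")

* Whatever precedes the dose, with the trailing space trimmed
* (name_part is hypothetical)
generate str40 name_part = ///
    trim(substr(medication, 1, strlen(medication) - strlen(dose_part)))

list
```

Entries without a dosage, such as PEG interferon, end up with an empty dose_part and the full string in name_part.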

Nick

On Wed, Nov 30, 2011 at 5:43 AM, Dudekula, Anwar <dudekulaan@upmc.edu> wrote:
> Thank you very much
>
> I will work on it. Would the parse() option split metoprolol tatrate 150mg bid as
>
> metoprolol tatrate and 150mg bid
>
> Or
>
> metoprolol & tatrate & 150mg &  bid
>
> Thank you
> Anwar
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Nakoneshny
> Sent: Wednesday, November 30, 2011 12:38 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Working with complex strings
>
> - help split - would have answered this question.
>
> - split medication, parse( ) -
>
> should do what you want.


 On Nov 29, 2011, at 9:54 PM, "Dudekula, Anwar" <dudekulaan@upmc.edu> wrote:

>> I am working with a deidentified hospital database with patient names (as a string variable) and medications (as a string variable), as follows:
>>
>> Patients_name        medication
>> ------------------------------------
>> Patient-1            metoprolol 100 mg qday
>> Patient-1            metoprolol tatrate 150mg bid
>> Patient-1            atenelol 150 mg qday
>> Patient-2            hctz 25 mg qday
>> Patient-2            PEG interferon
>> Patient-3            cimzia 50 mg qday
>>
>> Question: I am interested in the names of the medications only, not their dosages. Is it possible to split the medication string after the name, i.e.,
>>
>> 1) split  metoprolol tatrate 150mg bid into  metoprolol tatrate  &  150mg bid
>> 2) split  metoprolol 100 mg qday into   metoprolol   &   100 mg qday
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP