Re: st: extracting portions of a string variable using observations from another variable


From   Daniel Henriksen <[email protected]>
To   [email protected]
Subject   Re: st: extracting portions of a string variable using observations from another variable
Date   Wed, 26 Jan 2011 21:52:30 +0100

I had some technical problems with my mail, sorry.
Thank you very much, Eric, for your new solution. I need to sit down and
read it through carefully. My hope is that I can use this example more
generally (and it looks like I can), because I'd like to match the
drug names as well (cefuroxim, metronidazol, etc.). I have about 3000
combinations of drug names (some consist of one word, others of two or
three words) and ways of administering them.

Again, thank you very much! There are a lot of nice people around here!

Cheers
Daniel

2011/1/26 Eric Booth <[email protected]>:
> <>
>
>
> Daniel asked about matching more than one word in the first example, which uses -merge- to match the data.
> One way would be to create a dataset for each word of the split 'Dispenseringsform' and merge each one in during the loop.  Below, I've modified the first example I provided to do what he asks (edits are marked with *comments*):
>
> **********************************!  Begin Example
> ** Note: Watch for Wrapping **
>
> //DATASET OF WORDS TO BE EXTRACTED FROM RECORDS DATA-->
> clear
> inp str30 Dispenseringsform
> "filmovertrukne tabl."
> "oral opløsning"
> "pulv.t.konc.t.inf.v."
> "inj.-/inf.væske"
> "enterotabletter"
> "tabletter"
> "pulv.t.inj.væske,opl"
> "inf.væske, opløsning"
> "pul.t.inj.+inf.,opl."
> end
> levelsof Dispenseringsform, loc(alt)
> di `"`alt'"'
> split Dispenseringsform
> l Dispenseringsform1
>
>        **new**
>        **new**
>        preserve
>        keep Dispenseringsform1
>        duplicates drop   //--so that you can m:1 merge later
>        sa dispense_Dispenseringsform1.dta, replace
>        restore
>        preserve
>        keep Dispenseringsform2
>        duplicates drop
>        sa dispense_Dispenseringsform2.dta, replace
>        restore
>
>
> //1. EXTRACT WORDS IN DISPENSE.DTA USING MERGE -->
> clear
> inp str244 record
> "Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12 09-09-2010 00:35)"
> "Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
> "Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)"
> "Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)"
> end
> sa records.dta, replace
>
> split record
> l record2-record6
>
> /*
> extract words in 'record' that
> match dispense.dta list:  (
> pulv.t.inj.væske,opl,
> filmovertrukne tabl.  inf.væske,
> opløsning and  pul.t.inj.+inf.,opl.)
> */
>
> g str30 newvar = ""
>
>
>        **updated**
>        **updated**
> forval n = 2/5 {
>        **Adding the -foreach- below allows you to merge over more
>        ****than one word split from 'dispersingsform' in the master data
> foreach new in Dispenseringsform1 Dispenseringsform2 {
>
>        rename record`n' `new'
>        merge m:1 `new' using "dispense_`new'.dta"
>        drop if _m==2 //--keep matched and master records only
>        replace newvar = `new' if _m==3 & mi(newvar)
>        rename `new'  record`n'  // --*I reordered the drop/rename lines
>        drop _merge
>        cap drop Dispenseringsfor*
>                }
>        }
> order newvar
> drop record? record??
> l
> **********************************!  End Example
> Also, keep in mind that you can match on all the words using the second example I provided.
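
The two preserve/keep/save blocks in the example above are written out by hand, once per split word. With forms or drug names of up to three words, the same step could be looped over however many variables -split- creates. The following is only a sketch, not part of Eric's posted code: the local name `parts' is hypothetical, the `drop if missing()` line is an added assumption (so that empty second words do not end up in the lookup file), and the data are a shortened copy of the thread's list. It writes files into the current working directory.

```stata
* Shortened copy of the lookup list from the thread
clear
input str30 Dispenseringsform
"filmovertrukne tabl."
"oral opløsning"
"tabletter"
end

* -split- leaves the names of the variables it created in r(varlist)
split Dispenseringsform
local parts `r(varlist)'

* Save one lookup file per split word, named as in the example above
foreach v of local parts {
    preserve
    keep `v'
    drop if missing(`v')    // assumption: skip empty words
    duplicates drop         // so that an m:1 merge is possible later
    save "dispense_`v'.dta", replace
    restore
}
```

The merge loop could then iterate with `foreach new of local parts` instead of listing the variable names explicitly.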
>
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
>
> On Jan 26, 2011, at 9:27 AM, Steven Samuels wrote:
>
>> Daniel, for the  edification of all users (including Eric) who might not remember your original question and his response, please include edited versions in follow-up questions.  (FAQ 3.4 "Edit Previous Posting").
>>
>> Steve
>> [email protected]
>>
>>
>>
>> On Jan 26, 2011, at 9:33 AM, Daniel Henriksen wrote:
>>
>> Dear Eric
>>
>> Thank you so much for your suggestions! I will dig further into them asap.
>>
>> Regarding your first suggestion, is it possible to match two or three
>> words and not just the one parsed? Excuse my ignorance; I'm still a
>> beginner when it comes to Stata.
>>
>> cheers
>> daniel
>
>
>
>> Eric A. Booth wrote:
>>
>> <>
>>
>>
>> Here are 2 approaches:
>>
>> The first one is less reliable (i.e., it might require careful examination and tweaking) but might be more useful if you are bringing over more variables from the 'Dispenseringsform'/using dataset to the 'records'/master dataset. Keep in mind that it matches on the first word (parsed by a space character) in 'Dispenseringsform', so it matches "filmovertrukne tabl." by the "filmovertrukne" part.
>>
>> The second approach is more straightforward if you are working with a list of 'Dispenseringsform' values that is short enough to fit into a macro (see help limits) and you don't need to bring in any extra variables from the 'Dispenseringsform' dataset.  It simply collects all the 'Dispenseringsform' values into a local macro (`alt') and then uses a string position function (see help string_functions) to find matches.
>>
>> The results of both approaches are stored in the variable 'newvar':
>>
>> <snip>
>
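
The code for the second approach is snipped in this quote. As a rough illustration only (this is not Eric's original code), the idea he describes could look like the sketch below: collect the forms into the local macro `alt' with -levelsof-, then test each one against the record with strpos(). The shortened data, the str80 width, and the rule of keeping the longest matching form are assumptions made for this sketch; the `///` line continuations require running it from a do-file.

```stata
* Lookup list of forms (a shortened copy of the thread's example data)
clear
input str30 Dispenseringsform
"filmovertrukne tabl."
"tabletter"
"pulv.t.inj.væske,opl"
"inf.væske, opløsning"
end
* Collect the distinct forms into the local macro `alt'
levelsof Dispenseringsform, local(alt)

* Records to search (shortened copies of the thread's examples)
clear
input str80 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst"
"Metronidazol Actavis filmovertrukne tabl. 500 mg peroralt"
"Metronidazol B. Braun inf.væske, opløsning 5 mg/ml intravenøst"
end

generate str30 newvar = ""
foreach form of local alt {
    * strpos() returns 0 when the form does not occur in the record;
    * assumption: when several forms match, keep the longest one
    replace newvar = `"`form'"' ///
        if strpos(record, `"`form'"') > 0 ///
        & length(`"`form'"') > length(newvar)
}
list newvar
```

Because each form is matched as a whole string, multi-word entries such as "filmovertrukne tabl." need no special handling. The approach only works while the full list fits into a macro (see help limits).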
>
>> On Jan 24, 2011, at 3:21 PM, Daniel Henriksen wrote:
>>
>>> Hello statalist
>>>
>>> Hope you can help me. Is it possible for Stata to extract specific
>>> words within a string using observations from another variable?
>>> I have a dataset with a list of different ways of dispensing the drug
>>> (which form it is). Here's an example:
>>>
>>> Dispenseringsform
>>> filmovertrukne tabl.
>>> oral opløsning
>>> pulv.t.konc.t.inf.v.
>>> inj.-/inf.væske
>>> enterotabletter
>>> tabletter
>>> pulv.t.inj.væske,opl
>>> inf.væske, opløsning
>>> pul.t.inj.+inf.,opl.
>>> (I have 270 rows of these (different forms and different ways of spelling them))
>>>
>>> Then I have another dataset (only one variable but many observations)
>>> containing information on which drug, way of dispensing, dose and time
>>> the drug is to be administered to the patient:
>>>
>>> Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl
>>> 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12
>>> 09-09-2010 00:35)
>>> Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3
>>> gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)
>>> Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml
>>> 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)
>>> Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e)
>>> Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)
>>>
>>> So I would like to extract "pulv.t.inj.væske,opl", "filmovertrukne
>>> tabl.", "inf.væske, opløsning" and "pul.t.inj.+inf.,opl." from these four
>>> observations and place them in a new variable without having to go
>>> through all of the information manually.
>>> I hope my question is clear.
>>>
>>> Thank you for your time
>>> Daniel
>>>
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Daniel Henriksen
Ph.d. studerende, læge
Infektionsmedicinsk afd Q / Akut Modtage Afdelingen
Odense Universitetshospital
Bygning 2, 1. sal
Sdr. Boulevard 29
5000 Odense C
