Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: extracting portions of a string variable using observations from another variable


From   Eric Booth <ebooth@ppri.tamu.edu>
To   "<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: extracting portions of a string variable using observations from another variable
Date   Tue, 25 Jan 2011 02:21:24 +0000



Here are 2 approaches:

The first one is less reliable (i.e., it might require careful examination and tweaking) but might be more useful if you are bringing over more variables from the 'Dispenseringsform'/using dataset to the 'records'/master dataset.  Keep in mind that it matches on the first word (parsed on the space character) of 'Dispenseringsform', so it matches "filmovertrukne tabl." by the "filmovertrukne" part.

The second approach is more straightforward if your list of 'Dispenseringsform' values is short enough to fit into a macro (see -help limits-) and you don't need to bring in any extra variables from the 'Dispenseringsform' dataset.  It simply collects all the values into a local macro (`alt') and then uses the string position function strpos() (see -help string_functions-) to find matches.

The results of both approaches are stored in the variable 'newvar':


**********************************!  Begin Example
** Note: Watch for Wrapping **

//DATASET OF WORDS TO BE EXTRACTED FROM RECORDS DATA-->
clear
 inp str30 Dispenseringsform
"filmovertrukne tabl."
"oral opløsning"
"pulv.t.konc.t.inf.v."
"inj.-/inf.væske"
"enterotabletter"
"tabletter"
"pulv.t.inj.væske,opl"
"inf.væske, opløsning"
"pul.t.inj.+inf.,opl."
end
levelsof Dispenseringsform, loc(alt)
di `"`alt'"'
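// Added sketch (not in the original post): approach 2 below assumes the whole
// list fits in one macro; this compares its length with the limit c(macrolen)
loc len : length local alt
di "alt holds `len' characters; macro limit is " c(macrolen)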
split Dispenseringsform
l Dispenseringsform1
sa dispense.dta, replace


//1. EXTRACT WORDS IN DISPENSE.DTA USING MERGE -->
clear
inp str244 record
"Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12 09-09-2010 00:35)"
"Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3 gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)"
"Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)"
"Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e) Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)"
end
sa records.dta, replace

split record
l record2-record6

/* 
extract words in 'record' that 
match dispense.dta list:
  pulv.t.inj.væske,opl
  filmovertrukne tabl.
  inf.væske, opløsning
  pul.t.inj.+inf.,opl.
*/

g str30 newvar = ""
forval n = 2/5 {
	rename record`n' Dispenseringsform1
	merge m:1 Dispenseringsform1 using "dispense.dta"
	drop if _m==2 //--keep matched and master records only
	replace newvar = Dispenseringsform if _m==3 & mi(newvar)
	drop _merge  Dispenseringsform2  Dispenseringsform
	rename Dispenseringsform1 record`n'
	}
order newvar
drop record? record??
l


//2.  MATCH USING STRING FUNCTIONS-->

u records.dta, clear
g str30 newvar = ""
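foreach x of local alt {
	// the first matching form wins; later matches do not overwrite it
	// (strpos() is tested directly, so no temporary variable is needed)
	replace newvar = `"`x'"' if strpos(record, `"`x'"') > 0 ///
	& mi(newvar)
}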
order newvar
l newvar record


**********************************!  End Example
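
One caveat with the second approach, offered only as a sketch: because the first matching form wins, a short form that is a substring of a longer one can be picked instead of the longer one. The variant below keeps the longest match. The two records and the short form "inf.væske" are made up for illustration and are not from the original data.

```stata
// Hypothetical mini-example: "inf.væske" is an assumed short form that is
// a substring of "inj.-/inf.væske" and sorts before it in -levelsof-
clear
inp str30 Dispenseringsform
"inf.væske"
"inj.-/inf.væske"
end
levelsof Dispenseringsform, loc(alt)

// Made-up records, one containing each form
clear
inp str60 record
"Drug A inj.-/inf.væske 5 mg/ml"
"Drug B inf.væske 5 mg/ml"
end

g str30 newvar = ""
foreach x of local alt {
	// overwrite an earlier match only when this form is longer
	replace newvar = `"`x'"' if strpos(record, `"`x'"') > 0 ///
	& length(`"`x'"') > length(newvar)
}
l newvar record
```

With the first-match rule, the first record would be assigned "inf.væske"; comparing lengths lets the longer "inj.-/inf.væske" replace it.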


- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754



On Jan 24, 2011, at 3:21 PM, Daniel Henriksen wrote:

> Hello statalist
> 
> Hope you can help me. Is it possible for Stata to extract specific
> words within a string using observations from another variable?
> I have a dataset with a list of different ways of dispensing the drug
> (which form it is). Here's an example:
> 
> Dispenseringsform
> filmovertrukne tabl.
> oral opløsning
> pulv.t.konc.t.inf.v.
> inj.-/inf.væske
> enterotabletter
> tabletter
> pulv.t.inj.væske,opl
> inf.væske, opløsning
> pul.t.inj.+inf.,opl.
> (I have 270 rows of these (different forms and different ways of spelling it))
> 
> Then I have another dataset (only one variable but many observations)
> containing information on what drug, way of dispensing, dose and time
> the drug is to be administered to the patient:
> 
> Cefuroxim Stragen pulv.t.inj.væske,opl 7,5 mg/ml intravenøst 100 ml kl
> 00:00 +  100 ml kl 08:00 +  100 ml kl 16:00 ;(xxx yyy (Overlæge) aaa12
> 09-09-2010 00:35)
> Metronidazol Actavis filmovertrukne tabl. 500  mg peroralt 1 tablet 3
> gang(e) Daglig ;(xxx yyy-zzz (Stud. med.) aaa1bb 19-08-2010 01:20)
> Metronidazol B. Braun inf.væske, opløsning 5  mg/ml intraveøst 100 ml
> 3 gang(e) Daglig ;(xxx yyy (Reservelæge) aaa2bb 29-09-2010 01:21)
> Nexium pul.t.inj.+inf.,opl. 0,4 mg/ml intravenøst 100 ml 1 gang(e)
> Daglig ;(xxx yyy (Overlæge) aaa12 27-10-2010 01:37)
> 
> So I would like to extract  pulv.t.inj.væske,opl,  filmovertrukne
> tabl.  inf.væske, opløsning and  pul.t.inj.+inf.,opl. from these four
> observations and place them in a new variable without having to go
> through all of the information manually.
> I hope my question is clear.
> 
> Thank you for your time
> Daniel
> 
> --
> Daniel Henriksen
> Ph.d. studerende, læge
> Infektionsmedicinsk afd Q / Akut Modtage Afdelingen
> Odense Universitetshospital
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



