Statalist



RE: st: substringing long, varying length text variables into individual variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: substringing long, varying length text variables into individual variables
Date   Thu, 3 Apr 2008 18:07:29 +0100

The -split- command is available in Stata 8 and later. 
An earlier version of -split- for Stata 7 is on SSC. 
Its predecessor, -strparse-, for Stata 6 is also on SSC. 

Nick
n.j.cox@durham.ac.uk 

Svend Juul replied to Todd Wagner

Todd wrote:

I have data from a publicly available database (clinicaltrials.gov). 
This database has a number of text variables that I want to break 
into individual variables and I could use some help.

For example, one of the variables is called study designs. Here are 
some data from the study designs variable:

Treatment|Randomized|Double-Blind|Placebo Control|Parallel Assignment|Safety/Efficacy Study
Prevention|Randomized|Open Label|Active Control|Parallel Assignment|Bio-equivalence Study
...

What I want to do is parse this text on the "|" character into 
individual variables.

So the first case would be

des1       des2        des3          des4             des5                 des6
Treatment  Randomized  Double-Blind  Placebo Control  Parallel Assignment  Safety/Efficacy Study

I can think of a brute-force way: save this variable and my id 
variable, change each | to a comma, output as text, read the text into 
Stata as a comma-separated file, and then merge it back into my data. 
Sounds silly, but perhaps it is the easiest. Any other ideas?

================================================================================

The -insheet- suggestion is valid, I believe. Another possibility is
the -split- command:

* Example data: one "|"-delimited study-design string per observation
clear
input str150 longvar
"Treatment|Randomized|Double-Blind|Placebo Control|Parallel Assignment|Safety/Efficacy Study"
"Prevention|Randomized|Open Label|Active Control|Parallel Assignment|Bio-equivalence Study"
"Prevention|Randomized|Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor)|Crossover Assignment"
"Randomized|Single Blind|Active Control|Parallel Assignment"
"Natural History|Cross-Sectional|Case Control|Prospective Study"
"Treatment|Randomized|Open Label|Active Control|Parallel Assignment|Efficacy Study"
"Treatment|Randomized|Double-Blind|Placebo Control|Single Group Assignment|Safety/Efficacy Study"
"Treatment|Randomized|Open Label|Placebo Control|Parallel Assignment|Safety/Efficacy Study"
"Treatment|Randomized|Double-Blind|Active Control|Parallel Assignment|Safety/Efficacy Study"
"Prevention|Randomized|Double-Blind|Placebo Control|Parallel Assignment|Safety/Efficacy Study"
"Treatment|Randomized|Single Blind (Investigator)|Placebo Control|Parallel Assignment"
"Treatment|Randomized|Open Label|Active Control|Parallel Assignment|Efficacy Study"
end

split longvar , generate(des) parse("|") limit(6)
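
As a rough sketch that goes beyond the command above, one could count the "|" delimiters in each value to see how many pieces it holds, and so whether limit(6) is enough. The variable name npieces is hypothetical and used only for illustration. In the example data the values hold between 4 and 6 pieces, so observations with fewer than 6 end up with empty strings in the later des variables.

```stata
* Sketch only: npieces is a hypothetical helper variable.
* Removing every "|" shortens the string by the number of delimiters;
* the number of pieces is one more than that.
* length() is the older function name; newer releases also offer strlen().
generate npieces = length(longvar) - length(subinstr(longvar, "|", "", .)) + 1
tabulate npieces

* Inspect the variables created by -split- for the first few observations.
list des1-des6 in 1/3
```

If the tabulation showed values with more pieces than the limit allows for, raising or dropping limit() would be the obvious adjustment.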
