Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Extracting data from mixed string


From   David Harless <[email protected]>
To   [email protected]
Subject   Re: st: Extracting data from mixed string
Date   Sat, 07 Feb 2004 10:32:10 -0500

Others may come up with a simpler method, but I had a similar problem and did the following:

First step is to outsheet the variable (along with the id variable, if applicable) and open the file in a text editor (e.g., Textpad) and get rid of all the quotation marks. Then insheet as a comma delimited file and use stack.

I pasted your example data into Stata and named it stk_temp.dta. Here are

. use stk_temp.dta, clear

. list in 1/2

+-------------------------------+
| stk |
|-------------------------------|
1. | NONE,OVYS |
2. | NONE,NONE,NONE,MATES6,NATURAL |
+-------------------------------+

. outsheet using temp_stk1.txt, replace

. /*use Textpad or another text editor to get rid of all the quotation marks
> Save as temp_stk1_mod.txt */
. insheet using temp_stk1_mod.txt, clear comma
(5 vars, 19 obs)

. list in 1/2

+----------------------------+
| v1 v2 v3 v4 v5 |
|----------------------------|
1. | stk |
2. | NONE OVYS |
+----------------------------+

. drop if _n==1
(1 observation deleted)

. gen id=_n

. stack id v1 id v2 id v3 id v4 id v5, into(id method) clear

. drop if method=="" | method=="NONE" | method=="NATURAL"
(75 observations deleted)

. sort id _stack

. list in 1/10

+-----------------------+
| _stack id method |
|-----------------------|
1. | 2 1 OVYS |
2. | 4 2 MATES6 |
3. | 2 3 MICROG |
4. | 2 5 FEMODE |
5. | 1 7 LOGED |
|-----------------------|
6. | 3 8 MICROG |
7. | 4 8 MATES6 |
8. | 2 10 LEVONEL |
9. | 3 10 MICROG |
10. | 1 11 MICROG |
+-----------------------+
********************************
There is no blank between lines from the original string variable, but you have and id variable to distinguish.

I hope this is helpful

Dave Harless



Paul O'Brien wrote:

Database of 41,000 clinic visits with 26 variables for each visit. One variable is a string (stk) of contraceptives prescribed, containing 1 to 4 types, separated by commas.

stk
NONE,OVYS
NONE,NONE,NONE,MATES6,NATURAL
NONE,MICROG
NONE,NONE
NONE,FEMODE,NATURAL
NONE,NONE
LOGED
NONE,NONE,MICROG,MATES6,NATURAL
NONE
NONE,LEVONEL,MICROG
MICROG
NONE,NONE,LOG
NONE,NONE,NONE
NONE,MATES6,NATURAL
NONE,NONE
MICROG,ULTRA
NATURAL
NONE,MICROG,NATURAL

Some are oral contraceptives:OVYS MICROG FEMODE, LOGED

The position of the oral contraceptives in the string varies.

I want all the oral contraceptives in one column:

OVYS
blank
MATES6
MICROG
blank
FEMO
blank
LOGED

How?

Help gratefull received.

Paul

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index