Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: Turning text pages into indicators

From	Jen Zhen <[email protected]>
To	[email protected]
Subject	st: Turning text pages into indicators
Date	Wed, 8 Aug 2012 15:01:18 +0200

Dear Statalisters,

(1) I'd like to create indicators for whether a string
variable contains at least one of several words.
I know I can check for one specific word with - gen
indicator=regexm(string,"word1") - but can I cover several words
in a single command?
I tried - gen indicator=regexm(string,"word1" "word2") - and - gen
indicator=regexm(string,"word1" | "word2") - but neither worked.
Is there another way to do this?
I know I could also generate a separate indicator for each word and
then sum them, but with many words and many strings to
cover, that would be inefficient.
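
For reference, a minimal sketch of one way to test several words in a single call: put the alternation operator | inside the pattern string rather than between two quoted strings. The variable name txt and the words are placeholders, not taken from any real dataset.

```stata
* Sketch only: txt is a hypothetical string variable.
* The | goes inside the quoted pattern, so regexm() receives one regular expression.
generate byte indicator = regexm(txt, "word1|word2|word3")

* regexm() matches substrings, so "word1" also matches "word10".
* Lowercasing first makes the match case-insensitive.
generate byte indicator_ci = regexm(lower(txt), "word1|word2|word3")
```

The result is 1 if any of the words occurs anywhere in the string and 0 otherwise, so no summing of separate indicators is needed.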

(2) I'm starting with long texts, say half a page or a full page each, so
I presumably can't read an entire page into a single string variable
on which to perform (1) above.
Do I need to split the text first, say in Excel, or is there a way
to read all the text into Stata and then split it into as many
variables as necessary (but no more)?
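
One possible approach, sketched under assumptions: instead of loading the page into variables, read the text file one line at a time with - file read - and test each line. The file name page.txt is hypothetical, and the sketch assumes each line is short enough for the string limits of the Stata version in use.

```stata
* Sketch only: page.txt is a hypothetical plain-text file holding one page.
local found 0
tempname fh
file open `fh' using "page.txt", read text
file read `fh' line
while r(eof) == 0 {
    * macval() stops any $ or ` characters in the text from being expanded
    if regexm(`"`macval(line)'"', "word1|word2") local found 1
    file read `fh' line
}
file close `fh'
display "at least one word found: `found'"
```

A word that is broken across two lines of the file would be missed by this line-by-line check.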

Thanks so much and best regards,
JZ

Follow-Ups:
- Re: st: Turning text pages into indicators
  - From: Nick Cox <[email protected]>
